49 articles tagged with #decision-making. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bearish · arXiv – CS AI · 1d ago · 7/10
🧠Researchers conducted the first systematic study of order bias in Large Language Models used for high-stakes decision-making, finding that LLMs exhibit strong position effects and previously undocumented name biases that can lead to selection of strictly inferior options. The study reveals distinct failure modes in AI decision-support systems, with proposed mitigation strategies using temperature parameter adjustments to recover underlying preferences.
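One generic way to see how pure position effects can be averaged out (a toy sketch for intuition only, not the paper's temperature-based mitigation) is to query the chooser under every ordering of the options and aggregate the winners:

```python
from itertools import permutations
from collections import Counter

def debias_choice(options, choose):
    """Query the chooser on every ordering of the options and tally the
    winners; pure position effects cancel out in the aggregate."""
    tally = Counter()
    for order in permutations(options):
        tally[choose(list(order))] += 1
    total = sum(tally.values())
    return {opt: tally[opt] / total for opt in options}

# Toy position-biased chooser: always picks whatever option is shown first.
result = debias_choice(["A", "B", "C"], lambda order: order[0])
print(result)  # each option wins 2 of the 6 orderings, so the tally is uniform
```

For n options this costs n! queries, so in practice a small random sample of orderings is used instead of the full set.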
AI · Bullish · arXiv – CS AI · 1d ago · 7/10
🧠Researchers introduce IDEA, a framework that converts Large Language Model decision-making into interpretable, editable parametric models with calibrated probabilities. The approach outperforms major LLMs like GPT-5.2 and DeepSeek R1 on benchmarks while enabling direct expert knowledge integration and precise human-AI collaboration.
🧠 GPT-5
AI · Bearish · arXiv – CS AI · 1d ago · 7/10
🧠Researchers tested whether large language models exhibit the Identifiable Victim Effect (IVE), a well-documented cognitive bias in which people prioritize helping a specific individual over a larger group facing equal hardship. Across 51,955 API trials spanning 16 frontier models, instruction-tuned LLMs showed amplified IVE compared to humans, while reasoning-specialized models inverted the effect, raising critical concerns about AI deployment in humanitarian decision-making.
🏢 OpenAI · 🏢 Anthropic · 🏢 xAI
AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠A comprehensive tutorial examines how deep learning complements operations research and optimization for sequential decision-making under uncertainty. The framework positions AI not as a replacement for traditional optimization but as an enhancement, with applications across supply chains, healthcare, energy, and autonomous systems.
AI · Bearish · arXiv – CS AI · 2d ago · 7/10
🧠Researchers have identified 'LLM Nepotism,' a bias where language models favor job candidates and organizational decisions that express trust in AI, regardless of merit. This creates self-reinforcing cycles where AI-trusting organizations make worse decisions and delegate more to AI systems, potentially compromising governance quality across sectors.
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers introduce the Quantized Simplex Gossip (QSG) model to explain how multi-agent LLM systems reach consensus through 'memetic drift', in which arbitrary choices compound into collective agreement. The study reveals scaling laws for when collective intelligence operates like a lottery versus amplifying weak biases, providing a framework for understanding AI system behavior in consequential decision-making.
AI · Neutral · Crypto Briefing · Mar 26 · 7/10
🧠Christian Catalini discusses how AI's rapid advancement will significantly transform job markets, with entry-level coding positions facing the most disruption. Despite automation trends, human expertise will remain essential for critical decision-making processes.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers propose Resource-Rational Contractualism (RRC), a new framework for AI alignment that enables AI systems to make decisions affecting diverse stakeholders through efficient approximations of rational agreements. The approach uses normatively-grounded heuristics to balance computational effort with accuracy in navigating complex human social environments.
AI · Neutral · arXiv – CS AI · Mar 12 · 7/10
🧠Research examining five major LLMs found they exhibit human-like cognitive biases when evaluating judicial scenarios, showing stronger virtuous victim effects but reduced credential-based halo effects compared to humans. The study suggests LLMs may offer modest improvements over human decision-making in judicial contexts, though variability across models limits current practical application.
🧠 ChatGPT · 🧠 Claude · 🧠 Sonnet
AI · Neutral · MIT Technology Review · Mar 10 · 7/10
🧠This article discusses AI's role in the Iran conflict, specifically how AI models like Claude are being used by the US military for decision-making purposes. The piece appears to be part of a technology newsletter covering AI applications in geopolitical contexts.
🧠 Claude
AI · Bullish · arXiv – CS AI · Mar 9 · 7/10
🧠Researchers developed new Monte Carlo inference strategies inspired by Bayesian Experimental Design to improve AI agents' information-seeking capabilities. The methods significantly enhanced language models' performance in strategic decision-making tasks, with weaker models like Llama-4-Scout outperforming GPT-5 at 1% of the cost.
🧠 GPT-5 · 🧠 Llama
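Bayesian Experimental Design scores a candidate query by its expected information gain: the prior entropy over hypotheses minus the expected posterior entropy after observing the answer. A minimal discrete sketch (illustrative only; the hypotheses and likelihood tables are assumed, not from the paper):

```python
import math

def entropy(p):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def expected_info_gain(prior, likelihoods):
    """EIG of a query, where likelihoods[h][a] = P(answer a | hypothesis h)."""
    n_answers = len(likelihoods[0])
    eig = entropy(prior)
    for a in range(n_answers):
        p_a = sum(prior[h] * likelihoods[h][a] for h in range(len(prior)))
        posterior = [prior[h] * likelihoods[h][a] / p_a for h in range(len(prior))]
        eig -= p_a * entropy(posterior)
    return eig

prior = [0.5, 0.5]
informative = [[0.9, 0.1], [0.1, 0.9]]  # answer strongly depends on hypothesis
useless = [[0.5, 0.5], [0.5, 0.5]]      # answer independent of hypothesis
print(expected_info_gain(prior, informative) > expected_info_gain(prior, useless))  # True
```

An information-seeking agent would ask the question with the highest EIG; Monte Carlo variants estimate the inner sums by sampling when the hypothesis space is too large to enumerate.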
AI · Bearish · arXiv – CS AI · Mar 5 · 6/10
🧠Research comparing four state-of-the-art language models (GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5, and Centaur) to humans in goal selection tasks reveals substantial divergence in behavior. While humans explore diverse approaches and learn gradually, the AI models tend to exploit single solutions or show poor performance, raising concerns about using current LLMs as proxies for human decision-making in critical applications.
🧠 Claude · 🧠 Gemini
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers developed COOL-MC, a tool that combines reinforcement learning with model checking to verify and explain AI policies for platelet inventory management in blood banks. The system achieved a 2.9% stockout probability while providing transparent decision-making explanations for safety-critical healthcare applications.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠Researchers developed GLEAN, a new AI verification framework that improves reliability of LLM-powered agents in high-stakes decisions like clinical diagnosis. The system uses expert guidelines and Bayesian logistic regression to better verify AI agent decisions, showing 12% improvement in accuracy and 50% better calibration in medical diagnosis tests.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers propose NAR-CP, a new method to improve Large Language Models' performance in high-frequency decision-making tasks like UAV pursuit. The approach uses normalized action rewards and consistency policy optimization to address limitations in current LLM-based agents that struggle with rapid, precise numerical state updates.
AI · Bearish · Fortune Crypto · Mar 3 · 7/10
🧠AI technology is accelerating battlefield decision-making processes, potentially enabling military actions to occur faster than human comprehension. This advancement raises significant concerns about risk management and ethical implications in warfare.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠Researchers have developed Value Flows, a new reinforcement learning method that uses flow-based models to estimate complete return distributions rather than single scalar values. The approach achieves 1.3x improvement in success rates across 62 benchmark tasks by better identifying states with high return uncertainty for improved decision-making.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10
🧠Researchers propose a new framework for collective decision-making where AI agents can abstain from voting when uncertain, extending the Condorcet Jury Theorem to confidence-gated settings. The study shows this selective participation approach can improve group accuracy and potentially reduce hallucinations in large language model systems.
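The confidence-gated voting idea can be illustrated with a toy Monte Carlo simulation (a sketch for intuition, not the paper's model; all probabilities are assumed): unsure agents either guess or abstain, and the group answer is a majority vote.

```python
import random

def group_accuracy(n_agents, p_confident_correct, p_unsure_correct,
                   p_confident, trials=20000, seed=0):
    """Compare majority-vote accuracy when unsure agents guess
    versus when they abstain. Ties count as incorrect."""
    rng = random.Random(seed)
    def run(abstain):
        wins = 0
        for _ in range(trials):
            votes = 0
            for _ in range(n_agents):
                if rng.random() < p_confident:
                    votes += 1 if rng.random() < p_confident_correct else -1
                elif not abstain:
                    votes += 1 if rng.random() < p_unsure_correct else -1
            wins += votes > 0
        return wins / trials
    return run(abstain=False), run(abstain=True)

# 11 agents; confident agents are 90% correct, unsure ones below chance.
full, gated = group_accuracy(11, 0.9, 0.45, 0.5)
print(full, gated)  # gated > full: silencing below-chance voters helps the group
```

This matches the classical Condorcet intuition: voters who are worse than chance drag the majority down, so letting them abstain raises group accuracy.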
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10
🧠Researchers developed a new theoretical framework for accelerated risk-averse policy evaluation in partially observable Markov decision processes (POMDPs) using Conditional Value-at-Risk (CVaR) bounds. The method enables safe elimination of suboptimal actions while maintaining computational guarantees, achieving substantial speedups in autonomous agent decision-making under uncertainty.
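CVaR itself has a simple sample-based form: the mean of the worst alpha-fraction of returns. A minimal sketch of that definition (illustrative only; this is not the paper's bounding or action-elimination procedure):

```python
def cvar(returns, alpha=0.1):
    """Conditional Value-at-Risk: the mean of the worst alpha-fraction
    of returns (lower tail), a pessimistic summary of a return sample."""
    tail = sorted(returns)[:max(1, int(len(returns) * alpha))]
    return sum(tail) / len(tail)

returns = [10, 9, 8, 7, 6, 5, 4, 3, 2, -20]
print(cvar(returns, alpha=0.1))  # -20.0: the single worst outcome
print(cvar(returns, alpha=0.5))  # -1.2: mean of the five worst returns
```

A risk-averse agent prefers the action with the higher CVaR; the paper's contribution is bounding these quantities in POMDPs tightly enough to prune suboptimal actions safely.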
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10
🧠A qualitative study with 26 non-AI expert stakeholders reveals that everyday users assess AI fairness more comprehensively than AI experts, considering broader features beyond legally protected categories and setting stricter fairness thresholds. The research highlights the importance of incorporating stakeholder perspectives in AI governance and fairness assessment processes.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers conducted the first large-scale empirical analysis of AI decision-making across 366,120 responses from 8 major models, revealing measurable but inconsistent value hierarchies, evidence preferences, and source trust patterns. The study found significant framing sensitivity and domain-specific value shifts, with critical implications for deploying AI systems in professional contexts.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce CONDESION-BENCH, a new benchmark for evaluating how large language models make decisions in complex, real-world scenarios with compositional actions and conditional constraints. The benchmark addresses limitations in existing decision-making frameworks by incorporating variable-level, contextual, and allocation-level restrictions that better reflect actual decision-making environments.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers analyzed how large language models decide whether to act on predictions or escalate to humans, finding that models use inconsistent and miscalibrated thresholds across five real-world domains. Supervised fine-tuning on chain-of-thought reasoning proved most effective at establishing robust escalation policies that generalize across contexts, suggesting escalation behavior requires explicit characterization before AI system deployment.
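The core act-or-escalate trade-off reduces to choosing a confidence threshold. A toy sketch (illustrative only; the costs and the (confidence, correct) outcomes below are assumed, not from the paper):

```python
def expected_utility(threshold, cases, cost_error=5.0, cost_escalation=1.0):
    """Average utility of a confidence-gated policy: acting on a wrong
    prediction costs `cost_error`, escalating always costs `cost_escalation`."""
    total = 0.0
    for confidence, correct in cases:
        if confidence >= threshold:
            total -= 0.0 if correct else cost_error
        else:
            total -= cost_escalation
    return total / len(cases)

# Hypothetical (confidence, was_correct) outcomes -- illustrative numbers only.
cases = [(0.95, True), (0.9, True), (0.8, False), (0.6, False), (0.55, True)]
best = max((t / 100 for t in range(50, 100)),
           key=lambda t: expected_utility(t, cases))
print(best)  # → 0.81: escalate the shaky cases, act on the confident ones
```

The paper's finding is that models' implicit thresholds are miscalibrated and drift across domains, which is why it argues escalation behavior needs explicit characterization (and, in their experiments, fine-tuning) rather than being left implicit.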
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠Researchers introduce a framework for studying how emotional states affect decision-making in small language models (SLMs) used as autonomous agents. Using activation steering techniques grounded in real-world emotion-eliciting texts, they benchmark SLMs across game-theoretic scenarios and find that emotional perturbations systematically influence strategic choices, though behaviors often remain unstable and misaligned with human patterns.
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠Researchers studying 469 Canadian youth aged 16-24 developed a negotiation-based framework to understand privacy decision-making with smart voice assistants, introducing two tension indices (RBTI and CATI) that measure competing risk-benefit and control-acceptance pressures. The study reveals that frequent SVA users exhibit benefit-dominant profiles and accept convenience trade-offs, suggesting the privacy paradox reflects negotiation rather than inconsistency.