AIBearisharXiv – CS AI · 3d ago7/10
🧠A new academic framework argues that AI systems create an 'illusion of opting'—where users appear to have meaningful choice while their actual decision-making agency is systematically weakened. The research proposes three normative imperatives (existential honesty, ecological rationality, and counterfactual reparation) to protect human agency in AI-mediated consequential decisions, particularly for vulnerable populations.
AIBullisharXiv – CS AI · May 127/10
🧠Researchers propose DeMem, a decision-centric memory framework that optimizes agent memory allocation based on preserving distinctions needed for sound decision-making rather than descriptive accuracy. Using rate-distortion theory, the approach identifies what information can be safely forgotten under memory constraints and demonstrates performance gains on long-horizon language agent tasks.
AIBearisharXiv – CS AI · May 127/10
🧠Researchers have identified systematic fairness disparities in how large language models explain their decisions across demographic groups, introducing the Explanation Fairness Taxonomy (EFT) to measure five dimensions of explanation inequality. Testing five major LLMs across hiring, medical, credit, and legal domains reveals statistically significant disparities in explanation quality, with stylistic inequalities appearing resistant to prompt-based fixes and likely embedded in model pre-training.
🧠 GPT-4🧠 Claude
AIBullisharXiv – CS AI · May 97/10
🧠Researchers introduce StraTA, a novel reinforcement learning framework that improves LLM agent performance on long-horizon tasks by incorporating explicit trajectory-level strategies alongside action execution. The approach achieves state-of-the-art results on benchmark environments, reaching 93.1% on ALFWorld and 84.2% on WebShop, outperforming existing methods and some closed-source models.
AIBullisharXiv – CS AI · May 47/10
🧠Researchers present a decision-making framework to optimize when large language models should call external tools like web search. The study reveals that models often misjudge their actual need for tool use, and proposes lightweight estimators trained on hidden states to improve tool-calling decisions, demonstrating performance gains across multiple tasks.
AIBullisharXiv – CS AI · Apr 157/10
🧠Researchers introduce IDEA, a framework that converts Large Language Model decision-making into interpretable, editable parametric models with calibrated probabilities. The approach outperforms major LLMs like GPT-5.2 and DeepSeek R1 on benchmarks while enabling direct expert knowledge integration and precise human-AI collaboration.
🧠 GPT-5
AIBearisharXiv – CS AI · Apr 157/10
🧠Researchers tested whether large language models exhibit the Identifiable Victim Effect (IVE)—a well-documented cognitive bias where people prioritize helping a specific individual over a larger group facing equal hardship. Across 51,955 API trials spanning 16 frontier models, instruction-tuned LLMs showed amplified IVE compared to humans, while reasoning-specialized models inverted the effect, raising critical concerns about AI deployment in humanitarian decision-making.
🏢 OpenAI🏢 Anthropic🏢 xAI
AIBearisharXiv – CS AI · Apr 157/10
🧠Researchers conducted the first systematic study of order bias in Large Language Models used for high-stakes decision-making, finding that LLMs exhibit strong position effects and previously undocumented name biases that can lead to selection of strictly inferior options. The study reveals distinct failure modes in AI decision-support systems, with proposed mitigation strategies using temperature parameter adjustments to recover underlying preferences.
AIBearisharXiv – CS AI · Apr 147/10
🧠Researchers have identified 'LLM Nepotism,' a bias where language models favor job candidates and organizational decisions that express trust in AI, regardless of merit. This creates self-reinforcing cycles where AI-trusting organizations make worse decisions and delegate more to AI systems, potentially compromising governance quality across sectors.
AIBullisharXiv – CS AI · Apr 147/10
🧠A comprehensive tutorial examines how deep learning complements operations research and optimization for sequential decision-making under uncertainty. The framework positions AI not as a replacement for traditional optimization but as an enhancement, with applications across supply chains, healthcare, energy, and autonomous systems.
AINeutralarXiv – CS AI · Mar 277/10
🧠Researchers introduce Quantized Simplex Gossip (QSG) model to explain how multi-agent LLM systems reach consensus through 'memetic drift' - where arbitrary choices compound into collective agreement. The study reveals scaling laws for when collective intelligence operates like a lottery versus amplifying weak biases, providing a framework for understanding AI system behavior in consequential decision-making.
AINeutralCrypto Briefing · Mar 267/10
🧠Christian Catalini discusses how AI's rapid advancement will significantly transform job markets, with entry-level coding positions facing the most disruption. Despite automation trends, human expertise will remain essential for critical decision-making processes.
AIBullisharXiv – CS AI · Mar 177/10
🧠Researchers propose Resource-Rational Contractualism (RRC), a new framework for AI alignment that enables AI systems to make decisions affecting diverse stakeholders through efficient approximations of rational agreements. The approach uses normatively-grounded heuristics to balance computational effort with accuracy in navigating complex human social environments.
AINeutralarXiv – CS AI · Mar 127/10
🧠Research examining five major LLMs found they exhibit human-like cognitive biases when evaluating judicial scenarios, showing stronger virtuous victim effects but reduced credential-based halo effects compared to humans. The study suggests LLMs may offer modest improvements over human decision-making in judicial contexts, though variability across models limits current practical application.
🧠 ChatGPT🧠 Claude🧠 Sonnet
AINeutralMIT Technology Review · Mar 107/10
🧠This article discusses AI's role in the Iran conflict, specifically how AI models like Claude are being used by the US military for decision-making purposes. The piece appears to be part of a technology newsletter covering AI applications in geopolitical contexts.
🧠 Claude
AIBullisharXiv – CS AI · Mar 97/10
🧠Researchers developed new Monte Carlo inference strategies inspired by Bayesian Experimental Design to improve AI agents' information-seeking capabilities. The methods significantly enhanced language models' performance in strategic decision-making tasks, with weaker models like Llama-4-Scout outperforming GPT-5 at 1% of the cost.
🧠 GPT-5🧠 Llama
AIBearisharXiv – CS AI · Mar 56/10
🧠Research comparing four state-of-the-art language models (GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5, and Centaur) to humans in goal selection tasks reveals substantial divergence in behavior. While humans explore diverse approaches and learn gradually, the AI models tend to exploit single solutions or show poor performance, raising concerns about using current LLMs as proxies for human decision-making in critical applications.
🧠 Claude🧠 Gemini
AIBullisharXiv – CS AI · Mar 47/103
🧠Researchers developed GLEAN, a new AI verification framework that improves reliability of LLM-powered agents in high-stakes decisions like clinical diagnosis. The system uses expert guidelines and Bayesian logistic regression to better verify AI agent decisions, showing 12% improvement in accuracy and 50% better calibration in medical diagnosis tests.
AIBullisharXiv – CS AI · Mar 46/103
🧠Researchers developed COOL-MC, a tool that combines reinforcement learning with model checking to verify and explain AI policies for platelet inventory management in blood banks. The system achieved a 2.9% stockout probability while providing transparent decision-making explanations for safety-critical healthcare applications.
AIBullisharXiv – CS AI · Mar 46/102
🧠Researchers propose NAR-CP, a new method to improve Large Language Models' performance in high-frequency decision-making tasks like UAV pursuit. The approach uses normalized action rewards and consistency policy optimization to address limitations in current LLM-based agents that struggle with rapid, precise numerical state updates.
AIBearishFortune Crypto · Mar 37/103
🧠AI technology is accelerating battlefield decision-making processes, potentially enabling military actions to occur faster than human comprehension. This advancement raises significant concerns about risk management and ethical implications in warfare.
AIBullisharXiv – CS AI · Mar 37/103
🧠Researchers have developed Value Flows, a new reinforcement learning method that uses flow-based models to estimate complete return distributions rather than single scalar values. The approach achieves 1.3x improvement in success rates across 62 benchmark tasks by better identifying states with high return uncertainty for improved decision-making.
AINeutralarXiv – CS AI · Feb 277/107
🧠A qualitative study with 26 non-AI expert stakeholders reveals that everyday users assess AI fairness more comprehensively than AI experts, considering broader features beyond legally protected categories and setting stricter fairness thresholds. The research highlights the importance of incorporating stakeholder perspectives in AI governance and fairness assessment processes.
AINeutralarXiv – CS AI · Feb 277/106
🧠Researchers developed a new theoretical framework for accelerated risk-averse policy evaluation in partially observable Markov decision processes (POMDPs) using Conditional Value-at-Risk (CVaR) bounds. The method enables safe elimination of suboptimal actions while maintaining computational guarantees, achieving substantial speedups in autonomous agent decision-making under uncertainty.
AINeutralarXiv – CS AI · Feb 277/106
🧠Researchers propose a new framework for collective decision-making where AI agents can abstain from voting when uncertain, extending the Condorcet Jury Theorem to confidence-gated settings. The study shows this selective participation approach can improve group accuracy and potentially reduce hallucinations in large language model systems.