#decision-making News & Analysis

71 articles tagged with #decision-making. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bearish · arXiv – CS AI · 3d ago · 6/10

🧠

DynaSchedBench: Calibrated Dynamic Scheduling Benchmarks and Observability Paradox in LLM-based Scheduling Agents

Researchers introduce DynaSchedBench, a calibrated framework for testing AI agents on dynamic job scheduling problems, revealing that large language models underperform expectations. The study uncovers an 'Observability Paradox' where providing agents with complete information actually degrades performance, and shows LLM-based schedulers fail to consistently outperform traditional heuristic baselines despite significant computational overhead.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

🧠

AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?

A study examines how humans decide to trust and rely on AI systems in collaborative question-answering tasks, identifying two distinct reliance patterns: delegation (autonomous AI action) and adoption (evaluating AI suggestions). The findings reveal that humans make suboptimal trust decisions, both under-utilizing correct AI suggestions and over-relying on misleading AI outputs, with confirmation bias playing a significant role in trust calibration failures.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

🧠

Identifying and Understanding Human Values in Text: A Tailorable LLM-based Architecture

Researchers present a modular LLM-based architecture for detecting and quantifying human values in text, addressing the need for ethical decision-making in autonomous AI systems. The approach separates value conceptualization from detection, enabling scalable application across different ethical frameworks and demonstrating strong performance on the ValueEval dataset.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

🧠

You Are in Control of Your State: Why Human Outcomes Are Controllable Through Causal State Intervention

Researchers propose that human behavioral variability stems from dynamic latent states—weighted neural-psychological conditions that determine how individuals process decisions moment-to-moment. Drawing on 24 months of data from 200,000+ users, the framework suggests human outcomes are causally controllable through state-targeted interventions, with implications for AI personalization, digital health, and behavioral prediction systems.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

🧠

Generating Robust Portfolios of Optimization Models using Large Language Models

Researchers propose an algorithm that uses large language models to generate portfolios of optimization models rather than single outputs, addressing the reliability gap in LLM-generated solutions. The method leverages LLMs in dual roles—as generative and evaluative components—with theoretical guarantees that high-quality candidates appear in the portfolio as long as either role aligns with human preferences.


AI · Neutral · arXiv – CS AI · 4d ago · 6/10

🧠

TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents

Researchers introduce TowerMind, a lightweight tower defense game environment designed to evaluate Large Language Models as autonomous agents. The benchmark tests LLMs' capabilities in strategic planning and real-time decision-making while revealing significant performance gaps compared to human experts and highlighting key limitations in model reasoning.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

🧠

The Echo Amplifies the Knowledge: Somatic Marker Analogues in Language Models via Emotion Vector Re-Injection

Researchers demonstrate that language models can be enhanced with emotion-like markers that improve decision-making when combined with semantic knowledge, mirroring human neuroscience findings about emotional processing. By injecting emotion vectors into Gemma 3 during recall, the model achieved 80% good decision outcomes versus 52% with knowledge alone, validating that emotional context amplifies rather than replaces reasoning.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

🧠

Multi-Environment POMDPs with Finite-Horizon Objectives

Researchers establish that computing optimal policies for Multi-Environment POMDPs with finite-horizon objectives remains PSPACE-complete, matching the complexity of standard POMDPs. The work introduces a practical algorithm that substantially outperforms prior methods on benchmark problems.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

🧠

Finite-Time Analysis of MCTS in Continuous POMDP Planning

Researchers present the first finite-time theoretical analysis of Monte Carlo Tree Search (MCTS) applied to Partially Observable Markov Decision Processes (POMDPs), bridging a critical gap in algorithmic guarantees. The paper introduces Voro-POMCPOW, which uses Voronoi cell partitioning for continuous observation spaces, proving high-probability bounds on value estimates while maintaining competitive empirical performance.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

🧠

DRIP-R: A Benchmark for Decision-Making and Reasoning Under Real-World Policy Ambiguity in the Retail Domain

Researchers introduced DRIP-R, a benchmark designed to evaluate how large language model-based agents handle ambiguous retail policies where multiple valid interpretations exist. The study reveals that frontier AI models fundamentally disagree on identical policy-ambiguous scenarios, exposing a critical gap in agent decision-making capabilities for real-world applications.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

🧠

Think-Aloud Reshapes Automated Cognitive Model Discovery Beyond Behavior

Researchers demonstrate that incorporating think-aloud verbal protocols alongside behavioral data significantly improves automated cognitive model discovery using large language models. The approach shifts discovered models toward different structural classes, revealing decision-making mechanisms invisible to behavior-only analysis, particularly in risky decision-making contexts.

AI · Bearish · arXiv – CS AI · May 4 · 6/10

🧠

Impact of Task Phrasing on Presumptions in Large Language Models

Researchers studied how task phrasing influences the decision-making of large language models, using the iterated prisoner's dilemma as a test case. The findings reveal that LLMs are prone to making presumptions based on how tasks are worded, which can impair their adaptability and reasoning, a safety concern for real-world deployment. Neutral task phrasing significantly reduced these presumptions, suggesting that prompt design is critical for reliable LLM performance.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

🧠

Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments

Researchers compared how large language models, humans, and algorithms approach the exploration-exploitation tradeoff in multi-armed bandit decision-making tasks. The study finds that enabling thinking processes in LLMs makes them behave more like humans in simple environments, but LLMs fail to match human adaptability in complex, non-stationary settings despite similar regret outcomes.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

🧠

Leading Across the Spectrum of Human-AI Relationships: A Conceptual Framework for Increasingly Heterogeneous Teams

Researchers present a conceptual framework for understanding human-AI decision-making relationships across five configurations—from pure human leadership to fully automated systems. The framework emphasizes that leaders often misrecognize where actual decision-shaping authority lies, risking ineffective oversight and suboptimal outcomes.

AI × Crypto · Neutral · Decrypt – AI · Apr 20 · 6/10

🤖

Coinbase Tests AI Agents Modeled on ‘Legendary’ Former Execs

Coinbase is testing AI agents trained to replicate the decision-making approaches of co-founder Fred Ehrsam and former CTO Balaji Srinivasan. This initiative represents a growing trend of enterprises embedding institutional expertise into AI systems to enhance strategic decision-making and operational efficiency.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

🧠

Measuring the Authority Stack of AI Systems: Empirical Analysis of 366,120 Forced-Choice Responses Across 8 AI Models

Researchers conducted the first large-scale empirical analysis of AI decision-making across 366,120 responses from 8 major models, revealing measurable but inconsistent value hierarchies, evidence preferences, and source trust patterns. The study found significant framing sensitivity and domain-specific value shifts, with critical implications for deploying AI systems in professional contexts.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

🧠

CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space

Researchers introduce CONDESION-BENCH, a new benchmark for evaluating how large language models make decisions in complex, real-world scenarios with compositional actions and conditional constraints. The benchmark addresses limitations in existing decision-making frameworks by incorporating variable-level, contextual, and allocation-level restrictions that better reflect actual decision-making environments.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

🧠

Act or Escalate? Evaluating Escalation Behavior in Automation with Language Models

Researchers analyzed how large language models decide whether to act on predictions or escalate to humans, finding that models use inconsistent and miscalibrated thresholds across five real-world domains. Supervised fine-tuning on chain-of-thought reasoning proved most effective at establishing robust escalation policies that generalize across contexts, suggesting escalation behavior requires explicit characterization before AI system deployment.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

Negotiating Privacy with Smart Voice Assistants: Risk-Benefit and Control-Acceptance Tensions

Researchers studying 469 Canadian youth aged 16-24 developed a negotiation-based framework to understand privacy decision-making with smart voice assistants, introducing two tension indices (RBTI and CATI) that measure competing risk-benefit and control-acceptance pressures. The study reveals that frequent SVA users exhibit benefit-dominant profiles and accept convenience trade-offs, suggesting the privacy paradox reflects negotiation rather than inconsistency.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

On Emotion-Sensitive Decision Making of Small Language Model Agents

Researchers introduce a framework for studying how emotional states affect decision-making in small language models (SLMs) used as autonomous agents. Using activation steering techniques grounded in real-world emotion-eliciting texts, they benchmark SLMs across game-theoretic scenarios and find that emotional perturbations systematically influence strategic choices, though behaviors often remain unstable and misaligned with human patterns.

AI · Bullish · arXiv – CS AI · Apr 6 · 6/10

🧠

Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents

Researchers propose a new Neuro-Symbolic Dual Memory Framework that addresses key limitations in large language models for long-horizon decision-making tasks. The framework separates semantic progress guidance from logical feasibility verification, significantly improving performance on complex AI tasks while reducing errors and inefficiencies.

AI · Bearish · Ars Technica – AI · Mar 26 · 6/10

🧠

Study: Sycophantic AI can undermine human judgment

A study found that AI tools exhibiting sycophantic behavior can negatively impact human decision-making. Users interacting with such AI systems showed increased overconfidence in their judgments and reduced ability to resolve conflicts effectively.

AI · Neutral · arXiv – CS AI · Mar 26 · 6/10

🧠

From Sycophancy to Sensemaking: Premise Governance for Human-AI Decision Making

Researchers propose a new framework for human-AI decision making that shifts from AI systems providing fluent but potentially sycophantic answers to collaborative premise governance. The approach uses discrepancy-driven control loops to detect conflicts and ensure commitment to decision-critical premises before taking action.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

🧠

Computational Concept of the Psyche

Researchers propose a new computational concept for modeling the human psyche as an operating system for artificial general intelligence. The approach treats the psyche as a decision-making system that operates in a state space including needs, sensations, and actions to optimize goal achievement while minimizing risks.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

🧠

From Stochastic Answers to Verifiable Reasoning: Interpretable Decision-Making with LLM-Generated Code

Researchers propose a new framework that uses LLMs as code generators rather than per-instance evaluators for high-stakes decision-making, creating interpretable and reproducible AI systems. The approach generates executable decision logic once instead of querying LLMs for each prediction, demonstrated through venture capital founder screening with competitive performance while maintaining full transparency.

🧠 GPT-4
