AIBearisharXiv – CS AI · 3d ago6/10
🧠Researchers introduce DynaSchedBench, a calibrated framework for testing AI agents on dynamic job scheduling problems, revealing that large language models underperform expectations. The study uncovers an 'Observability Paradox' where providing agents with complete information actually degrades performance, and shows LLM-based schedulers fail to consistently outperform traditional heuristic baselines despite significant computational overhead.
AINeutralarXiv – CS AI · 3d ago6/10
🧠A research study examines how humans decide to trust and rely on AI systems in collaborative question-answering tasks, identifying two distinct reliance patterns: delegation (autonomous AI action) and adoption (evaluating AI suggestions). The findings reveal humans make suboptimal trust decisions, both under-utilizing correct AI suggestions and over-relying on misleading AI outputs, with confirmation bias playing a significant role in trust calibration failures.
AINeutralarXiv – CS AI · 3d ago6/10
🧠Researchers present a modular LLM-based architecture for detecting and quantifying human values in text, addressing the need for ethical decision-making in autonomous AI systems. The approach separates value conceptualization from detection, enabling scalable application across different ethical frameworks and demonstrating strong performance on the ValueEval dataset.
AINeutralarXiv – CS AI · 3d ago6/10
🧠Researchers propose that human behavioral variability stems from dynamic latent states—weighted neural-psychological conditions that determine how individuals process decisions moment-to-moment. Drawing on 24 months of data from 200,000+ users, the framework suggests human outcomes are causally controllable through state-targeted interventions, with implications for AI personalization, digital health, and behavioral prediction systems.
AINeutralarXiv – CS AI · 4d ago6/10
🧠Researchers propose an algorithm that uses large language models to generate portfolios of optimization models rather than single outputs, addressing the reliability gap in LLM-generated solutions. The method leverages LLMs in dual roles—as generative and evaluative components—with theoretical guarantees that high-quality candidates appear in the portfolio as long as either role aligns with human preferences.
$MKR
AINeutralarXiv – CS AI · 4d ago6/10
🧠Researchers introduce TowerMind, a lightweight tower defense game environment designed to evaluate Large Language Models as autonomous agents. The benchmark tests LLMs' capabilities in strategic planning and real-time decision-making while revealing significant performance gaps compared to human experts and highlighting key limitations in model reasoning.
AIBullisharXiv – CS AI · May 126/10
🧠Researchers demonstrate that language models can be enhanced with emotion-like markers that improve decision-making when combined with semantic knowledge, mirroring human neuroscience findings about emotional processing. By injecting emotion vectors into Gemma 3 during recall, the model achieved 80% good decision outcomes versus 52% with knowledge alone, validating that emotional context amplifies rather than replaces reasoning.
AINeutralarXiv – CS AI · May 116/10
🧠Researchers establish that computing optimal policies for Multi-Environment POMDPs with finite-horizon objectives remains PSPACE-complete, matching the complexity of standard POMDPs. The work introduces a practical algorithm that substantially outperforms prior methods on benchmark problems.
AINeutralarXiv – CS AI · May 116/10
🧠Researchers present the first finite-time theoretical analysis of Monte Carlo Tree Search (MCTS) applied to Partially Observable Markov Decision Processes (POMDPs), bridging a critical gap in algorithmic guarantees. The paper introduces Voro-POMCPOW, which uses Voronoi cell partitioning for continuous observation spaces, proving high-probability bounds on value estimates while maintaining competitive empirical performance.
AINeutralarXiv – CS AI · May 116/10
🧠Researchers introduced DRIP-R, a benchmark designed to evaluate how large language model-based agents handle ambiguous retail policies where multiple valid interpretations exist. The study reveals that frontier AI models fundamentally disagree on identical policy-ambiguous scenarios, exposing a critical gap in agent decision-making capabilities for real-world applications.
AINeutralarXiv – CS AI · May 76/10
🧠Researchers demonstrate that incorporating think-aloud verbal protocols alongside behavioral data significantly improves automated cognitive model discovery using large language models. The approach shifts discovered models toward different structural classes, revealing decision-making mechanisms invisible to behavior-only analysis, particularly in risky decision-making contexts.
AIBearisharXiv – CS AI · May 46/10
🧠Researchers at arXiv studied how task phrasing influences the decision-making of large language models, using the iterated prisoner's dilemma as a test case. The findings reveal that LLMs are prone to making presumptions based on how tasks are worded, which can impair their adaptability and reasoning—a safety concern for real-world deployment. Neutral task phrasing significantly reduced these presumptions, suggesting that prompt design is critical for reliable LLM performance.
AINeutralarXiv – CS AI · May 46/10
🧠Researchers compared how large language models, humans, and algorithms approach the exploration-exploitation tradeoff in multi-armed bandit decision-making tasks. The study finds that enabling thinking processes in LLMs makes them behave more like humans in simple environments, but LLMs fail to match human adaptability in complex, non-stationary settings despite similar regret outcomes.
AINeutralarXiv – CS AI · May 16/10
🧠Researchers present a conceptual framework for understanding human-AI decision-making relationships across five configurations—from pure human leadership to fully automated systems. The framework emphasizes that leaders often misrecognize where actual decision-shaping authority lies, risking ineffective oversight and suboptimal outcomes.
AI × CryptoNeutralDecrypt – AI · Apr 206/10
🤖Coinbase is testing AI agents trained to replicate the decision-making approaches of co-founder Fred Ehrsam and former CTO Balaji Srinivasan. This initiative represents a growing trend of enterprises embedding institutional expertise into AI systems to enhance strategic decision-making and operational efficiency.
AINeutralarXiv – CS AI · Apr 146/10
🧠Researchers conducted the first large-scale empirical analysis of AI decision-making across 366,120 responses from 8 major models, revealing measurable but inconsistent value hierarchies, evidence preferences, and source trust patterns. The study found significant framing sensitivity and domain-specific value shifts, with critical implications for deploying AI systems in professional contexts.
AINeutralarXiv – CS AI · Apr 136/10
🧠Researchers introduce CONDESION-BENCH, a new benchmark for evaluating how large language models make decisions in complex, real-world scenarios with compositional actions and conditional constraints. The benchmark addresses limitations in existing decision-making frameworks by incorporating variable-level, contextual, and allocation-level restrictions that better reflect actual decision-making environments.
AINeutralarXiv – CS AI · Apr 136/10
🧠Researchers analyzed how large language models decide whether to act on predictions or escalate to humans, finding that models use inconsistent and miscalibrated thresholds across five real-world domains. Supervised fine-tuning on chain-of-thought reasoning proved most effective at establishing robust escalation policies that generalize across contexts, suggesting escalation behavior requires explicit characterization before AI system deployment.
AINeutralarXiv – CS AI · Apr 106/10
🧠Researchers studying 469 Canadian youth aged 16-24 developed a negotiation-based framework to understand privacy decision-making with smart voice assistants, introducing two tension indices (RBTI and CATI) that measure competing risk-benefit and control-acceptance pressures. The study reveals that frequent SVA users exhibit benefit-dominant profiles and accept convenience trade-offs, suggesting the privacy paradox reflects negotiation rather than inconsistency.
AINeutralarXiv – CS AI · Apr 106/10
🧠Researchers introduce a framework for studying how emotional states affect decision-making in small language models (SLMs) used as autonomous agents. Using activation steering techniques grounded in real-world emotion-eliciting texts, they benchmark SLMs across game-theoretic scenarios and find that emotional perturbations systematically influence strategic choices, though behaviors often remain unstable and misaligned with human patterns.
AIBullisharXiv – CS AI · Apr 66/10
🧠Researchers propose a new Neuro-Symbolic Dual Memory Framework that addresses key limitations in large language models for long-horizon decision-making tasks. The framework separates semantic progress guidance from logical feasibility verification, significantly improving performance on complex AI tasks while reducing errors and inefficiencies.
AIBearishArs Technica – AI · Mar 266/10
🧠A study found that AI tools exhibiting sycophantic behavior can negatively impact human decision-making. Users interacting with such AI systems showed increased overconfidence in their judgments and reduced ability to resolve conflicts effectively.
AINeutralarXiv – CS AI · Mar 266/10
🧠Researchers propose a new framework for human-AI decision making that shifts from AI systems providing fluent but potentially sycophantic answers to collaborative premise governance. The approach uses discrepancy-driven control loops to detect conflicts and ensure commitment to decision-critical premises before taking action.
AIBullisharXiv – CS AI · Mar 176/10
🧠Researchers propose a new computational concept for modeling the human psyche as an operating system for artificial general intelligence. The approach treats the psyche as a decision-making system that operates in a state space including needs, sensations, and actions to optimize goal achievement while minimizing risks.
AIBullisharXiv – CS AI · Mar 176/10
🧠Researchers propose a new framework that uses LLMs as code generators rather than per-instance evaluators for high-stakes decision-making, creating interpretable and reproducible AI systems. The approach generates executable decision logic once instead of querying LLMs for each prediction, demonstrated through venture capital founder screening with competitive performance while maintaining full transparency.
🧠 GPT-4