#decision-making News & Analysis

107 articles tagged with #decision-making. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

🧠

Multi-Objective Reinforcement Learning for Tactical Decision Making for Trucks in Highway Traffic

Researchers present a multi-objective reinforcement learning framework using Proximal Policy Optimization to optimize tactical decision-making for autonomous trucks on highways. The system learns Pareto-optimal policies that balance competing objectives—safety, energy efficiency, and time efficiency—without requiring retraining when switching between different driving behaviors.

AI × Crypto · Neutral · Crypto Briefing · Jun 1 · 6/10

🤖

Erik Brooks: Effective questioning enhances decision-making, understanding risk and return is vital in volatile markets, and AI is reshaping investment strategies | Capital Allocators

Erik Brooks discusses how AI is fundamentally transforming investment strategies in volatile markets, emphasizing that effective questioning and risk-return analysis are critical for decision-making. The insights highlight the intersection of traditional investment principles with emerging AI-driven approaches that challenge conventional portfolio management.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

🧠

Comparing LLM-Based Conversational and Graphical Interfaces for Industrial Decision Tasks: An Exploratory Mixed-Methods Study

A mixed-methods study comparing LLM-based conversational interfaces with traditional dashboards for industrial decision-making found that conversational agents reduce interaction effort through natural language access, while dashboards remain superior for overview and verification tasks. The research suggests AI conversational interfaces show promise for industrial IoT data analysis but require larger-scale validation across different task types.

AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

🧠

Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach

Researchers introduce Iterative Regret-Minimization Fine-Tuning (Iterative RMFT), a post-training method that improves LLMs' decision-making capabilities by iteratively distilling low-regret trajectories back into models. The approach addresses fundamental limitations in how LLMs handle online decision problems without relying on rigid algorithmic templates, demonstrating improvements across multiple model architectures.

🧠 GPT-4

AI · Neutral · arXiv – CS AI · May 29 · 6/10

🧠

Wait! There's a Way Out: A Decision Mechanism for Forecasting Conversational Derailment

Researchers propose a novel decision mechanism for predicting online conversation derailment that decouples the trigger decision from derailment likelihood estimation. By incorporating forward-looking simulations to identify potential recovery paths, the method significantly reduces false positive alerts while maintaining forecasting accuracy, advancing the field of conversational AI safety.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

🧠

Internal Representation, Not Clinical Knowledge: Where Apparent LLM Triage Failures Originate

Researchers discovered that large language model failures in clinical triage stem from output formatting constraints rather than deficient medical knowledge. Using sparse autoencoders to analyze model internals, they found medical features activate identically across free-text and multiple-choice formats, but scaffold features drive incorrect decisions at the decision token, suggesting the models possess clinical understanding but struggle with constrained response structures.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

🧠

Identifying and Understanding Human Values in Text: A Tailorable LLM-based Architecture

Researchers present a modular LLM-based architecture for detecting and quantifying human values in text, addressing the need for ethical decision-making in autonomous AI systems. The approach separates value conceptualization from detection, enabling scalable application across different ethical frameworks and demonstrating strong performance on the ValueEval dataset.

AI · Bearish · arXiv – CS AI · May 28 · 6/10

🧠

DynaSchedBench: Calibrated Dynamic Scheduling Benchmarks and Observability Paradox in LLM-based Scheduling Agents

Researchers introduce DynaSchedBench, a calibrated framework for testing AI agents on dynamic job scheduling problems, revealing that large language models underperform expectations. The study uncovers an 'Observability Paradox' where providing agents with complete information actually degrades performance, and shows LLM-based schedulers fail to consistently outperform traditional heuristic baselines despite significant computational overhead.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

🧠

You Are in Control of Your State: Why Human Outcomes Are Controllable Through Causal State Intervention

Researchers propose that human behavioral variability stems from dynamic latent states—weighted neural-psychological conditions that determine how individuals process decisions moment-to-moment. Drawing on 24 months of data from 200,000+ users, the framework suggests human outcomes are causally controllable through state-targeted interventions, with implications for AI personalization, digital health, and behavioral prediction systems.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

🧠

AI, Take the Wheel: What Drives Delegation and Trust in Human-Computer Cooperative Question Answering?

A study examines how humans decide to trust and rely on AI systems in collaborative question-answering tasks, identifying two distinct reliance patterns: delegation (autonomous AI action) and adoption (evaluating AI suggestions). The findings reveal that humans make suboptimal trust decisions, both under-utilizing correct AI suggestions and over-relying on misleading AI outputs, with confirmation bias playing a significant role in trust calibration failures.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

🧠

TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents

Researchers introduce TowerMind, a lightweight tower defense game environment designed to evaluate Large Language Models as autonomous agents. The benchmark tests LLMs' capabilities in strategic planning and real-time decision-making while revealing significant performance gaps compared to human experts and highlighting key limitations in model reasoning.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

🧠

Generating Robust Portfolios of Optimization Models using Large Language Models

Researchers propose an algorithm that uses large language models to generate portfolios of optimization models rather than single outputs, addressing the reliability gap in LLM-generated solutions. The method leverages LLMs in dual roles—as generative and evaluative components—with theoretical guarantees that high-quality candidates appear in the portfolio as long as either role aligns with human preferences.


AI · Bullish · arXiv – CS AI · May 12 · 6/10

🧠

The Echo Amplifies the Knowledge: Somatic Marker Analogues in Language Models via Emotion Vector Re-Injection

Researchers demonstrate that language models can be enhanced with emotion-like markers that improve decision-making when combined with semantic knowledge, mirroring human neuroscience findings about emotional processing. By injecting emotion vectors into Gemma 3 during recall, the model achieved 80% good decision outcomes versus 52% with knowledge alone, validating that emotional context amplifies rather than replaces reasoning.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

🧠

Multi-Environment POMDPs with Finite-Horizon Objectives

Researchers establish that computing optimal policies for Multi-Environment POMDPs with finite-horizon objectives remains PSPACE-complete, matching the complexity of standard POMDPs. The work introduces a practical algorithm that substantially outperforms prior methods on benchmark problems.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

🧠

Finite-Time Analysis of MCTS in Continuous POMDP Planning

Researchers present the first finite-time theoretical analysis of Monte Carlo Tree Search (MCTS) applied to Partially Observable Markov Decision Processes (POMDPs), bridging a critical gap in algorithmic guarantees. The paper introduces Voro-POMCPOW, which uses Voronoi cell partitioning for continuous observation spaces, proving high-probability bounds on value estimates while maintaining competitive empirical performance.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

🧠

DRIP-R: A Benchmark for Decision-Making and Reasoning Under Real-World Policy Ambiguity in the Retail Domain

Researchers introduced DRIP-R, a benchmark designed to evaluate how large language model-based agents handle ambiguous retail policies where multiple valid interpretations exist. The study reveals that frontier AI models fundamentally disagree on identical policy-ambiguous scenarios, exposing a critical gap in agent decision-making capabilities for real-world applications.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

🧠

Think-Aloud Reshapes Automated Cognitive Model Discovery Beyond Behavior

Researchers demonstrate that incorporating think-aloud verbal protocols alongside behavioral data significantly improves automated cognitive model discovery using large language models. The approach shifts discovered models toward different structural classes, revealing decision-making mechanisms invisible to behavior-only analysis, particularly in risky decision-making contexts.

AI · Bearish · arXiv – CS AI · May 4 · 6/10

🧠

Impact of Task Phrasing on Presumptions in Large Language Models

In an arXiv preprint, researchers studied how task phrasing influences the decision-making of large language models, using the iterated prisoner's dilemma as a test case. The findings reveal that LLMs are prone to making presumptions based on how tasks are worded, which can impair their adaptability and reasoning, a safety concern for real-world deployment. Neutral task phrasing significantly reduced these presumptions, suggesting that prompt design is critical for reliable LLM performance.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

🧠

Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments

Researchers compared how large language models, humans, and algorithms approach the exploration-exploitation tradeoff in multi-armed bandit decision-making tasks. The study finds that enabling thinking processes in LLMs makes them behave more like humans in simple environments, but LLMs fail to match human adaptability in complex, non-stationary settings despite similar regret outcomes.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

🧠

Leading Across the Spectrum of Human-AI Relationships: A Conceptual Framework for Increasingly Heterogeneous Teams

Researchers present a conceptual framework for understanding human-AI decision-making relationships across five configurations—from pure human leadership to fully automated systems. The framework emphasizes that leaders often misrecognize where actual decision-shaping authority lies, risking ineffective oversight and suboptimal outcomes.

AI × Crypto · Neutral · Decrypt – AI · Apr 20 · 6/10

🤖

Coinbase Tests AI Agents Modeled on ‘Legendary’ Former Execs

Coinbase is testing AI agents trained to replicate the decision-making approaches of co-founder Fred Ehrsam and former CTO Balaji Srinivasan. This initiative represents a growing trend of enterprises embedding institutional expertise into AI systems to enhance strategic decision-making and operational efficiency.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

🧠

Measuring the Authority Stack of AI Systems: Empirical Analysis of 366,120 Forced-Choice Responses Across 8 AI Models

Researchers conducted the first large-scale empirical analysis of AI decision-making across 366,120 responses from 8 major models, revealing measurable but inconsistent value hierarchies, evidence preferences, and source trust patterns. The study found significant framing sensitivity and domain-specific value shifts, with critical implications for deploying AI systems in professional contexts.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

🧠

Act or Escalate? Evaluating Escalation Behavior in Automation with Language Models

Researchers analyzed how large language models decide whether to act on predictions or escalate to humans, finding that models use inconsistent and miscalibrated thresholds across five real-world domains. Supervised fine-tuning on chain-of-thought reasoning proved most effective at establishing robust escalation policies that generalize across contexts, suggesting escalation behavior requires explicit characterization before AI system deployment.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

🧠

CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space

Researchers introduce CONDESION-BENCH, a new benchmark for evaluating how large language models make decisions in complex, real-world scenarios with compositional actions and conditional constraints. The benchmark addresses limitations in existing decision-making frameworks by incorporating variable-level, contextual, and allocation-level restrictions that better reflect actual decision-making environments.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

🧠

On Emotion-Sensitive Decision Making of Small Language Model Agents

Researchers introduce a framework for studying how emotional states affect decision-making in small language models (SLMs) used as autonomous agents. Using activation steering techniques grounded in real-world emotion-eliciting texts, they benchmark SLMs across game-theoretic scenarios and find that emotional perturbations systematically influence strategic choices, though behaviors often remain unstable and misaligned with human patterns.
