#dialogue-systems News & Analysis

22 articles tagged with #dialogue-systems. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · May 9 · 7/10

Beyond Fixed Psychological Personas: State Beats Trait, but Language Models are State-Blind

Researchers introduce Chameleon, a dataset of 5,001 contextual psychological profiles revealing that 74% of user behavior variance stems from situational context (state) rather than personality traits (26%). The study finds that language models are state-blind, responding similarly regardless of context, while reward models evaluate the same users inconsistently across scenarios.

AI · Neutral · arXiv – CS AI · Apr 13 · 7/10

SAGE: A Service Agent Graph-guided Evaluation Benchmark

Researchers introduce SAGE, a comprehensive benchmark for evaluating Large Language Models in customer service automation that uses dynamic dialogue graphs and adversarial testing to assess both intent classification and action execution. Testing across 27 LLMs reveals a critical 'Execution Gap' where models correctly identify user intents but fail to perform appropriate follow-up actions, plus an 'Empathy Resilience' phenomenon where models maintain polite facades despite underlying logical failures.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents

Researchers have developed AriadneMem, a new memory system for long-horizon LLM agents that addresses challenges in maintaining accurate memory under fixed context budgets. The system uses a two-phase pipeline with entropy-aware gating and conflict-aware coarsening to improve multi-hop reasoning while reducing runtime by 77.8% and using only 497 context tokens.

🧠 GPT-4

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following

Researchers introduce DIALEVAL, a new automated framework that uses dual LLM agents to evaluate how well AI models follow instructions. The system achieves 90.38% accuracy by breaking instructions down into verifiable components and applying type-specific evaluation criteria, a 26.45% error reduction over existing methods.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3

ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

Researchers developed ATPO (Adaptive Tree Policy Optimization), a new AI algorithm for multi-turn medical dialogues that outperforms existing methods by better handling uncertainty in patient-doctor interactions. The algorithm enabled a smaller Qwen3-8B model to surpass GPT-4o's accuracy by 0.92% on medical dialogue benchmarks through improved value estimation and exploration strategies.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

A Survey on Recent Advances in Conversational Data Generation

A comprehensive survey examines recent advances in synthetic dialogue data generation for conversational AI systems, addressing the challenge of data scarcity in training. The research categorizes methods across open-domain, task-oriented, and information-seeking dialogue systems, proposing a framework for generating multi-turn conversations at scale while maintaining quality standards.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

S-MARC: Causal Streaming Reasoning for Full-Duplex Conversational Behavior Modeling

Researchers introduce S-MARC, a streaming framework for modeling conversational behavior in full-duplex dialogue systems that predicts communicative functions and interaction behaviors while capturing their causal relationships. The system generates interpretable reasoning chains and establishes benchmarks for conversational AI reasoning, advancing natural human-computer interaction capabilities.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations

ESC-Skills introduces a novel framework for emotional support conversation systems that moves beyond end-to-end generation to create interpretable, executable skills. The system discovers support interventions from successful and failed dialogues, organizes them into a skills bank with applicability conditions and risk assessments, then self-improves through multi-profile simulations and systematic failure analysis.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training

Researchers introduce ORBIT, a reinforcement learning framework that uses dynamically generated rubrics to fine-tune large language models for open-ended medical dialogue tasks. The approach achieves state-of-the-art performance on medical benchmarks with minimal training data, addressing the challenge of applying RL to complex tasks where traditional scalar reward signals are inadequate.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator

Researchers propose Calibrated Interactive RL, a framework that addresses distribution shift in multi-turn dialogue systems by combining interactive reinforcement learning with simulator alignment. The authors show, both theoretically and empirically, that aligning simulators with human interaction patterns significantly improves LLM-based dialogue agent performance over static-context and unaligned interactive methods.

AI · Bullish · arXiv – CS AI · May 9 · 6/10

BALAR: A Bayesian Agentic Loop for Active Reasoning

Researchers introduced BALAR, a Bayesian algorithm that enables large language models to engage in structured multi-turn dialogue by actively reasoning about missing information and strategically asking clarifying questions. Across three diverse benchmarks, the system improved accuracy by 14.6% to 38.5% without requiring fine-tuning, suggesting a more principled approach to interactive AI reasoning.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Flexible Agent Alignment with Goal Inference from Open-Ended Dialog

Researchers introduce Open-Universe Assistance Games (OU-AGs), a framework enabling LLM-based agents to infer and align with human preferences through open-ended dialogue. The GOOD method extracts evolving goals from natural language interactions using probabilistic inference, demonstrating improved user intent alignment across shopping, robotics, and coding domains without requiring large offline datasets.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Data Selection for Multi-turn Dialogue Instruction Tuning

Researchers propose MDS (Multi-turn Dialogue Selection), a framework for improving instruction-tuned language models by intelligently selecting high-quality multi-turn dialogue data. The method combines global coverage analysis with local structural evaluation to filter noisy datasets, demonstrating superior performance across multiple benchmarks compared to existing selection approaches.

AI · Bearish · arXiv – CS AI · Mar 27 · 6/10

Probing the Lack of Stable Internal Beliefs in LLMs

Research reveals that large language models (LLMs) struggle to maintain consistent internal beliefs or goals across multi-turn conversations, failing to stay implicitly consistent when the relevant context is not explicitly provided. This limitation poses significant challenges for developing persona-driven AI systems that require stable personality traits and behavioral patterns.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

MedAidDialog: A Multilingual Multi-Turn Medical Dialogue Dataset for Accessible Healthcare

Researchers have introduced MedAidDialog, a multilingual medical dialogue dataset covering seven languages, and developed MedAidLM, a conversational AI model for preliminary medical consultations. The system uses parameter-efficient fine-tuning on small language models to enable deployment without high-end computational infrastructure while incorporating patient context for personalized consultations.

AI · Bullish · arXiv – CS AI · Mar 11 · 6/10

DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization

DuplexCascade introduces a VAD-free cascaded streaming pipeline that enables full-duplex speech-to-speech dialogue while maintaining LLM intelligence. The system converts traditional long utterance turns into micro-turn interactions using special control tokens to coordinate turn-taking and response timing.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Researchers propose ActMem, a novel memory framework for LLM agents that combines memory retrieval with active causal reasoning to handle complex decision-making scenarios. The framework transforms dialogue history into structured causal graphs and uses counterfactual reasoning to resolve conflicts between past states and current intentions, significantly outperforming existing baselines in memory-dependent tasks.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 8

Extracting Training Dialogue Data from Large Language Model based Task Bots

Researchers have identified significant privacy risks in Large Language Model-based Task-Oriented Dialogue Systems, demonstrating that these AI systems can memorize and leak sensitive training data including phone numbers and complete dialogue exchanges. The study proposes new attack methods that can extract thousands of training dialogue states with over 70% precision in best-case scenarios.

$RNDR

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 13

LLM-Driven Multi-Turn Task-Oriented Dialogue Synthesis for Realistic Reasoning

Researchers propose an LLM-driven framework for generating multi-turn task-oriented dialogues to create more realistic reasoning benchmarks. The framework addresses limitations in current AI evaluation methods by producing synthetic datasets that better reflect real-world complexity and contextual coherence.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5

Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue

Researchers introduce InteractCS-RL, a new reinforcement learning framework that helps AI agents balance empathetic communication with cost-effective decision-making in task-oriented dialogue. The system uses a multi-granularity approach with persona-driven user interactions and cost-aware policy optimization to achieve better performance across business scenarios.

AI · Neutral · Apple Machine Learning · Feb 24 · 6/10 · 2

AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding

Researchers introduce AMUSE, a new benchmark for evaluating multimodal large language models in multi-speaker dialogue scenarios. The framework addresses current limitations of models like GPT-4o in tracking speakers, maintaining conversational roles, and reasoning across audio-visual streams in applications such as conversational video assistants.

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

A benchmark for joint dialogue satisfaction, emotion recognition, and emotion state transition prediction

Researchers have created a new multi-task Chinese dialogue dataset that enables prediction of user satisfaction, emotion recognition, and emotional state transitions across multiple conversation turns. The dataset addresses limitations in existing Chinese resources and aims to improve understanding of how user emotions evolve during interactions to better predict satisfaction.