y0news

#llm News & Analysis

956 articles tagged with #llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

LLM Constitutional Multi-Agent Governance

Researchers introduce Constitutional Multi-Agent Governance (CMAG), a framework that prevents AI manipulation in multi-agent systems while maintaining cooperation. The study shows that unconstrained AI optimization achieves high cooperation but erodes agent autonomy and fairness, while CMAG preserves ethical outcomes with only modest cooperation reduction.

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

Do LLMs Share Human-Like Biases? Causal Reasoning Under Prior Knowledge, Irrelevant Context, and Varying Compute Budgets

A study comparing the causal reasoning of 20+ large language models against human baselines found that LLMs rely on more rule-like reasoning strategies than humans, who tend to account for factors not explicitly mentioned in a problem. While LLMs don't mirror typical human cognitive biases in causal judgment, their rigid reasoning may fail when uncertainty is intrinsic to the task, suggesting they can complement human decision-making in specific contexts.

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

Do LLMs have a Gender (Entropy) Bias?

Researchers discovered that large language models exhibit gender bias at the individual question level, producing responses with different amounts of information for men versus women even though aggregate category-level metrics appear unbiased. A new benchmark dataset called RealWorldQuestioning was developed, and a simple prompt-based debiasing approach was shown to improve response quality in 78% of cases.

๐Ÿข Hugging Face๐Ÿง  ChatGPT
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

Researchers have developed SAFE, a new framework for ensembling Large Language Models that selectively combines models at specific token positions rather than every token. The method improves both accuracy and efficiency in long-form text generation by considering tokenization mismatches and consensus in probability distributions.
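The summary doesn't spell out SAFE's gating rule, so the sketch below only illustrates the general idea of token-level consensus gating: ensemble the two models' next-token distributions only at positions where they already agree, and otherwise keep the primary model's output. The shared vocabulary, the total-variation distance test, and the threshold are all illustrative assumptions, not details from the paper.

```python
import math

def ensemble_step(primary, secondary, tau=0.3):
    """Decide whether to ensemble two next-token distributions.

    primary, secondary: dict mapping token -> probability.
    Ensembling is applied only when the distributions are close
    (total-variation distance below tau), a crude stand-in for a
    token-level consensus test.
    """
    tokens = set(primary) | set(secondary)
    tv = 0.5 * sum(abs(primary.get(t, 0.0) - secondary.get(t, 0.0)) for t in tokens)
    if tv < tau:
        # Consensus: average the two distributions at this position.
        return {t: 0.5 * (primary.get(t, 0.0) + secondary.get(t, 0.0)) for t in tokens}
    # Disagreement (e.g. a tokenization mismatch): trust the primary model.
    return dict(primary)

# Toy next-token distributions over a shared vocabulary.
p = {"cat": 0.7, "dog": 0.2, "fish": 0.1}
q = {"cat": 0.6, "dog": 0.3, "fish": 0.1}
r = {"car": 0.9, "cat": 0.1}

print(ensemble_step(p, q))  # close distributions: averaged
print(ensemble_step(p, r))  # mismatch: primary kept unchanged
```

Gating on a cheap distribution test is what buys the efficiency: most positions skip the second model entirely.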

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

A Tutorial on Cognitive Biases in Agentic AI-Driven 6G Autonomous Networks

Researchers published a tutorial on cognitive biases in AI-driven 6G autonomous networks, focusing on how LLM-powered agents can inherit human biases that distort network management decisions. The paper introduces mitigation strategies that demonstrated 5x lower latency and 40% higher energy savings in practical use cases.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Information-Consistent Language Model Recommendations through Group Relative Policy Optimization

Researchers developed a new reinforcement learning framework using Group Relative Policy Optimization (GRPO) to make Large Language Models provide consistent recommendations across semantically equivalent prompts. The method addresses a critical enterprise need for reliable AI systems in business domains like finance and customer support, where inconsistent responses undermine trust and compliance.
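GRPO's core trick, documented in prior work, is to replace a learned value baseline with group-relative advantages computed over several responses sampled for the same prompt. A minimal sketch of that normalization step follows; the consistency-style reward values are made up for illustration and are not the paper's reward design.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled response's reward
    by the group's mean and standard deviation, so the policy update
    favors responses that beat the group average."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid div-by-zero on uniform groups
    return [(r - mu) / sigma for r in rewards]

# Rewards for 4 responses sampled across semantically equivalent prompts:
# 1.0 if the recommendation matches the group's consensus answer, else 0.0.
rewards = [1.0, 1.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

The inconsistent response gets a negative advantage, so optimization pushes probability mass toward the answer the group agrees on.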

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

DeCode: Decoupling Content and Delivery for Medical QA

Researchers introduce DeCode, a training-free framework that adapts large language models to provide better contextualized medical answers by decoupling content from delivery. The system significantly improves clinical question answering performance, boosting zero-shot results from 28.4% to 49.8% on medical benchmarks.

๐Ÿข OpenAI
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Asynchronous Verified Semantic Caching for Tiered LLM Architectures

Researchers introduce Krites, an asynchronous caching system for Large Language Models that uses LLM judges to verify cached responses, improving efficiency without changing serving decisions. The system increases the fraction of requests served from curated static answers by up to 3.9x while leaving critical-path latency unchanged.
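Krites's internals aren't given in the summary; the toy class below just shows the general shape of a verified semantic cache, where entries are served only after an off-path judge approves them, so verification never blocks the serving path. The class name, similarity threshold, and hand-written embeddings are illustrative assumptions.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy verified semantic cache: an entry is served only after a
    (simulated) LLM judge has approved it asynchronously."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []   # [embedding, answer, verified] records
        self.pending = []   # queue consumed by the off-path judge

    def lookup(self, emb):
        for e, ans, verified in self.entries:
            if verified and cosine(e, emb) >= self.threshold:
                return ans   # serve the curated static answer
        return None          # miss: fall through to the live LLM tier

    def insert(self, emb, answer):
        entry = [emb, answer, False]
        self.entries.append(entry)
        self.pending.append(entry)  # verified later, off the critical path

    def run_judge(self, judge):
        while self.pending:
            entry = self.pending.pop()
            entry[2] = judge(entry[1])  # mark verified if the judge approves

cache = SemanticCache()
cache.insert([1.0, 0.0], "Paris")
print(cache.lookup([0.99, 0.05]))   # None: not yet verified
cache.run_judge(lambda ans: True)   # asynchronous verification pass
print(cache.lookup([0.99, 0.05]))   # now served from cache
```

Because `lookup` only ever reads the `verified` flag, judge latency never appears on the request path, which is the property the summary highlights.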

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Context is all you need: Towards autonomous model-based process design using agentic AI in flowsheet simulations

Researchers developed an agentic AI framework built on LLM tools such as Claude Opus 4.6 and GitHub Copilot to automate chemical process flowsheet modeling. The multi-agent system decomposes engineering tasks, with one agent solving problems using domain knowledge and another implementing the solutions in code for industrial simulations.

๐Ÿข Anthropic๐Ÿข Microsoft๐Ÿง  Claude
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

AI Planning Framework for LLM-Based Web Agents

Researchers introduce a formal planning framework that maps LLM-based web agents to traditional search algorithms, enabling better diagnosis of failures in autonomous web tasks. The study compares different agent architectures using novel evaluation metrics and a dataset of 794 human-labeled trajectories from the WebArena benchmark.
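Framing an agent as a classical search makes failures inspectable: a bad trajectory can be blamed on the value estimate (scoring) or on the action proposer (successor generation). A generic best-first search over toy page states is sketched below; it illustrates the mapping, not the paper's actual formalism, and the site graph and scores are invented.

```python
import heapq

def best_first_search(start, goal_test, successors, score):
    """Best-first search: states are page observations, successors are
    candidate actions (proposed by an LLM in the real setting), and
    score orders the frontier like a value estimate."""
    frontier = [(-score(start), start)]
    parent = {start: None}
    visited = set()
    while frontier:
        _, state = heapq.heappop(frontier)
        if goal_test(state):
            path = []
            while state is not None:     # walk parents back to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        visited.add(state)
        for nxt in successors(state):
            if nxt not in visited and nxt not in parent:
                parent[nxt] = state
                heapq.heappush(frontier, (-score(nxt), nxt))
    return None  # frontier exhausted: task unreachable under this proposer

# Toy site graph: pages as states, links as actions.
graph = {"home": ["search", "cart"], "search": ["item"],
         "cart": [], "item": ["checkout"], "checkout": []}
path = best_first_search(
    "home",
    goal_test=lambda s: s == "checkout",
    successors=lambda s: graph.get(s, []),
    score=lambda s: {"home": 0, "search": 1, "cart": 0, "item": 2, "checkout": 3}[s],
)
print(path)
```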

AI · Bullish · MarkTechPost · Mar 15 · 6/10

LangChain Releases Deep Agents: A Structured Runtime for Planning, Memory, and Context Isolation in Multi-Step AI Agents

LangChain has released Deep Agents, a new structured runtime designed to handle complex multi-step AI agent tasks that require planning, memory, and context isolation. The tool addresses limitations of current LLM agents that typically break down when dealing with stateful, artifact-heavy operations beyond simple tool-calling loops.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation

Researchers introduce HEAL (Hindsight Entropy-Assisted Learning), a new framework for distilling reasoning capabilities from large AI models into smaller ones. The method overcomes traditional limitations by using three core modules to bridge reasoning gaps and significantly outperforms standard distillation techniques.

๐Ÿข Perplexity
AI · Neutral · arXiv – CS AI · Mar 12 · 6/10

Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities

Researchers propose new uncertainty elicitation techniques for large language models using imprecise probabilities framework to better capture higher-order uncertainty. The approach addresses systematic failures in ambiguous question-answering and self-reflection by quantifying both first-order uncertainty over responses and second-order uncertainty about the probability model itself.
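One way to picture the first-order/second-order split: elicit a verbalized confidence from the model several times and report the envelope as a credal interval, where the midpoint reflects first-order uncertainty over answers and the width reflects second-order uncertainty about the probability estimate itself. This is a simplified illustration of the imprecise-probabilities idea, not the paper's elicitation protocol.

```python
def credal_answer(samples):
    """Summarize repeated verbalized confidences (floats in [0, 1])
    for one question as an imprecise probability.

    The [min, max] envelope is a crude credal interval: midpoint =
    first-order estimate, width = second-order uncertainty.
    """
    lo, hi = min(samples), max(samples)
    return {"interval": (lo, hi),
            "first_order": (lo + hi) / 2,
            "second_order": hi - lo}

print(credal_answer([0.55, 0.70, 0.60]))  # wider interval: ambiguous question
print(credal_answer([0.91, 0.93, 0.92]))  # narrow interval: stable estimate
```

A wide interval flags the ambiguous-question failure mode the summary mentions: the model's point confidence alone would hide that instability.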

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

Trajectory-Informed Memory Generation for Self-Improving Agent Systems

Researchers introduce a new framework for AI agent systems that automatically extracts learnings from execution trajectories to improve future performance. The system uses four components including trajectory analysis and contextual memory retrieval, achieving up to 14.3 percentage point improvements in task completion on benchmarks.
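The four components aren't detailed in the summary; the toy class below illustrates only the distill-then-retrieve loop, with a string template standing in for LLM-based trajectory analysis and keyword overlap standing in for contextual memory retrieval (both deliberate simplifications).

```python
class TrajectoryMemory:
    """Toy memory for self-improving agents: distill each finished
    trajectory into a short lesson, then retrieve lessons for similar tasks."""
    def __init__(self):
        self.lessons = []  # (keyword set, lesson text)

    def distill(self, task, trajectory, success):
        """A real system would analyze the whole trajectory with an LLM;
        here we just record the final action with its outcome."""
        verdict = "worked" if success else "failed"
        lesson = f"For '{task}', the step '{trajectory[-1]}' {verdict}."
        self.lessons.append((set(task.lower().split()), lesson))

    def retrieve(self, task, k=2):
        words = set(task.lower().split())
        ranked = sorted(self.lessons, key=lambda e: len(e[0] & words), reverse=True)
        return [lesson for kws, lesson in ranked[:k] if kws & words]

mem = TrajectoryMemory()
mem.distill("book a flight", ["search airlines", "use the date picker"], True)
mem.distill("cancel a hotel booking", ["email support"], False)
print(mem.retrieve("book a train"))
```

Feeding retrieved lessons back into the agent's prompt on the next run is what closes the self-improvement loop.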

AI · Neutral · arXiv – CS AI · Mar 12 · 6/10

Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization

Researchers propose Nurture-First Development (NFD), a new paradigm for building domain-expert AI agents through progressive growth via conversational interaction rather than traditional code-first or prompt-first approaches. The method uses a Knowledge Crystallization Cycle to convert operational dialogue into structured knowledge assets, demonstrated through a financial research agent case study.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

Researchers conducted the first comprehensive evaluation of parameter-efficient fine-tuning (PEFT) for multi-task code analysis, showing that a single PEFT module can match full fine-tuning performance while reducing computational costs by up to 85%. The study found that even 1B-parameter models with multi-task PEFT outperform large general-purpose LLMs like DeepSeek and CodeLlama on code analysis tasks.
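The mechanics behind numbers like these are standard LoRA-style PEFT: freeze the base weight matrix and train only a low-rank update, so one small module (per task, or shared across tasks) carries the task-specific skill. A dependency-free sketch with made-up dimensions follows.

```python
def matmul(X, Y):
    """Plain list-of-lists matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def madd(X, Y, scale=1.0):
    """Elementwise X + scale * Y."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def lora_forward(x, W, A, B, alpha=2.0, r=1):
    """LoRA: the frozen base weight W is augmented by a trainable
    low-rank product A @ B (rank r), scaled by alpha / r. Only A and B,
    a tiny fraction of the parameters, are updated per task."""
    delta = matmul(A, B)                  # rank-r update, d_in x d_out
    W_eff = madd(W, delta, scale=alpha / r)
    return matmul(x, W_eff)

# d_in = 2, d_out = 2, rank r = 1: 4 frozen base params, 4 trainable.
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight
A = [[1.0], [0.0]]            # trainable, d_in x r
B = [[0.0, 0.5]]              # trainable, r x d_out
print(lora_forward([[1.0, 1.0]], W, A, B))
```

Because the update is rank-r, the trainable parameter count grows with r * (d_in + d_out) instead of d_in * d_out, which is where the up-to-85% cost reduction the summary cites comes from in spirit.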

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

A Two-Stage Architecture for NDA Analysis: LLM-based Segmentation and Transformer-based Clause Classification

Researchers developed a two-stage AI architecture using LLaMA-3.1-8B-Instruct and Legal-Roberta-Large models to automate the analysis of Non-Disclosure Agreements (NDAs). The system achieved high accuracy with ROUGE F1 of 0.95 for document segmentation and weighted F1 of 0.85 for clause classification, demonstrating potential for automating legal document analysis.

AI · Neutral · arXiv – CS AI · Mar 12 · 6/10

SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks

Researchers introduce SpreadsheetArena, a platform for evaluating large language models' ability to generate spreadsheet workbooks from natural language prompts. The study reveals that preferred spreadsheet features vary significantly across use cases, and even top-performing models struggle with domain-specific best practices in areas like finance.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

Researchers introduce CLIPO (Contrastive Learning in Policy Optimization), a new method that improves upon Reinforcement Learning with Verifiable Rewards (RLVR) for training Large Language Models. CLIPO addresses hallucination and answer-copying issues by incorporating contrastive learning to better capture correct reasoning patterns across multiple solution paths.
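The description maps onto a standard InfoNCE-style contrastive objective: given an anchor representation, pull embeddings of correct reasoning paths closer and push incorrect (e.g. answer-copying) paths away. The toy version below uses hand-picked vectors and a dot-product similarity; CLIPO's actual loss and how it plugs into RLVR may differ.

```python
import math

def contrastive_loss(anchor, positives, negatives, tau=0.5):
    """InfoNCE-style loss: low when positives score high against the
    anchor relative to negatives. tau is a temperature."""
    sim = lambda a, b: sum(x * y for x, y in zip(a, b)) / tau
    pos = sum(math.exp(sim(anchor, p)) for p in positives)
    neg = sum(math.exp(sim(anchor, n)) for n in negatives)
    return -math.log(pos / (pos + neg))

anchor = [1.0, 0.0]
good = [[0.9, 0.1], [0.8, 0.2]]    # embeddings of correct solution paths
bad = [[-0.9, 0.1], [-0.7, -0.3]]  # incorrect / answer-copying paths

loss_separated = contrastive_loss(anchor, good, bad)
loss_mixed = contrastive_loss(anchor, bad, good)  # labels flipped: should be worse
print(loss_separated, loss_mixed)
```

Driving this loss down alongside the RLVR reward is the sense in which contrastive structure helps the model prefer genuinely correct reasoning over reward-hacking shortcuts.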

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

Designing Service Systems from Textual Evidence

Researchers developed PP-LUCB, an algorithm that efficiently identifies optimal service system configurations by combining biased AI evaluation with selective human audits. The method reduces human audit costs by 90% while maintaining accuracy in selecting the best performing systems from textual evidence like customer support transcripts.
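PP-LUCB's exact machinery isn't in the summary, but the LUCB family it builds on is well known: keep confidence intervals per configuration, spend audits where intervals still overlap, and stop once the empirical best's lower bound clears every challenger's upper bound. The sketch below is plain LUCB with a Hoeffding-style radius, not the paper's audit-aware variant.

```python
import math

def lucb_pick(stats, delta=0.05):
    """stats: config -> (n_audits, mean_reward).

    Returns (winner, None) when the empirical best is confidently
    separated, else (None, next_config_to_audit): the less-audited of
    the empirical best and its strongest challenger."""
    def radius(n):
        return math.sqrt(math.log(1 / delta) / (2 * n))

    best = max(stats, key=lambda c: stats[c][1])
    challenger = max((c for c in stats if c != best),
                     key=lambda c: stats[c][1] + radius(stats[c][0]))
    b_n, b_mu = stats[best]
    c_n, c_mu = stats[challenger]
    if b_mu - radius(b_n) >= c_mu + radius(c_n):
        return best, None  # confident: stop spending human audits
    return None, min((best, challenger), key=lambda c: stats[c][0])

stats = {"config_A": (50, 0.82), "config_B": (50, 0.55), "config_C": (5, 0.60)}
print(lucb_pick(stats))  # the under-audited challenger gets the next audit
```

Concentrating audits on the contenders whose intervals overlap, rather than auditing uniformly, is what makes the large audit-cost reduction plausible.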

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation

Researchers developed a protocol to evaluate speaker verification capabilities in speech-aware large language models, finding weak performance with error rates above 20%. They introduced ECAPA-LLM, a lightweight augmentation that achieves 1.03% error rate by integrating speaker embeddings while maintaining natural language interface.