13,322 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 2
🧠Researchers developed a Retrieval-Augmented Generation (RAG) assistant for anatomical pathology laboratories to replace outdated static documentation with dynamic, searchable protocol guidance. The system achieved strong performance using biomedical-specific embeddings and could transform healthcare laboratory workflows by providing technicians with accurate, context-grounded answers to protocol queries.
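The retrieval step of such a RAG assistant can be sketched as follows. This is a minimal illustration, not the paper's system: the `embed` function is a bag-of-words stand-in for the biomedical embedding model the summary mentions, and the protocol chunks are invented examples.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a biomedical embedding model (e.g. a domain-tuned
    # sentence encoder); here a simple bag-of-words vector for illustration.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, protocol_chunks, k=2):
    # Rank protocol passages by similarity to the technician's query;
    # the top-k chunks would be passed to the generator as grounding context.
    q = embed(query)
    ranked = sorted(protocol_chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Fixation: immerse tissue in 10% neutral buffered formalin for 24 hours.",
    "Staining: apply hematoxylin for 5 minutes, rinse, then eosin for 2 minutes.",
    "Embedding: orient the specimen and embed in paraffin wax.",
]
top = retrieve("how long should formalin fixation take", chunks, k=1)
print(top[0])
```

In a production pipeline the bag-of-words embedding would be replaced by a dense domain-specific encoder, but the retrieve-then-generate flow is the same.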
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 5
🧠Researchers identified stochasticity (variability) as a critical barrier to deploying Deep Research Agents in real-world applications like financial decision-making and medical analysis. The study proposes mitigation strategies that reduce output variance by 22% while maintaining research quality, addressing a key obstacle for enterprise AI agent adoption.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 3
🧠Researchers developed CXReasonAgent, a diagnostic AI agent that combines large language models with clinical diagnostic tools to provide evidence-based chest X-ray analysis. The system addresses limitations of current vision-language models that generate plausible but ungrounded medical diagnoses, introducing a new benchmark with 1,946 diagnostic dialogues.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠Researchers have developed PATRA, a new AI model that improves time series question answering by better understanding patterns like trends and seasonality. The model addresses limitations in existing LLM approaches that treat time series data as simple text or images, introducing pattern-aware mechanisms and balanced learning across tasks of varying difficulty.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠Researchers have introduced ESAA (Event Sourcing for Autonomous Agents), a new architecture that improves LLM-based autonomous agents by separating cognitive intention from state mutation using structured JSON events and deterministic orchestration. The system addresses key limitations like context degradation and execution reliability, with successful validation through multi-agent case studies using various LLMs including Claude Sonnet and GPT-5.
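The event-sourcing idea can be sketched in a few lines: the agent's LLM emits intentions as immutable JSON events, and state is never mutated directly but rebuilt by deterministically replaying the log. The event names and fields below are illustrative assumptions, not the paper's schema.

```python
import json

def reduce(state, event):
    # Deterministic reducer: each event type maps to one state change.
    kind = event["type"]
    if kind == "task_created":
        state["tasks"][event["id"]] = "open"
    elif kind == "task_completed":
        state["tasks"][event["id"]] = "done"
    return state

def replay(log):
    # Rebuild agent state from scratch by replaying the event log;
    # the same log always yields the same state.
    state = {"tasks": {}}
    for line in log:
        state = reduce(state, json.loads(line))
    return state

log = [
    '{"type": "task_created", "id": "t1"}',
    '{"type": "task_created", "id": "t2"}',
    '{"type": "task_completed", "id": "t1"}',
]
print(replay(log))
```

Because the LLM only appends events and never touches state, a degraded context window cannot corrupt execution history, which is the reliability property the summary describes.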
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠Researchers propose a new approach to generalized planning that learns explicit transition models rather than directly predicting action sequences. This method achieves better out-of-distribution performance with fewer training instances and smaller models compared to Transformer-based planners like PlanGPT.
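Planning with an explicit transition model rather than predicted action sequences can be illustrated with a toy search. This is a sketch under invented assumptions: the "learned" model here is a hand-written stand-in, and the domain is a one-dimensional line world.

```python
from collections import deque

def plan(start, goal, transition, actions, max_depth=10):
    # Generalized planning via an explicit transition model: breadth-first
    # search over states reached by the model, instead of asking a
    # sequence model to emit the action string directly.
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        if len(path) >= max_depth:
            continue
        for a in actions:
            nxt = transition(state, a)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [a]))
    return None

# Toy domain: move along an integer line; replace `step` with a learned model.
step = lambda s, a: s + (1 if a == "right" else -1)
print(plan(0, 3, step, ["right", "left"]))  # ['right', 'right', 'right']
```

Because search generalizes to any goal the transition model covers, out-of-distribution instances come for free, which is the advantage the summary attributes to this approach over sequence predictors like PlanGPT.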
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠Researchers developed MALLET, a multi-agent AI system that reduces emotional intensity in news content by up to 19.3% while preserving semantic meaning. The system uses four specialized agents to analyze, adjust, and personalize content presentation modes for calmer decision-making without restricting access to original information.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠Researchers developed ReCoN-Ipsundrum, an AI agent architecture designed to exhibit consciousness-like behaviors through recurrent persistence loops and affect-coupled control mechanisms. The study demonstrates how engineered systems can display preference stability, exploratory scanning, and sustained caution behaviors that mimic aspects of conscious experience.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5
🧠Researchers developed improved neural retriever-reranker pipelines for Retrieval-Augmented Generation (RAG) systems over knowledge graphs in e-commerce applications. The study achieved 20.4% higher Hit@1 and 14.5% higher Mean Reciprocal Rank compared to existing baselines, providing a framework for production-ready RAG systems.
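The two metrics in that result are standard retrieval measures and are easy to compute from ranked lists. A minimal sketch with invented example data:

```python
def hit_at_1(ranked_lists, gold):
    # Hit@1: fraction of queries whose top-ranked result is the gold answer.
    return sum(r[0] == g for r, g in zip(ranked_lists, gold)) / len(gold)

def mrr(ranked_lists, gold):
    # Mean Reciprocal Rank: average of 1/rank of the first correct hit
    # (0 when the gold answer is missing from the list entirely).
    total = 0.0
    for r, g in zip(ranked_lists, gold):
        total += 1.0 / (r.index(g) + 1) if g in r else 0.0
    return total / len(gold)

rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "a", "b"]]
answers = ["a", "a", "a"]
print(hit_at_1(rankings, answers))  # 1 of 3 queries has the answer at rank 1
print(mrr(rankings, answers))       # (1 + 1/2 + 1/2) / 3
```

A reranker improves both numbers by promoting the gold item toward rank 1 within the retriever's candidate list.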
AI · Bullish · arXiv – CS AI · Feb 27 · 5/10 · 7
🧠DeepPresenter is a new AI framework for autonomous presentation generation that can plan, render, and revise slides through environment-grounded reflection rather than fixed templates. The system uses perceptual feedback from rendered slides to identify and correct presentation-specific issues, achieving state-of-the-art performance with a competitive 9B parameter model.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠Researchers published a case study demonstrating successful human-AI collaboration in mathematical research, extending Hermite quadrature rule results beyond manual capabilities. The study reveals AI's strengths in algebraic manipulation and proof exploration, while highlighting the critical need for human verification and domain expertise in every step of the research process.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 8
🧠Researchers propose a new conceptual model for agentic AI systems that addresses when and how AI should intervene by integrating Scene, Context, and Human Behavior Factors. The model derives five design principles to guide AI intervention timing, depth, and restraint for more contextually sensitive autonomous systems.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 5
🧠Research analyzing physician disagreement in HealthBench medical AI evaluation dataset finds that 81.8% of disagreement variance is unexplained by observable features, with rubric identity accounting for only 15.8% of variance. The study reveals physicians agree on clearly good or bad AI outputs but disagree on borderline cases, suggesting structural limits to medical AI evaluation consistency.
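The variance-decomposition arithmetic behind those percentages can be sketched directly. The component values below simply restate the figures from the summary (rubric identity 15.8%, unexplained 81.8%); the "other_features" share is the implied remainder, used here only for illustration.

```python
def variance_explained(total_var, component_vars):
    # Decompose disagreement variance into each component's share,
    # with whatever remains attributed to "unexplained".
    shares = {k: v / total_var for k, v in component_vars.items()}
    shares["unexplained"] = 1.0 - sum(shares.values())
    return shares

shares = variance_explained(1.0, {"rubric": 0.158, "other_features": 0.024})
print(shares)
```

The point of the study is that the "unexplained" bucket dominates: most physician disagreement cannot be predicted from observable features of the rubric or the case.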
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 8
🧠Researchers have developed FactGuard, an AI framework that uses multimodal large language models and reinforcement learning to detect video misinformation. The system addresses limitations of existing models by implementing iterative reasoning processes and external tool integration to verify information across video content.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠Researchers introduce AMA-Bench, a new benchmark for evaluating long-horizon memory in AI agents deployed in real-world applications. The study reveals that existing memory systems underperform due to a lack of causal and objective information, while their proposed AMA-Agent system achieves 57.22% accuracy, surpassing baselines by 11.16%.
AI · Bearish · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠Researchers developed ClinDet-Bench, a new benchmark that reveals large language models fail to properly identify when they have sufficient information to make clinical decisions. The study shows LLMs make both premature judgments and excessive abstentions in medical scenarios, highlighting safety concerns for AI deployment in healthcare settings.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5
🧠Researchers propose TAESAR, a new data-centric framework for improving recommendation models by transforming mixed-domain data into unified target-domain sequences. The approach uses contrastive decoding to address domain gaps and data sparsity issues, outperforming traditional model-centric solutions while generalizing across various sequential models.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠Researchers introduce RLHFless, a serverless computing framework for Reinforcement Learning from Human Feedback (RLHF) that addresses resource inefficiencies in training large language models. The system achieves up to 1.35x speedup and 44.8% cost reduction compared to existing solutions by dynamically adapting to resource demands and optimizing workload distribution.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠Researchers identified why AI mathematical reasoning guidance is inconsistent and developed Selective Strategy Retrieval (SSR), a framework that improves AI math performance by combining human and model strategies. The method showed significant improvements of up to 13 points on mathematical benchmarks by addressing the gap between strategy usage and executability.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 4
🧠Researchers propose using psychometric modeling to correct systematic biases in human evaluations of AI systems, demonstrating how Item Response Theory can separate true AI output quality from rater behavior inconsistencies. The approach was tested on OpenAI's summarization dataset and showed improved reliability in measuring AI model performance.
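The core Item Response Theory idea can be shown with a one-parameter (Rasch-style) model: the probability that a rater approves an output depends on the output's latent quality minus the rater's strictness. This is a generic IRT sketch with invented parameters, not the paper's specific model.

```python
import math

def approval_prob(quality, strictness):
    # Rasch-style IRT: P(approve) = sigmoid(quality - strictness).
    # A lenient rater has low (or negative) strictness; the same output
    # therefore gets different approval rates from different raters.
    return 1.0 / (1.0 + math.exp(-(quality - strictness)))

# A lenient and a strict rater judging the same two outputs:
for strictness in (-1.0, 1.5):
    for quality in (0.0, 2.0):
        p = approval_prob(quality, strictness)
        print(f"strictness={strictness:+.1f} quality={quality:.1f} -> P(approve)={p:.2f}")
```

Fitting the strictness parameters from many ratings and then reading off the quality parameters is what lets IRT separate true output quality from rater behavior.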
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 4
🧠Researchers propose an agentic AI framework using multiple LLM-based agents to optimize cell-free Open RAN networks through intent-driven automation. The system reduces active radio units by 42% in energy-saving mode while cutting memory usage by 92% through parameter-efficient fine-tuning.
AI · Neutral · arXiv – CS AI · Feb 27 · 5/10 · 2
🧠Researchers propose using cognitive models and AI algorithms as templates for designing modular language agents that combine multiple large language models. The position paper formalizes agent templates that specify roles for individual LLMs and how their functionalities should be composed to solve complex problems beyond single model capabilities.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠Researchers introduce SideQuest, a novel KV cache management system that uses Large Reasoning Models to compress memory usage during long-horizon AI tasks. The system reduces peak token usage by up to 65% while maintaining accuracy by having the model itself determine which tokens are useful to keep in memory.
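The keep-what-the-model-deems-useful idea can be sketched as a simple top-k eviction over a token cache. This is an illustration of the general technique, not SideQuest's actual algorithm: in the real system the usefulness scores come from the reasoning model itself, whereas here they are supplied by hand.

```python
def compress_cache(cache, usefulness, budget):
    # Keep the `budget` highest-scoring tokens and evict the rest,
    # preserving the tokens' original order in the cache.
    scored = sorted(range(len(cache)), key=lambda i: usefulness[i], reverse=True)
    keep = sorted(scored[:budget])
    return [cache[i] for i in keep]

tokens = ["plan:", "step1", "noise", "step2", "filler", "result"]
scores = [0.9, 0.8, 0.1, 0.7, 0.05, 0.95]
print(compress_cache(tokens, scores, budget=4))
```

Letting the model score its own cache is what distinguishes this line of work from fixed heuristics such as evicting the oldest tokens.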
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠Researchers introduce AHCE (Active Human-Augmented Challenge Engagement), a framework that enables AI agents to collaborate with human experts more effectively through learned policies. The system achieved 32% improvement on normal difficulty tasks and 70% on difficult tasks in Minecraft experiments by treating humans as interactive reasoning tools rather than simple help sources.
AI · Bullish · arXiv – CS AI · Feb 27 · 5/10 · 4
🧠Researchers conducted a comprehensive review of artificial intelligence applications in life cycle assessment (LCA) using large language models to analyze trends and patterns. The study found dramatic growth in AI adoption for environmental assessments, with a notable shift toward LLM-driven approaches and strong correlations between AI methods and LCA stages.