🧠

AI

9,311 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Feb 27 · 5/10 · 6

Invariant Transformation and Resampling based Epistemic-Uncertainty Reduction

Researchers propose a new AI inference method that uses invariant transformations and resampling to reduce epistemic uncertainty and improve model accuracy. The approach involves applying multiple transformed versions of an input to a trained AI model and aggregating the outputs for more reliable results.
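
The aggregation step described above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: `toy_model`, `TRANSFORMS`, and `aggregate_over_transforms` are hypothetical names, the transforms are placeholders for whatever transformations leave the correct answer unchanged in a given task, and plain averaging is assumed as the aggregation rule.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(x: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a trained classifier (3 inputs, 2 classes)."""
    weights = [[1.0, -1.0], [-0.5, 0.5], [0.2, 0.1]]
    logits = [sum(x[i] * weights[i][j] for i in range(3)) for j in range(2)]
    return softmax(logits)


# Placeholder transforms; in practice these would be chosen so that the
# correct label is invariant under them (an assumption of this sketch).
TRANSFORMS: List[Callable[[Vector], Vector]] = [
    lambda x: list(x),            # identity
    lambda x: list(reversed(x)),  # reversal
    lambda x: x[-1:] + x[:-1],    # cyclic shift by one position
]


def aggregate_over_transforms(
    model: Callable[[Vector], Vector],
    x: Vector,
    transforms: Sequence[Callable[[Vector], Vector]],
) -> Vector:
    """Run the model on each transformed input and average the outputs."""
    outputs = [model(t(x)) for t in transforms]
    count = len(outputs)
    return [sum(column) / count for column in zip(*outputs)]
```

Calling `aggregate_over_transforms(toy_model, [0.3, 1.2, -0.7], TRANSFORMS)` returns a single probability vector; because each individual output sums to one, so does their average.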

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

Learning Rewards, Not Labels: Adversarial Inverse Reinforcement Learning for Machinery Fault Detection

Researchers propose a new approach using Adversarial Inverse Reinforcement Learning for machinery fault detection that learns from healthy operational data without requiring manual fault labels. The framework treats fault detection as a sequential decision-making problem and demonstrates effective early fault detection on three benchmark datasets.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

ODEBrain: Continuous-Time EEG Graph for Modeling Dynamic Brain Networks

Researchers developed ODEBRAIN, a Neural ODE framework that models continuous-time EEG brain dynamics by integrating spatio-temporal-frequency features into spectral graph nodes. The system overcomes limitations of traditional discrete-time models by capturing instantaneous, nonlinear brain characteristics without cumulative prediction errors.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 3

CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays

Researchers developed CXReasonAgent, a diagnostic AI agent that combines large language models with clinical diagnostic tools to provide evidence-based chest X-ray analysis. The system addresses limitations of current vision-language models that generate plausible but ungrounded medical diagnoses, introducing a new benchmark with 1,946 diagnostic dialogues.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 5

Evaluating Stochasticity in Deep Research Agents

Researchers identified stochasticity (variability) as a critical barrier to deploying Deep Research Agents in real-world applications like financial decision-making and medical analysis. The study proposes mitigation strategies that reduce output variance by 22% while maintaining research quality, addressing a key obstacle for enterprise AI agent adoption.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering

Researchers have introduced ESAA (Event Sourcing for Autonomous Agents), a new architecture that improves LLM-based autonomous agents by separating cognitive intention from state mutation using structured JSON events and deterministic orchestration. The system addresses key limitations like context degradation and execution reliability, with successful validation through multi-agent case studies using various LLMs including Claude Sonnet and GPT-5.
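
The general idea of separating an agent's stated intention from state mutation can be sketched as below. This is an illustrative event-sourcing toy under stated assumptions, not ESAA's actual design: the event types, the field names (`type`, `path`, `line`), and the `Orchestrator` class are all hypothetical. The agent only emits JSON events; a deterministic orchestrator validates them, appends them to a log, and rebuilds state by replaying that log.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical event vocabulary for this sketch.
ALLOWED_TYPES = {"create_file", "append_line"}


@dataclass
class Orchestrator:
    """Validates agent-emitted events and derives state only by replay."""

    log: List[dict] = field(default_factory=list)

    def submit(self, raw_event: str) -> bool:
        """Accept a JSON event string; return False if it is rejected."""
        try:
            event = json.loads(raw_event)
        except json.JSONDecodeError:
            return False
        if not isinstance(event, dict):
            return False
        if event.get("type") not in ALLOWED_TYPES or "path" not in event:
            return False
        if event["type"] == "append_line" and "line" not in event:
            return False
        self.log.append(event)  # append-only: events are never edited
        return True

    def replay(self) -> Dict[str, List[str]]:
        """Rebuild the file state from scratch by applying events in order."""
        files: Dict[str, List[str]] = {}
        for event in self.log:
            if event["type"] == "create_file":
                files[event["path"]] = []
            elif event["type"] == "append_line":
                files.setdefault(event["path"], []).append(event["line"])
        return files
```

Because state is derived only from the log, replaying the same events always yields the same result, and a malformed or unknown event is rejected before it can change anything.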

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

PATRA: Pattern-Aware Alignment and Balanced Reasoning for Time Series Question Answering

Researchers have developed PATRA, a new AI model that improves time series question answering by better understanding patterns like trends and seasonality. The model addresses limitations in existing LLM approaches that treat time series data as simple text or images, introducing pattern-aware mechanisms and balanced learning across tasks of varying difficulty.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 7

ReCoN-Ipsundrum: An Inspectable Recurrent Persistence Loop Agent with Affect-Coupled Control and Mechanism-Linked Consciousness Indicator Assays

Researchers developed ReCoN-Ipsundrum, an AI agent architecture designed to exhibit consciousness-like behaviors through recurrent persistence loops and affect-coupled control mechanisms. The study demonstrates how engineered systems can display preference stability, exploratory scanning, and sustained caution behaviors that mimic aspects of conscious experience.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7

On Sample-Efficient Generalized Planning via Learned Transition Models

Researchers propose a new approach to generalized planning that learns explicit transition models rather than directly predicting action sequences. This method achieves better out-of-distribution performance with fewer training instances and smaller models compared to Transformer-based planners like PlanGPT.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

Multi-Agent Large Language Model Based Emotional Detoxification Through Personalized Intensity Control for Consumer Protection

Researchers developed MALLET, a multi-agent AI system that reduces emotional intensity in news content by up to 19.3% while preserving semantic meaning. The system uses four specialized agents to analyze, adjust, and personalize content presentation modes for calmer decision-making without restricting access to original information.

AI · Bullish · arXiv – CS AI · Feb 27 · 5/10 · 7

RepSPD: Enhancing SPD Manifold Representation in EEGs via Dynamic Graphs

Researchers have developed RepSPD, a novel geometric deep learning model that enhances EEG brain activity decoding using symmetric positive definite manifolds and dynamic graphs. The framework introduces cross-attention mechanisms on Riemannian manifolds and bidirectional alignment strategies to improve brain signal representation and analysis.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7

Modeling Expert AI Diagnostic Alignment via Immutable Inference Snapshots

Researchers developed a framework for analyzing AI diagnostic systems in clinical settings by preserving original AI inferences and comparing them with physician corrections. The study of 21 dermatological cases showed 71.4% exact agreement between AI and physicians, with 100% comprehensive concordance when using structured analysis methods.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 7

SPM-Bench: Benchmarking Large Language Models for Scanning Probe Microscopy

Researchers have developed SPM-Bench, a PhD-level benchmark for testing large language models on scanning probe microscopy tasks. The benchmark uses automated data synthesis from scientific papers and introduces new evaluation metrics to assess AI reasoning capabilities in specialized scientific domains.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 8

FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning

Researchers have developed FactGuard, an AI framework that uses multimodal large language models and reinforcement learning to detect video misinformation. The system addresses limitations of existing models by implementing iterative reasoning processes and external tool integration to verify information across video content.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 6

The AI Research Assistant: Promise, Peril, and a Proof of Concept

Researchers published a case study demonstrating successful human-AI collaboration in mathematical research, extending Hermite quadrature rule results beyond manual capabilities. The study reveals AI's strengths in algebraic manipulation and proof exploration, while highlighting the critical need for human verification and domain expertise in every step of the research process.

AI · Bullish · arXiv – CS AI · Feb 27 · 5/10 · 7

DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

DeepPresenter is a new AI framework for autonomous presentation generation that can plan, render, and revise slides through environment-grounded reflection rather than fixed templates. The system uses perceptual feedback from rendered slides to identify and correct presentation-specific issues, achieving state-of-the-art performance with a competitive 9B parameter model.

AI · Bearish · arXiv – CS AI · Feb 27 · 6/10 · 7

ClinDet-Bench: Beyond Abstention, Evaluating Judgment Determinability of LLMs in Clinical Decision-Making

Researchers developed ClinDet-Bench, a new benchmark that reveals large language models fail to properly identify when they have sufficient information to make clinical decisions. The study shows LLMs make both premature judgments and excessive abstentions in medical scenarios, highlighting safety concerns for AI deployment in healthcare settings.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7

AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

Researchers introduce AMA-Bench, a new benchmark for evaluating long-horizon memory in AI agents deployed in real-world applications. The study reveals existing memory systems underperform due to lack of causality and objective information, while their proposed AMA-Agent system achieves 57.22% accuracy, surpassing baselines by 11.16%.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 5

Decomposing Physician Disagreement in HealthBench

An analysis of physician disagreement in the HealthBench medical AI evaluation dataset finds that 81.8% of disagreement variance is unexplained by observable features, with rubric identity accounting for only 15.8%. The study shows that physicians agree on clearly good or clearly bad AI outputs but disagree on borderline cases, suggesting structural limits on the consistency of medical AI evaluation.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5

Generative Data Transformation: From Mixed to Unified Data

Researchers propose TAESAR, a new data-centric framework for improving recommendation models by transforming mixed-domain data into unified target-domain sequences. The approach uses contrastive decoding to address domain gaps and data sparsity issues, outperforming traditional model-centric solutions while generalizing across various sequential models.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

RLHFless: Serverless Computing for Efficient RLHF

Researchers introduce RLHFless, a serverless computing framework for Reinforcement Learning from Human Feedback (RLHF) that addresses resource inefficiencies in training large language models. The system achieves up to 1.35x speedup and 44.8% cost reduction compared to existing solutions by dynamically adapting to resource demands and optimizing workload distribution.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning

Researchers introduce SideQuest, a novel KV cache management system that uses Large Reasoning Models to compress memory usage during long-horizon AI tasks. The system reduces peak token usage by up to 65% while maintaining accuracy by having the model itself determine which tokens are useful to keep in memory.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 4

Agentic AI for Intent-driven Optimization in Cell-free O-RAN

Researchers propose an agentic AI framework using multiple LLM-based agents to optimize cell-free Open RAN networks through intent-driven automation. The system reduces active radio units by 42% in energy-saving mode while cutting memory usage by 92% through parameter-efficient fine-tuning.

AI · Neutral · arXiv – CS AI · Feb 27 · 5/10 · 2

Cognitive Models and AI Algorithms Provide Templates for Designing Language Agents

Researchers propose using cognitive models and AI algorithms as templates for designing modular language agents that combine multiple large language models. The position paper formalizes agent templates that specify roles for individual LLMs and how their functionalities should be composed to solve complex problems beyond single model capabilities.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7

Requesting Expert Reasoning: Augmenting LLM Agents with Learned Collaborative Intervention

Researchers introduce AHCE (Active Human-Augmented Challenge Engagement), a framework that uses learned policies to help AI agents collaborate more effectively with human experts. In Minecraft experiments, the system achieved a 32% improvement on normal-difficulty tasks and a 70% improvement on difficult tasks by treating humans as interactive reasoning tools rather than simple sources of help.
