🧠

AI

21,466 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models

Researchers introduce Mix-GRM, a new framework for Generative Reward Models that improves AI evaluation by combining breadth and depth reasoning mechanisms. The system achieves 8.2% better performance than leading open-source models by using structured Chain-of-Thought reasoning tailored to specific task types.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 12

RubricBench: Aligning Model-Generated Rubrics with Human Standards

RubricBench is a new benchmark with 1,147 pairwise comparisons designed to evaluate rubric-based assessment methods for Large Language Models. Research reveals a significant gap between human-annotated and AI-generated rubrics, showing that current state-of-the-art models struggle to autonomously create valid evaluation criteria.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7

Benchmarking LLM Summaries of Multimodal Clinical Time Series for Remote Monitoring

Researchers developed an event-based evaluation framework for LLM-generated clinical summaries of remote monitoring data, revealing that models with high semantic similarity often fail to capture clinically significant events. A vision-based approach using time-series visualizations achieved the best clinical event alignment with 45.7% abnormality recall.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers

Researchers have developed CT-Flow, an AI framework that mimics how radiologists actually work by using tools interactively to analyze 3D CT scans. The system achieved 41% better diagnostic accuracy than existing models and 95% success in autonomous tool use, potentially revolutionizing clinical radiology workflows.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Researchers propose ActMem, a novel memory framework for LLM agents that combines memory retrieval with active causal reasoning to handle complex decision-making scenarios. The framework transforms dialogue history into structured causal graphs and uses counterfactual reasoning to resolve conflicts between past states and current intentions, significantly outperforming existing baselines in memory-dependent tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6

CeProAgents: A Hierarchical Agents System for Automated Chemical Process Development

Researchers propose CeProAgents, a hierarchical multi-agent system that automates chemical process development using AI agents specialized in knowledge, concept, and parameter tasks. The system introduces CeProBench, a comprehensive benchmark for evaluating AI capabilities in chemical engineering applications.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6

S5-HES Agent: Society 5.0-driven Agentic Framework to Democratize Smart Home Environment Simulation

Researchers have developed S5-HES Agent, an AI-driven framework that democratizes smart home research by enabling natural language configuration of simulations without programming expertise. The system uses large language models and retrieval-augmented generation to make smart home environment testing accessible to broader research communities beyond traditional technical experts.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 10

Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study

DoorDash developed an AI system that uses multiple data sources to better understand ambiguous search queries by combining catalog data with web search results. The system achieved significant accuracy improvements over traditional methods and is now deployed across 95% of DoorDash's daily search traffic.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6

ProtRLSearch: A Multi-Round Multimodal Protein Search Agent with Large Language Models Trained via Reinforcement Learning

Researchers introduce ProtRLSearch, a multi-round protein search agent that uses reinforcement learning and multimodal inputs (protein sequences and text) to improve protein analysis for healthcare applications. The system addresses limitations of single-round, text-only protein search agents and includes a new benchmark called ProtMCQs with 3,000 multiple choice questions for evaluation.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning

Researchers have developed a new framework that combines Large Language Models (LLMs) with Deep Reinforcement Learning to improve data efficiency, interpretability, and cross-environment transferability. The approach uses LLMs to map natural language instructions into executable rules and create semantically annotated options for better skill reuse and constraint monitoring.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning

Researchers propose EfficientZero-Multitask (EZ-M), a multi-task model-based reinforcement learning algorithm that scales the number of tasks rather than samples per task for robotics training. The approach achieves state-of-the-art performance on HumanoidBench with significantly higher sample efficiency by leveraging shared world models across diverse tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

Multimodal Mixture-of-Experts with Retrieval Augmentation for Protein Active Site Identification

Researchers introduce MERA (Multimodal Mixture-of-Experts with Retrieval Augmentation), a new AI framework for protein active site identification that addresses challenges in drug discovery. The system achieves 90% AUPRC on active site prediction through hierarchical multi-expert retrieval and reliability-aware fusion strategies.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 8

Decoding Answers Before Chain-of-Thought: Evidence from Pre-CoT Probes and Activation Steering

New research reveals that large language models often determine their final answers before generating chain-of-thought reasoning, challenging the assumption that CoT reflects the model's actual decision process. Linear probes can predict model answers with an AUC of 0.9 before CoT generation, and steering these activations can flip answers in over 50% of cases.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

SciDER: Scientific Data-centric End-to-end Researcher

Researchers have introduced SciDER, an AI-powered system that automates the entire scientific research process from data analysis to hypothesis generation and code execution. The system uses specialized AI agents that can collaboratively process raw experimental data and outperforms existing general-purpose AI models in scientific discovery tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

MIST-RL: Mutation-based Incremental Suite Testing via Reinforcement Learning

Researchers propose MIST-RL, a reinforcement learning framework that improves AI code generation by creating more efficient test suites. The method achieves 28.5% higher fault detection while using 19.3% fewer test cases, demonstrating significant improvements in AI code verification efficiency.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9

The Observer-Situation Lattice: A Unified Formal Basis for Perspective-Aware Cognition

Researchers introduce the Observer-Situation Lattice (OSL), a unified mathematical framework for autonomous agents to reason about multiple perspectives in complex environments. The system addresses limitations in current AI approaches by providing a single coherent structure for belief management and Theory of Mind reasoning.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6

GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning

GraphScout is a new AI framework that enables smaller language models to autonomously explore knowledge graphs for reasoning tasks. The system allows a 4B parameter model to outperform much larger models by 16.7% while using fewer computational resources.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts

HarmonyCell is a new AI framework that automates single-cell perturbation modeling by addressing data inconsistencies across different biological datasets. The system uses LLM-driven semantic unification and adaptive Monte Carlo Tree Search to achieve 95% execution rates on heterogeneous datasets while matching expert-designed baselines.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

Pharmacology Knowledge Graphs: Do We Need Chemical Structure for Drug Repurposing?

Researchers developed a pharmacology knowledge graph for drug repurposing and found that removing chemical structure representations improved performance while dramatically reducing computational requirements. The study showed that drug behavior can be accurately predicted using only target protein information and network topology, with larger datasets proving more valuable than complex models.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents

Researchers propose a training-free paradigm for empowering Vision-Language Models with multi-modal search capabilities through cross-modal model merging. The approach uses Optimal Brain Merging (OBM) to combine text-based search agents with base VLMs without requiring expensive supervised training or reinforcement learning.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

Words & Weights: Streamlining Multi-Turn Interactions via Co-Adaptation

Researchers introduce ROSA2, a framework that improves Large Language Model interactions by simultaneously optimizing both prompts and model parameters during test-time adaptation. The approach outperformed baselines by 30% on mathematical tasks while reducing interaction turns by 40%.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 8

ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context

Researchers released ASTRA-bench, a new benchmark for evaluating AI agents' ability to handle complex, multi-step reasoning with personal context and tool usage. Testing revealed that current state-of-the-art models like Claude-4.5-Opus and DeepSeek-V3.2 show significant performance degradation in high-complexity scenarios.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 12

Graph-Based Self-Healing Tool Routing for Cost-Efficient LLM Agents

Researchers developed Self-Healing Router, a fault-tolerant system for LLM agents that reduces control-plane LLM calls by 93% while maintaining correctness. The system uses graph-based routing with automatic recovery mechanisms, treating agent decisions as routing problems rather than reasoning tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9

Information-Theoretic Framework for Self-Adapting Model Predictive Controllers

Researchers introduced Entanglement Learning (EL), an information-theoretic framework that enhances Model Predictive Control (MPC) for autonomous systems like UAVs. The framework uses an Information Digital Twin to monitor information flow and enable real-time adaptive optimization, improving MPC reliability beyond traditional error-based feedback systems.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

Beyond Reward: A Bounded Measure of Agent Environment Coupling

Researchers introduce 'bipredictability' as a new metric to monitor reinforcement learning agents in real-world deployments, measuring interaction effectiveness through shared information ratios. The Information Digital Twin (IDT) system detects 89.3% of perturbations versus 44% for traditional reward-based monitoring, and detects them 4.4x faster.
