🧠

AI

21,469 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning

Researchers have developed a new framework that combines Large Language Models (LLMs) with Deep Reinforcement Learning to improve data efficiency, interpretability, and cross-environment transferability. The approach uses LLMs to map natural language instructions into executable rules and create semantically annotated options for better skill reuse and constraint monitoring.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6

ProtRLSearch: A Multi-Round Multimodal Protein Search Agent with Large Language Models Trained via Reinforcement Learning

Researchers introduce ProtRLSearch, a multi-round protein search agent that uses reinforcement learning and multimodal inputs (protein sequences and text) to improve protein analysis for healthcare applications. The system addresses limitations of single-round, text-only protein search agents and includes a new benchmark called ProtMCQs with 3,000 multiple choice questions for evaluation.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts

HarmonyCell is a new AI framework that automates single-cell perturbation modeling by addressing data inconsistencies across different biological datasets. The system uses LLM-driven semantic unification and adaptive Monte Carlo Tree Search to achieve 95% execution rates on heterogeneous datasets while matching expert-designed baselines.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 9

The Lattice Representation Hypothesis of Large Language Models

Researchers propose the Lattice Representation Hypothesis, a new framework showing how large language models encode symbolic reasoning through geometric structures. The theory unifies continuous neural representations with formal logic by demonstrating that LLM embeddings naturally form concept lattices that enable symbolic operations through geometric intersections and unions.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7

How Well Does Agent Development Reflect Real-World Work?

A research study analyzing 43 AI agent benchmarks and 72,342 tasks reveals significant misalignment between current agent development efforts and real-world human work patterns across 1,016 U.S. occupations. The study finds that agent development is overly programming-centric compared to where human labor and economic value are actually concentrated in the economy.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

Semantic XPath: Structured Agentic Memory Access for Conversational AI

Researchers have developed Semantic XPath, a tree-structured memory system for conversational AI that improves performance by 176.7% over traditional methods while using only 9.1% of the tokens. The system addresses scalability issues in long-term AI conversations by efficiently accessing and updating structured memory instead of appending growing conversation history.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 10

DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent

Researchers have released DeepResearch-9K, a large-scale dataset with 9,000 questions across three difficulty levels designed to train and benchmark AI research agents. The accompanying open-source framework DeepResearch-R1 supports multi-turn web interactions and reinforcement learning approaches for developing more sophisticated AI research capabilities.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

AutoSkill: Experience-Driven Lifelong Learning via Skill Self-Evolution

AutoSkill is a new framework that enables AI language models to learn and reuse personalized skills from user interactions without retraining the underlying model. The system abstracts user preferences into reusable capabilities that can be shared across different agents and tasks, addressing the current limitation where LLMs fail to retain personalized learning between sessions.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9

Alien Science: Sampling Coherent but Cognitively Unavailable Research Directions from Idea Atoms

Researchers developed a method to generate 'alien' research directions by decomposing academic papers into 'idea atoms' and using AI models to identify coherent but non-obvious research paths. The system analyzes roughly 7,500 machine learning papers to find viable research directions that current researchers are unlikely to propose on their own.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6

MMCOMET: A Large-Scale Multimodal Commonsense Knowledge Graph for Contextual Reasoning

Researchers have released MMCOMET, the first large-scale multimodal commonsense knowledge graph that combines visual and textual information with over 900K multimodal triples. The system extends existing knowledge graphs to support complex AI reasoning tasks like image captioning and visual storytelling, demonstrating improved contextual understanding compared to text-only approaches.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7

Agents Learn Their Runtime: Interpreter Persistence as Training-Time Semantics

Researchers found that AI agents perform better when their training data matches their deployment environment, specifically regarding interpreter state persistence. Models trained with persistent state but deployed in stateless environments trigger errors in 80% of cases, while the reverse wastes 3.5x more tokens through redundant computations.
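The mismatch described above can be illustrated with a toy example. This is a minimal sketch, not the paper's setup: the helper `run_cells` and the two sample cells are hypothetical, and Python's built-in `exec` stands in for an agent's code interpreter. A second cell that relies on a variable from the first succeeds when state persists and fails when each call starts from a fresh namespace.

```python
def run_cells(cells, persistent):
    """Run code cells in order and record 'ok' or 'NameError' for each.

    Hypothetical helper for illustration: `persistent=True` keeps one
    namespace across cells, `persistent=False` resets it before each cell.
    """
    namespace = {}
    outcomes = []
    for cell in cells:
        if not persistent:
            namespace = {}  # stateless: every call starts from scratch
        try:
            exec(cell, namespace)
            outcomes.append("ok")
        except NameError:
            outcomes.append("NameError")
    return outcomes


# The second cell assumes `x` from the first cell is still defined.
cells = ["x = 21", "y = x * 2"]

persistent_outcomes = run_cells(cells, persistent=True)
stateless_outcomes = run_cells(cells, persistent=False)

print(persistent_outcomes)
print(stateless_outcomes)
```

An agent that has learned to assume persistence would emit cells like the second one and hit the error in a stateless runtime. An agent trained on stateless traces would instead redefine `x` in every cell, which is harmless but redundant when state does persist.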

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

FCN-LLM: Empower LLM for Brain Functional Connectivity Network Understanding via Graph-level Multi-task Instruction Tuning

Researchers have developed FCN-LLM, a framework that enables Large Language Models to understand brain functional connectivity networks from fMRI scans through multi-task instruction tuning. The system uses a multi-scale encoder to capture brain features and demonstrates strong zero-shot generalization across unseen datasets, outperforming conventional supervised models.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 8

DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage

Researchers have developed DIVA-GRPO, a new reinforcement learning method that improves multimodal large language model reasoning by adaptively adjusting problem difficulty distributions. The approach addresses key limitations in existing group relative policy optimization methods, showing superior performance across six reasoning benchmarks.
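For context, one common formulation of the group-relative advantage used in group relative policy optimization normalizes each sampled response's reward by the mean and standard deviation of its group. The sketch below shows only that baseline computation, not DIVA-GRPO's own modification, which the summary does not detail. The function name, the use of population standard deviation, and the `eps` constant are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Assumed baseline formulation: (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group with mixed outcomes yields non-zero advantages.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every response gets the same reward yields all zeros,
# so a problem that is too easy or too hard contributes no learning signal.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])

print(mixed)
print(uniform)
```

The second case suggests why the distribution of problem difficulty matters for this family of methods: groups with identical rewards produce zero advantage under this formulation.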

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

CollabEval: Enhancing LLM-as-a-Judge via Multi-Agent Collaboration

Researchers propose CollabEval, a new multi-agent framework for evaluating AI-generated content that uses collaborative judgment instead of single LLM evaluation. The system implements a three-phase process with multiple AI agents working together to provide more consistent and less biased evaluations than current approaches.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7

MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains

Researchers introduce MC-Search, the first benchmark for evaluating agentic multimodal retrieval-augmented generation (MM-RAG) systems with long, structured reasoning chains. The benchmark reveals systematic issues in current multimodal large language models and introduces Search-Align, a training framework that improves planning and retrieval accuracy.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

MetaMind: General and Cognitive World Models in Multi-Agent Systems by Meta-Theory of Mind

Researchers introduced MetaMind, a cognitive world model for multi-agent systems that enables agents to understand and predict other agents' behaviors without centralized supervision or communication. The system uses a meta-theory of mind framework that lets agents reason about the goals and beliefs of others through self-reflective learning and analogical reasoning.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

BioProAgent: Neuro-Symbolic Grounding for Constrained Scientific Planning

Researchers developed BioProAgent, a neuro-symbolic AI framework that combines large language models with deterministic constraints to enable reliable scientific planning in wet-lab environments. The system achieves 95.6% physical compliance compared to 21.0% for existing methods by using finite state machines to prevent costly experimental failures.
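The idea of using a finite state machine as a deterministic guard on a generated plan can be sketched as follows. This is an illustrative assumption, not BioProAgent's actual design: the states, actions, transition table `ALLOWED`, and function `validate_plan` are all hypothetical. The point is that a plan proposed by a language model is checked step by step, and any action with no transition from the current state is rejected before it reaches the equipment.

```python
# Hypothetical transition table: (current state, action) -> next state.
ALLOWED = {
    ("idle", "load_sample"): "loaded",
    ("loaded", "close_lid"): "sealed",
    ("sealed", "start_centrifuge"): "spinning",
    ("spinning", "stop_centrifuge"): "sealed",
    ("sealed", "open_lid"): "loaded",
}


def validate_plan(plan, state="idle"):
    """Return (True, final_state) if every step is a legal transition,
    otherwise (False, message) naming the first illegal step."""
    for step, action in enumerate(plan):
        next_state = ALLOWED.get((state, action))
        if next_state is None:
            return False, f"step {step}: '{action}' not allowed in state '{state}'"
        state = next_state
    return True, state


safe_plan = ["load_sample", "close_lid", "start_centrifuge"]
unsafe_plan = ["load_sample", "start_centrifuge"]  # lid never closed

print(validate_plan(safe_plan))
print(validate_plan(unsafe_plan))
```

Because the check is a plain table lookup, it behaves the same way on every run, which is the property that makes a symbolic guard useful next to a probabilistic planner.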

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 8

The Synthetic Web: Adversarially-Curated Mini-Internets for Diagnosing Epistemic Weaknesses of Language Agents

Researchers introduced the Synthetic Web Benchmark, revealing that frontier AI language models fail catastrophically when exposed to high-plausibility misinformation in search results. The study shows current AI agents struggle to handle conflicting information sources, with accuracy collapsing despite access to truthful content.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

Researchers introduce HiMAC, a hierarchical reinforcement learning framework that improves LLM agent performance on long-horizon tasks by separating macro-level planning from micro-level execution. The approach demonstrates state-of-the-art results across multiple environments, showing that structured hierarchy is more effective than simply scaling model size for complex agent tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

MemPO: Self-Memory Policy Optimization for Long-Horizon Agents

Researchers propose MemPO (Self-Memory Policy Optimization), a new algorithm that enables AI agents to autonomously manage their memory during long-horizon tasks. The method achieves significant performance improvements with 25.98% F1 score gains over base models while reducing token usage by 67.58%.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9

K^2-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control

Researchers introduce K²-Agent, a hierarchical AI framework for mobile device control that separates 'know-what' and 'know-how' knowledge to achieve a 76.1% success rate on the AndroidWorld benchmark. The system uses a high-level reasoner for task planning and a low-level executor for skill execution, showing strong generalization across different models and tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

InfoPO: Information-Driven Policy Optimization for User-Centric Agents

Researchers introduce InfoPO (Information-Driven Policy Optimization), a new method that improves AI agent interactions by using information-gain rewards to identify valuable conversation turns. The approach addresses credit assignment problems in multi-turn interactions and outperforms existing baselines across diverse tasks including intent clarification and collaborative coding.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 8

Fair in Mind, Fair in Action? A Synchronous Benchmark for Understanding and Generation in UMLLMs

Researchers introduce IRIS Benchmark, the first comprehensive evaluation framework for measuring fairness in Unified Multimodal Large Language Models (UMLLMs) across both understanding and generation tasks. The benchmark integrates 60 granular metrics across three dimensions and reveals systemic bias issues in leading AI models, including 'generation gaps' and 'personality splits'.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation

Researchers introduce MicroVerse, a specialized AI video generation model for microscale biological simulations, addressing limitations of current video generation models in scientific applications. The work includes the MicroWorldBench benchmark and the MicroSim-10K dataset, targeting biomedical applications such as drug discovery and educational visualization.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6

A Unified Framework to Quantify Cultural Intelligence of AI

Researchers have developed a unified framework to systematically measure the cultural intelligence of AI systems as generative AI technologies expand globally. The framework addresses the need for comprehensive assessment of AI's ability to operate across diverse cultural contexts, moving beyond fragmented evaluation approaches to provide a systematic methodology for measuring cultural competence.
