🧠

AI

21,049 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

21049 articles

AIBullisharXiv – CS AI · Mar 36/103

🧠

Knowledge Graph Augmented Large Language Models for Disease Prediction

Researchers developed a knowledge graph-guided chain-of-thought framework that uses large language models for disease prediction from electronic health records. The approach outperformed classical baselines and showed strong zero-shot transfer capabilities, with clinicians preferring the AI-generated explanations for their clarity and relevance.

AIBullisharXiv – CS AI · Mar 35/104

🧠

From Passive to Proactive: A Hierarchical Multi-Agent Framework for Automated Medical Pre-Consultation

Researchers developed a multi-agent AI system for medical triage that uses three specialized agents to improve patient classification accuracy. The system achieved 89.6% accuracy in primary department classification and 74.3% in secondary classification, addressing healthcare staffing shortages through automated pre-consultation.

AINeutralarXiv – CS AI · Mar 36/103

🧠

Benchmarking Overton Pluralism in LLMs

Researchers introduced OVERTONBENCH, a framework for measuring viewpoint diversity in large language models through the OVERTONSCORE metric. In a study of 8 LLMs with 1,208 participants, models scored 0.35-0.41 out of 1.0, with DeepSeek V3 performing best, showing significant room for improvement in pluralistic representation.

AIBullisharXiv – CS AI · Mar 36/103

🧠

ScholarEval: Research Idea Evaluation Grounded in Literature

Researchers introduce ScholarEval, a retrieval-augmented framework for evaluating AI-generated research ideas based on soundness and contribution metrics. The system outperformed OpenAI's o1-mini-deep-research baseline across multiple evaluation criteria in testing with 117 expert-annotated research ideas across four scientific disciplines.

AIBearisharXiv – CS AI · Mar 36/104

🧠

HardcoreLogic: Challenging Large Reasoning Models with Long-tail Logic Puzzle Games

Researchers introduced HardcoreLogic, a benchmark of over 5,000 logic puzzles across 10 games to test Large Reasoning Models (LRMs) on non-standard puzzle variants. The study reveals significant performance drops in current LRMs when faced with complex or uncommon puzzle variations, indicating heavy reliance on memorized patterns rather than genuine logical reasoning.

AINeutralarXiv – CS AI · Mar 36/104

🧠

Evaluating and Mitigating LLM-as-a-judge Bias in Communication Systems

Researchers analyzed bias in 6 large language models used as autonomous judges in communication systems, finding that while current LLM judges show robustness to biased inputs, fine-tuning on biased data significantly degrades performance. The study identified 11 types of judgment biases and proposed four mitigation strategies for fairer AI evaluation systems.

AINeutralarXiv – CS AI · Mar 36/103

🧠

FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning

Researchers introduce FaithCoT-Bench, the first comprehensive benchmark for detecting unfaithful Chain-of-Thought reasoning in large language models. The benchmark includes over 1,000 expert-annotated trajectories across four domains and evaluates eleven detection methods, revealing significant challenges in identifying unreliable AI reasoning processes.

AINeutralarXiv – CS AI · Mar 36/103

🧠

Understanding the Role of Training Data in Test-Time Scaling

Research paper analyzes test-time scaling in large language models, revealing that longer reasoning chains (CoTs) can reduce training data requirements but may harm performance if relevant skills aren't present in training data. The study provides theoretical framework showing that diverse, relevant, and challenging training tasks optimize test-time scaling performance.

AINeutralarXiv – CS AI · Mar 36/104

🧠

From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents

Researchers introduced EHR-ChatQA, a new benchmark for testing AI agents that interact with Electronic Health Record databases through natural language queries. The benchmark reveals significant reliability gaps in current state-of-the-art LLMs, with success rates dropping substantially when consistency across multiple trials is required.

AIBearisharXiv – CS AI · Mar 36/104

🧠

Who Gets Cited Most? Benchmarking Long-Context Numerical Reasoning on Scientific Articles

Researchers introduced SciTrek, a new benchmark for testing large language models' ability to perform numerical reasoning across long scientific documents. The benchmark reveals significant challenges for current LLMs, with the best model achieving only 46.5% accuracy at 128K tokens, and performance declining as context length increases.

$COMP

AIBullisharXiv – CS AI · Mar 36/103

🧠

See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles

Researchers have developed State-aware Reasoning (StaR), a new multimodal AI method that significantly improves AI agents' ability to interact with graphical user interfaces, particularly with toggle controls. The method enables agents to better perceive current states and execute instructions accordingly, improving toggle execution accuracy by over 30%.

AIBullisharXiv – CS AI · Mar 36/104

🧠

AISSISTANT: Human-AI Collaborative Review and Perspective Research Workflows in Data Science

Researchers introduce AIssistant, an open-source framework that combines human expertise with AI agents to streamline scientific review and perspective paper creation in data science. The system uses 15 specialized LLM-driven agents across two workflows and demonstrates 65.7% time savings while maintaining research quality through strategic human oversight.

AIBullisharXiv – CS AI · Mar 36/104

🧠

A Message Passing Realization of Expected Free Energy Minimization

Researchers developed a message passing approach for Expected Free Energy minimization that transforms complex combinatorial search problems into tractable inference problems. The method enables more efficient AI agent planning and exploration under uncertainty, outperforming conventional approaches in test environments.

AINeutralarXiv – CS AI · Mar 35/103

🧠

Behavioral Generative Agents for Energy Operations

Researchers developed behavioral generative agents powered by large language models to simulate consumer decision-making in energy operations. The study found these AI agents can model heterogeneous customer behavior and provide insights into rare events like blackouts, offering a scalable tool for energy policy analysis.

AIBullisharXiv – CS AI · Mar 36/103

🧠

ViTSP: A Vision Language Models Guided Framework for Solving Large-Scale Traveling Salesman Problems

Researchers have developed ViTSP, a framework that uses pre-trained vision language models to solve large-scale Traveling Salesman Problems with average optimality gaps of just 0.24%. The system outperforms existing learning-based methods and reduces gaps by 3.57% to 100% compared to the best heuristic solver LKH-3 on instances with over 10,000 nodes.

AIBullisharXiv – CS AI · Mar 36/105

🧠

Re4: Scientific Computing Agent with Rewriting, Resolution, Review and Revision

Researchers have developed Re4, a multi-agent AI framework that uses three specialized LLMs (Consultant, Reviewer, and Programmer) working collaboratively to solve scientific computing problems. The system employs a rewriting-resolution-review-revision process that significantly improves bug-free code generation and reduces non-physical solutions in mathematical and scientific reasoning tasks.

$LINK

AIBullisharXiv – CS AI · Mar 36/103

🧠

Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering

Researchers developed a meta-learning approach for Large Multimodal Models (LMMs) that uses distilled soft prompts to improve few-shot visual question answering performance. The method outperformed traditional in-context learning by 21.2% and parameter-efficient finetuning by 7.7% on VQA tasks.

AINeutralarXiv – CS AI · Mar 36/103

🧠

Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework

Researchers have developed a new preference learning framework that addresses bias in AI alignment by ensuring policies reflect true population distributions rather than just majority opinions. The approach uses social choice theory principles and has been validated on both recommendation tasks and large language model alignment.

AIBullisharXiv – CS AI · Mar 36/104

🧠

FAuNO: Semi-Asynchronous Federated Reinforcement Learning Framework for Task Offloading in Edge Systems

Researchers have developed FAuNO, a new federated reinforcement learning framework that uses asynchronous processing to optimize task distribution in edge computing networks. The system employs an actor-critic architecture where local nodes learn specific dynamics while a central critic coordinates overall system performance, demonstrating superior results in reducing latency and task loss compared to existing methods.

AIBullisharXiv – CS AI · Mar 36/104

🧠

Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation

Researchers introduce BoxMed-RL, a new AI framework that uses chain-of-thought reasoning and reinforcement learning to generate spatially verifiable radiology reports. The system mimics radiologist workflows by linking visual findings to precise anatomical locations, achieving 7% improvement over existing methods in key performance metrics.

$LINK

AIBullisharXiv – CS AI · Mar 36/104

🧠

Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation

Researchers introduce BrainNav, a bio-inspired navigation framework that mimics biological spatial cognition to enhance Vision-and-Language Navigation in mobile robots. The system addresses spatial hallucination issues when transferring from simulation to real-world environments, demonstrating superior performance in zero-shot real-world testing.

AINeutralarXiv – CS AI · Mar 36/103

🧠

The First Impression Problem: Internal Bias Triggers Overthinking in Reasoning Models

Researchers identified 'internal bias' as a key cause of overthinking in AI reasoning models, where models form preliminary guesses that conflict with systematic reasoning. The study found that excessive attention to input questions triggers redundant reasoning steps, and current mitigation methods have proven ineffective.

AIBullisharXiv – CS AI · Mar 36/104

🧠

A Contemporary Overview: Trends and Applications of Large Language Models on Mobile Devices

Large language models (LLMs) are increasingly being deployed on mobile devices, enabling applications like voice assistants, real-time translation, and intelligent recommendations. Advancements in hardware and 5G infrastructure allow for efficient local inference while improving data privacy and reducing cloud dependency.

AINeutralarXiv – CS AI · Mar 36/103

🧠

Theoretical Foundations of Superhypergraph and Plithogenic Graph Neural Networks

Researchers have developed theoretical foundations for SuperHyperGraph Neural Networks (SHGNNs) and Plithogenic Graph Neural Networks, extending traditional graph neural networks to handle complex hierarchical structures and multi-valued attributes. These advanced frameworks aim to better model uncertainty and higher-order interactions in complex networks beyond the capabilities of standard graph neural networks.

AIBullisharXiv – CS AI · Mar 36/103

🧠

Token-Importance Guided Direct Preference Optimization

Researchers propose Token-Importance Guided Direct Preference Optimization (TI-DPO), a new framework for aligning Large Language Models with human preferences. The method uses hybrid weighting mechanisms and triplet loss to achieve more accurate and robust AI alignment compared to existing Direct Preference Optimization approaches.

← PrevPage 531 of 842Next →