🧠

AI

12,874 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

12874 articles

AIBullisharXiv – CS AI · Mar 166/10

🧠

ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning

Researchers have developed ToolTree, a new Monte Carlo tree search-based planning system for LLM agents that improves tool selection and usage through dual-feedback evaluation and bidirectional pruning. The system achieves approximately 10% performance gains over existing methods while maintaining high efficiency across multiple benchmarks.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Asynchronous Verified Semantic Caching for Tiered LLM Architectures

Researchers introduce Krites, an asynchronous caching system for Large Language Models that uses LLM judges to verify cached responses, improving efficiency without changing serving decisions. The system increases the fraction of requests served with curated static answers by up to 3.9 times while maintaining unchanged critical path latency.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Information-Consistent Language Model Recommendations through Group Relative Policy Optimization

Researchers developed a new reinforcement learning framework using Group Relative Policy Optimization (GRPO) to make Large Language Models provide consistent recommendations across semantically equivalent prompts. The method addresses a critical enterprise need for reliable AI systems in business domains like finance and customer support, where inconsistent responses undermine trust and compliance.

AIBullisharXiv – CS AI · Mar 166/10

🧠

A Tutorial on Cognitive Biases in Agentic AI-Driven 6G Autonomous Networks

Researchers published a tutorial on cognitive biases in AI-driven 6G autonomous networks, focusing on how LLM-powered agents can inherit human biases that distort network management decisions. The paper introduces mitigation strategies that demonstrated 5x lower latency and 40% higher energy savings in practical use cases.

AIBullisharXiv – CS AI · Mar 166/10

🧠

When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

Researchers have developed SAFE, a new framework for ensembling Large Language Models that selectively combines models at specific token positions rather than every token. The method improves both accuracy and efficiency in long-form text generation by considering tokenization mismatches and consensus in probability distributions.

AIBullisharXiv – CS AI · Mar 166/10

🧠

UniPrompt-CL: Sustainable Continual Learning in Medical AI with Unified Prompt Pools

Researchers developed UniPrompt-CL, a new continual learning method specifically designed for medical AI that addresses the limitations of existing approaches when applied to medical data. The method uses a unified prompt pool design and regularization to achieve better performance while reducing computational costs, improving accuracy by 1-3 percentage points in domain-incremental learning settings.

AIBearisharXiv – CS AI · Mar 166/10

🧠

The GPT-4o Shock Emotional Attachment to AI Models and Its Impact on Regulatory Acceptance: A Cross-Cultural Analysis of the Immediate Transition from GPT-4o to GPT-5

A research study analyzing public reactions to OpenAI's transition from GPT-4o to GPT-5 in August 2025 found significant emotional attachment to AI models, with cultural differences between Japanese and English users. The findings suggest that strong emotional bonds with AI could complicate future regulatory efforts and policy implementation.

🧠 GPT-4🧠 GPT-5

AIBullisharXiv – CS AI · Mar 166/10

🧠

Multimodal Continual Learning with MLLMs from Multi-scenario Perspectives

Researchers developed UNIFIER, a continual learning framework for multimodal large language models (MLLMs) to adapt to changing visual scenarios without catastrophic forgetting. The framework addresses visual discrepancies across different environments like high-altitude, underwater, low-altitude, and indoor scenarios, showing significant improvements over existing methods.

🏢 Hugging Face

AINeutralarXiv – CS AI · Mar 166/10

🧠

Do LLMs have a Gender (Entropy) Bias?

Researchers discovered that large language models exhibit gender bias at the individual question level, creating different amounts of information for men versus women despite appearing unbiased at category levels. A new benchmark dataset called RealWorldQuestioning was developed, and a simple prompt-based debiasing approach was shown to improve response quality in 78% of cases.

🏢 Hugging Face🧠 ChatGPT

AINeutralarXiv – CS AI · Mar 166/10

🧠

Do LLMs Share Human-Like Biases? Causal Reasoning Under Prior Knowledge, Irrelevant Context, and Varying Compute Budgets

A research study comparing causal reasoning abilities of 20+ large language models against human baselines found that LLMs exhibit more rule-like reasoning strategies than humans, who account for unmentioned factors. While LLMs don't mirror typical human cognitive biases in causal judgment, their rigid reasoning may fail when uncertainty is intrinsic, suggesting they can complement human decision-making in specific contexts.

AINeutralarXiv – CS AI · Mar 166/10

🧠

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

SkillsBench introduces a new benchmark to evaluate Agent Skills - structured packages of procedural knowledge that enhance LLM agents. Testing across 86 tasks and 11 domains shows curated Skills improve performance by 16.2 percentage points on average, while self-generated Skills provide no benefit.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Tiny Recursive Reasoning with Mamba-2 Attention Hybrid

Researchers developed a hybrid model combining Mamba-2 state space operators with Transformer blocks for recursive reasoning, achieving a 2% improvement in pass@2 performance on ARC-AGI-1 tasks with only 6.83M parameters. The study demonstrates that Mamba-2 operators can preserve reasoning capabilities while improving solution candidate coverage in tiny neural networks.

AIBullisharXiv – CS AI · Mar 166/10

🧠

CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks

Researchers introduce CRAFT-GUI, a curriculum learning framework that uses reinforcement learning to improve AI agents' performance in graphical user interface tasks. The method addresses difficulty variation across GUI tasks and provides more nuanced feedback, achieving 5.6% improvement on Android Control benchmarks and 10.3% on internal benchmarks.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Visual-ERM: Reward Modeling for Visual Equivalence

Researchers introduce Visual-ERM, a multimodal reward model that improves vision-to-code tasks by evaluating visual equivalence in rendered outputs rather than relying on text-based rules. The system achieves significant performance gains on chart-to-code tasks (+8.4) and shows consistent improvements across table and SVG parsing applications.

AINeutralarXiv – CS AI · Mar 166/10

🧠

Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models

Researchers propose integrating causal methods into machine learning systems to balance competing objectives like fairness, privacy, robustness, accuracy, and explainability. The paper argues that addressing these principles in isolation leads to conflicts and suboptimal solutions, while causal approaches can help navigate trade-offs in both trustworthy ML and foundation models.

AINeutralarXiv – CS AI · Mar 166/10

🧠

LLM Constitutional Multi-Agent Governance

Researchers introduce Constitutional Multi-Agent Governance (CMAG), a framework that prevents AI manipulation in multi-agent systems while maintaining cooperation. The study shows that unconstrained AI optimization achieves high cooperation but erodes agent autonomy and fairness, while CMAG preserves ethical outcomes with only modest cooperation reduction.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Delta1 with LLM: symbolic and neural integration for credible and explainable reasoning

Researchers introduce Delta1, a framework that integrates automated theorem generation with large language models to create explainable AI reasoning. The system combines formal logic rigor with natural language explanations, demonstrating applications across healthcare, compliance, and regulatory domains.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Stake the Points: Structure-Faithful Instance Unlearning

Researchers propose a new "structure-faithful" framework for machine unlearning that preserves semantic relationships in AI models while removing specific data. The method uses semantic anchors to maintain knowledge structure, showing significant performance improvements of 19-33% across image classification, retrieval, and face recognition tasks.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Human-in-the-Loop LLM Grading for Handwritten Mathematics Assessments

Researchers developed a human-in-the-loop LLM system for grading handwritten mathematics assessments that reduces grading time by 23% while maintaining accuracy comparable to manual grading. The system combines automated scanning, multi-pass LLM scoring, consistency checks, and mandatory human verification to handle pen-and-paper tests at scale.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

Researchers introduce Cheers, a unified multimodal AI model that combines visual comprehension and generation by decoupling patch details from semantic representations. The model achieves 4x token compression and outperforms existing models like Tar-1.5B while using only 20% of the training cost.

AIBullisharXiv – CS AI · Mar 166/10

🧠

AdaBoN: Adaptive Best-of-N Alignment

Researchers propose AdaBoN, an adaptive Best-of-N alignment method that improves computational efficiency in language model alignment by allocating inference-time compute based on prompt difficulty. The two-stage algorithm outperforms uniform allocation strategies while using 20% less computational budget.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Seeing Eye to Eye: Enabling Cognitive Alignment Through Shared First-Person Perspective in Human-AI Collaboration

Researchers propose Eye2Eye, a new framework that uses first-person perspective to improve human-AI collaboration by addressing communication and understanding gaps. The AR prototype integrates joint attention coordination, revisable memory, and reflective feedback, showing significant improvements in task completion time and user trust in studies.

AIBullisharXiv – CS AI · Mar 166/10

🧠

MetaKE: Meta-learning Aligned Knowledge Editing via Bi-level Optimization

Researchers propose MetaKE, a new framework for knowledge editing in Large Language Models that addresses the 'Semantic-Execution Disconnect' through bi-level optimization. The method treats edit targets as learnable parameters and uses a Structural Gradient Proxy to align edits with the model's feasible manifold, showing significant improvements over existing approaches.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Developing the PsyCogMetrics AI Lab to Evaluate Large Language Models and Advance Cognitive Science -- A Three-Cycle Action Design Science Study

Researchers have developed PsyCogMetrics AI Lab, a cloud-based platform that applies psychometric and cognitive science methodologies to evaluate Large Language Models. The platform was created through a three-cycle Action Design Science study and aims to advance AI evaluation methods at the intersection of psychology, cognitive science, and artificial intelligence.

AINeutralarXiv – CS AI · Mar 166/10

🧠

Continual Learning in Large Language Models: Methods, Challenges, and Opportunities

This comprehensive survey examines continual learning methodologies for large language models, focusing on three core training stages and methods to mitigate catastrophic forgetting. The research reveals that while current approaches show promise in specific domains, fundamental challenges remain in achieving seamless knowledge integration across diverse tasks and temporal scales.

← PrevPage 202 of 515Next →