🧠

AI

21,501 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Neutral · arXiv – CS AI · Jun 19 · 7/10

A Systematic Evaluation of Black-Box Uncertainty Estimation Methods for Large Language Models

Researchers present a comprehensive evaluation framework for black-box uncertainty estimation methods in large language models, benchmarking 24 methods across 4 models and datasets. The study reveals that no single approach dominates universally, but hybrid methods combining multiple uncertainty signals and candidate-reasoning approaches consistently outperform others, addressing critical gaps in trustworthy LLM deployment.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning

Researchers introduce LUCID, a novel hallucination detection method for large language models used in knowledge graph reasoning tasks. By combining LLM attention scores, knowledge graph semantics, and structural information through graph neural networks, LUCID achieves state-of-the-art performance across nine datasets, addressing a critical reliability gap in AI-driven knowledge systems.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Multi-Agent Transactive Memory

Researchers propose Multi-Agent Transactive Memory (MATM), a framework enabling decentralized LLM agents to share and retrieve trajectories—recorded problem-solving paths—from a shared repository. Experiments in interactive environments demonstrate that agents retrieving stored trajectories improve task performance and efficiency without requiring coordination or joint training.

AI · Neutral · arXiv – CS AI · Jun 19 · 7/10

The Tao of Agency: Autotelic AI, Embedded Agency and Dissolution of the Self

Researchers explore autotelic AI systems that generate their own goals rather than pursuing designer-specified objectives, introducing a framework that examines how agents define their boundaries and selfhood. The work reveals that agent individuation is non-unique—multiple valid partitions of agent-environment dynamics exist—creating a fundamental paradox: agents must believe in their own boundaries to act while transcending those boundaries to understand. The framework extends into quantum formulations and contemplative philosophy, with practical LLM-based implementations.

AI · Bearish · arXiv – CS AI · Jun 19 · 7/10

A Controlled Benchmark of Quantum-Latent GAN Augmentation for Brain MRI

Researchers conducted a rigorous controlled benchmark comparing quantum and classical generative models for augmenting brain MRI datasets. The study found no statistically significant performance difference between quantum and classical generators, and neither provided meaningful benefits over real-data-only training across various data scarcity scenarios.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

PhysDrift: Bridging the Embodiment Gap in Humanoid Co-Speech Motion Generation

Researchers introduce PhysDrift, a framework that generates co-speech motions directly for humanoid robots rather than converting human motions. It targets a fundamental gap: human-centric pipelines fail to preserve physical executability and motion expressiveness in robotic embodiments.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Advancing DialNav through Automatic Embodied Dialog Augmentation

Researchers introduce RAINbow, a large-scale dataset of 238K episodes for DialNav, an embodied AI navigation system that requires dialog interaction. Through automatic dataset augmentation, dual-strategy training, and improved localization models, the team achieves significant performance improvements (89-100% gains), advancing the practical deployment of conversational embodied agents.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

Researchers introduce ENPIRE, a framework that enables AI coding agents to autonomously improve robot manipulation policies through real-world feedback loops without human intervention. The system achieves 99% success rates on complex dexterous tasks like pin box organization and tool use, demonstrating that AI agents can now conduct independent robotics research in physical environments.

🏢 Meta

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Reward as An Agent for Embodied World Models

Researchers propose a novel reinforcement learning framework combining 'Reward as an Agent' with dynamic-aware rollout diversification to improve embodied world models. The approach addresses reward hacking by implementing robust verification strategies while enabling broader exploration beyond conservative training distributions, demonstrating significant accuracy gains across multiple open-source world models.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Autonomous Event-Driven Multi-Agent Orchestration for Enterprise AI at Scale

Researchers evaluated multi-agent orchestration architectures across enterprise scales, finding that scalability rather than task complexity is the primary performance bottleneck. A new Task Manager framework reduces latency and improves event handling at enterprise scale, demonstrating critical improvements needed for production AI systems managing hundreds of agents.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning

Researchers demonstrate that multi-agent reinforcement learning enables autonomous quadrotor drones to achieve superhuman racing performance while improving safety by 50% compared to single-agent systems. The breakthrough shows that training agents through competitive interaction with diverse opponents produces robust real-world coordination capabilities that generalize to human pilots without additional safety constraints.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Process-Verified Reinforcement Learning for Theorem Proving via Lean

Researchers demonstrate that the Lean proof assistant can provide fine-grained, process-level feedback during reinforcement learning training for theorem proving, beyond simple binary verification signals. By parsing proof attempts into tactic sequences and leveraging Lean's elaboration system, the approach delivers dense, verified credit signals grounded in type theory, showing improvements over outcome-only baselines on benchmarks like MiniF2F and ProofNet.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Bi-Anchor Interpolation Solver for Accelerating Generative Modeling

Researchers introduce BA-solver, a lightweight acceleration method for Flow Matching generative models that achieves quality comparable to 100+ neural function evaluations using only 10 evaluations. The approach combines a frozen backbone model with a minimal SideNet (1-2% additional parameters) to approximate velocities bidirectionally, enabling faster image generation while maintaining compatibility with existing pipelines.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

PiDR: Physics-Informed Inertial Dead Reckoning for Autonomous Platforms

Researchers propose PiDR, a physics-informed neural network framework for autonomous navigation using only inertial sensors, achieving 29% positioning improvement over conventional approaches. The system addresses critical limitations of traditional deep learning by embedding physical principles directly into the model, enabling accurate dead reckoning in GPS-denied environments without requiring extensive training data.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Mitigating Anchoring Bias in LLM-Based Agents for Energy-Efficient 6G Autonomous Networks

Researchers present an LLM-based autonomous framework for 6G network resource negotiation that addresses anchoring bias—a cognitive limitation causing agents to over-provision resources. Using a Weibull distribution-based randomization strategy combined with Digital Twins and CVaR constraints, the system achieves up to 25% energy savings while maintaining SLA compliance, with a 1B-parameter model delivering sub-second inference latencies suitable for O-RAN deployment.

AI · Neutral · arXiv – CS AI · Jun 19 · 7/10

DeFrame: Debiasing Large Language Models Against Framing Effects

Researchers identify 'framing disparity' as a hidden source of bias in large language models, where semantically equivalent prompts expressed differently produce inconsistent fairness outcomes. The study proposes DeFrame, a debiasing method that improves LLM consistency across alternative framings, addressing a gap between standard fairness evaluations and real-world performance.

🏢 Meta

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

ScaleWoB: Guiding GUI Agents with Coding Agents via Large-Scale Environmental Synthesis

Researchers present ScaleWoB, a framework that synthesizes high-fidelity interactive environments for training and evaluating GUI agents across mobile, desktop, and automotive platforms. The approach addresses critical limitations of real-world testing by providing verifiable rewards, low resource costs, and accessibility via URL-based backends, with results showing state-of-the-art agents achieve only 27.92% success compared to 92.08% for humans.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

From Construction to Injection: Edit-Based Fingerprints for Large Language Models

Researchers propose a novel fingerprinting framework for large language models that combines Code-mixing Fingerprints (CF) and Multi-Candidate Editing (MCEdit) to protect against unauthorized redistribution and commercial misuse. The approach addresses key vulnerabilities in existing fingerprinting methods by balancing imperceptibility with robustness against defensive filtering and downstream model modifications.

🏢 Perplexity

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

TerraMind: Large-Scale Generative Multimodality for Earth Observation

TerraMind is an open-source multimodal foundation model for Earth observation that combines token-level and pixel-level data across nine geospatial modalities. The model introduces "Thinking-in-Modalities" for synthetic data generation and achieves state-of-the-art performance on standard EO benchmarks while making its weights and code publicly available.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Reinforcement Learning Foundation Models Should Already Be A Thing

Researchers propose developing reinforcement learning foundation models by training on synthetic Markov Decision Processes (MDPs), much as TabPFN uses synthetic data for tabular prediction. A Graph Attention Network trained entirely on synthetic MDPs performs strongly on both online and offline RL benchmarks without task-specific tuning, suggesting the approach is viable.

AI · Bearish · arXiv – CS AI · Jun 19 · 7/10

Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact

A peer-reviewed study finds that psychological profiles assigned to large language models through human-designed tests are largely measurement artifacts rather than genuine model traits. Analyzing 56 instruction-tuned LLMs, the research shows that directional response bias, not actual personality, drives 81-90% of the differences between models. This undermines the use of standard psychological instruments in LLM safety, usability, and research assessments.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Beyond Accuracy: Measuring Logical Compliance of Predictive Models

Researchers introduce the Rule Violation Score (RVS), a new evaluation metric that measures whether predictive models respect logical and domain-specific constraints independently of accuracy. Unlike traditional metrics focused on prediction performance, RVS distinguishes between hard rules (strict constraints) and soft rules (statistical regularities), enabling assessment of logical consistency in high-stakes applications like finance and healthcare.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Beyond Reasoning Gains: Mitigating General-Capability Forgetting in Large Reasoning Models

Researchers propose RECAP, a dynamic reweighting strategy that preserves general AI capabilities while improving reasoning performance in large language models trained with reinforcement learning. The method addresses a critical problem where models forget foundational skills like perception and faithfulness during post-training optimization on reasoning tasks.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

SleepMaMi: A Universal Sleep Foundation Model for Integrating Macro- and Micro-structures

Researchers introduce SleepMaMi, a foundation model designed to analyze sleep patterns by capturing both hour-long sleep architecture and fine-grained biosignal features. Trained on over 20,000 polysomnography recordings, the model outperforms existing approaches and demonstrates superior generalizability for clinical sleep analysis applications.

AI · Bearish · arXiv – CS AI · Jun 19 · 7/10

Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems

Researchers demonstrate that evaluation biases in large language models systematically spread through multi-agent systems, with a new framework showing biases propagate at rates of 15.7-35.2% between same-model agents. Deploying evaluation committees of three agents reduces contagion by 72.4%, offering a practical mitigation strategy for AI systems relying on LLM evaluators.
