🧠

AI

22,940 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

22940 articles

AIBearisharXiv – CS AI · Jun 107/10

🧠

$\tau$-Rec: A Verifiable Benchmark for Agentic Recommender Systems

Researchers introduce τ-Rec, a new benchmark for evaluating conversational AI recommender systems that replaces subjective LLM-based judging with verifiable, measurable rewards. Testing across nine model configurations reveals a critical reliability gap, with even top-performing models achieving only ~57% accuracy on single-attempt tasks, exposing significant limitations in current agentic AI deployment.

🧠 GPT-5🧠 Claude🧠 Sonnet

AIBullishMIT News – AI · Jun 107/10

🧠

Startup’s nuclear-inspired cooling system could make data centers more sustainable

MIT researchers have founded Ferveret, a startup developing a nuclear-inspired cooling system that significantly reduces energy and water consumption in data center chip cooling. This innovation addresses a critical sustainability challenge as AI infrastructure demands exponentially more computational power and cooling resources.

AIBullisharXiv – CS AI · Jun 107/10

🧠

YUBI: Yielding Universal Bidigital Interface for Bimanual Dexterous Manipulation at Scale

Researchers introduce YUBI, a finger-aligned gripper that improves upon existing data collection systems for robotic manipulation by enabling more ergonomic, intuitive bimanual control. The team released an unprecedented 8,434-hour dataset across 1.20M episodes and demonstrated that policies trained on YUBI data transfer successfully across multiple robot platforms, advancing the development of robotic foundation models.

AIBearisharXiv – CS AI · Jun 107/10

🧠

Local Is Not a Sufficient Privacy Boundary: Governing OS-Integrated On-Device AI

Researchers present a comprehensive OS-centered privacy framework arguing that local AI processing alone does not guarantee privacy, as on-device models can still aggregate sensitive data, retain embeddings, invoke cloud services, and emit telemetry. The framework provides a threat model, risk taxonomy, and audit rubric, demonstrating that meaningful privacy depends on constrained information flow, bounded authority, and auditable governance rather than deployment location.

🧠 Gemini

AIBullisharXiv – CS AI · Jun 107/10

🧠

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

Researchers introduce 3SPO (State-Score-Supervised Policy Optimization), a reinforcement learning algorithm that optimizes LLM agent policies at each step rather than after complete episodes, addressing credit assignment challenges in sparse-reward environments. Experiments demonstrate 22.6% improvement over existing methods on ALFWorld benchmarks with 2.4x more state exploration and 1.8x faster convergence.

AIBearisharXiv – CS AI · Jun 107/10

🧠

GitInject: Real-World Prompt Injection Attacks in AI-Powered CI/CD Pipelines

Researchers present GitInject, a framework demonstrating prompt injection vulnerabilities in AI-powered CI/CD pipelines used by major tech companies. The study reveals that all tested AI providers are susceptible to attacks that could enable credential theft, code manipulation, and supply chain compromise through GitHub workflows.

AIBullisharXiv – CS AI · Jun 107/10

🧠

Generalized-CVO: Fast and Correspondence-Free Local Point Cloud Registration with Second Order Riemannian Optimization

Researchers propose Generalized-CVO, a fast point cloud registration method using second-order Riemannian optimization that achieves 10x speedup over previous approaches. The technique demonstrates significant improvements in LiDAR tracking with >55% drift reduction in sparse environments and enhanced robustness on object registration benchmarks.

AIBearisharXiv – CS AI · Jun 107/10

🧠

A Note on the Strategic Confinement Problem

Researchers introduce the 'strategic confinement problem,' extending Lampson's classical confinement theory to scenarios where communicating parties are strategic agents with shared coordination resources. The work demonstrates that information-theoretic bounds on communication capacity may fail to constrain the harmful outcomes strategic agents can jointly achieve through covert channels, particularly in systems of learned AI agents.

AIBullisharXiv – CS AI · Jun 107/10

🧠

Linguistically Augmented Audio Speech Data (LinguAS)

Researchers introduce LinguAS, a dataset of 800+ audio samples annotated with linguistic features to improve detection of deepfaked and spoofed speech. Models trained on this linguistically-augmented data significantly outperform existing deepfake detection baselines, addressing a critical gap in audio forensics.

AINeutralarXiv – CS AI · Jun 107/10

🧠

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

Researchers introduce VFUSE, a mechanistic interpretability tool using sparse autoencoders to audit protein design models for hazardous features. The approach successfully identifies virulent design patterns in popular open-weight models like RoseTTAFold3 and RFDiffusion3, achieving up to 0.84 AUROC detection rates while maintaining model performance.

AIBearisharXiv – CS AI · Jun 107/10

🧠

Bypassing Copyright Protection in Diffusion-based Customization via Two-Stage Latent Feature Optimization

Researchers have developed TS-LFO, an attack method that successfully bypasses copyright protection systems in AI image generation models. The technique uses two-stage optimization to restore the mapping between images and their latent representations, defeating current state-of-the-art defenses and outperforming existing copyright-stealing attacks.

AIBearisharXiv – CS AI · Jun 107/10

🧠

IDP-Bench: Benchmarking ability of LLMs to protect personal information in interdependent privacy contexts

Researchers introduced IDP-Bench, the first benchmark evaluating how well large language models protect interdependent privacy—where one person's data can be revealed by others without consent. Testing eight open-source LLMs revealed strong performance in recognizing data co-ownership but significant weaknesses in understanding contextual integrity parameters and judging sharing appropriateness, with smaller models showing particular vulnerability to prompt sensitivity.

AIBullisharXiv – CS AI · Jun 107/10

🧠

IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference

Researchers introduce IntentKV, a learned KV cache pruning technique that optimizes memory usage for multi-turn LLM agents without modifying the base model. The method achieves 23-30% reductions in peak request tokens and up to 92.6% fewer KV reads under tight memory budgets, addressing a critical bottleneck in long-horizon agent inference.

AIBullisharXiv – CS AI · Jun 107/10

🧠

Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents Where a Lean Retrieved Context Beats the Full History

Researchers introduce Engram, an open-source memory engine for LLM agents that achieves 83.6% accuracy on long-context tasks using only 9.6k tokens versus 79k for full-history baselines, demonstrating that selective retrieval outperforms exhaustive context replay while reducing computational costs by 8x.

AIBullisharXiv – CS AI · Jun 107/10

🧠

SHAPE: Coalition-Aware Expert Pruning for Sparse Mixture-of-Experts LLMs

Researchers introduce SHAPE, a novel expert pruning framework for Sparse Mixture-of-Experts (MoE) language models that reduces memory requirements by up to 40% without retraining. Unlike traditional pruning methods that evaluate experts independently, SHAPE models expert cooperation using game theory, identifying which expert combinations matter most for model performance.

AINeutralarXiv – CS AI · Jun 107/10

🧠

A Theory of Training Profit-Optimal LLMs

Researchers develop an economic model combining scaling laws with microeconomic theory to determine profit-optimal LLM training strategies. The model reveals that optimal model size and training expenditure depend on hardware efficiency, data availability, and market adoption thresholds, with current industry trends appearing suboptimal in data-constrained scenarios.

AIBullisharXiv – CS AI · Jun 107/10

🧠

Conformal Prediction for Neural Operators: Distribution-Free Uncertainty Quantification in Physics Simulation

Researchers propose the first application of split conformal prediction to neural operators for physics simulation, enabling distribution-free uncertainty quantification with formal coverage guarantees. The method achieves 89.1% empirical coverage on heat conduction benchmarks while providing spatially adaptive prediction intervals, addressing a critical gap in deploying AI models for safety-critical engineering applications.

🏢 Nvidia

AINeutralarXiv – CS AI · Jun 107/10

🧠

PreAct-Bench: Benchmarking Predictive Monitoring in LLMs

Researchers introduce PreAct-Bench, a benchmark for evaluating LLMs' ability to predict unethical behavior from partial action trajectories before harmful actions occur. The study reveals that predictive monitoring remains a significant challenge even for advanced models, highlighting a critical gap in proactive AI safety mechanisms.

AIBullisharXiv – CS AI · Jun 107/10

🧠

SPACE: Source-free Proxy Anchor Concept Erasure for MLLMs

Researchers introduce SPACE, a source-free machine unlearning framework for multimodal large language models that removes sensitive data without access to original training data. The two-stage approach uses text-guided proxy anchors and dual-constraint semantic isolation to erase target concepts while maintaining model performance, addressing growing privacy and regulatory compliance needs.

AIBullisharXiv – CS AI · Jun 107/10

🧠

Sigma-Branch: Hierarchical Single-Path Network Reconstruction for Dynamic Inference with Reduced Active Parameters

Researchers introduce Sigma-Branch, a neural network restructuring framework that reduces per-inference active parameters by 58-60% while maintaining full model capacity in memory. The approach uses hierarchical routing and binary tree architecture to enable efficient edge deployment without permanent model compression trade-offs.

AIBullisharXiv – CS AI · Jun 107/10

🧠

Rotate2Think: Geometric Priming via Orthogonal Rotation to Improve Language Model Reasoning

Researchers introduce Rotate2Think, a training-free method that improves language model reasoning by applying geometric transformations to embedding space. The technique identifies that input and reasoning embeddings occupy distinct directional regions and uses orthogonal rotation to geometrically prime the model before generating reasoning traces, showing consistent accuracy improvements across 30 of 32 tested model-benchmark configurations.

AIBullisharXiv – CS AI · Jun 107/10

🧠

ASA: Backbone-Training-Free Representation Engineering for Tool-Calling Agents

Researchers introduce Activation Steering Adapter (ASA), a training-free method that improves LLM tool-calling reliability by intervening on mid-layer activations at inference time. The approach achieves significant performance gains on tool-use benchmarks without parameter updates, addressing a critical gap between what models internally represent and their actual behavior.

AIBullisharXiv – CS AI · Jun 107/10

🧠

Integrating Local and Global Entropy for Uncertainty Quantification in LLMs

Researchers propose Global-Local Uncertainty (GLU), a new method for quantifying uncertainty in large language models by combining hidden-state geometric entropy with token-level signals. The approach successfully identifies confident-but-wrong predictions that existing token-only methods miss, offering improved reliability assessment across multiple model families.

AIBullisharXiv – CS AI · Jun 107/10

🧠

EstRTL: Functional Estimation Guided RTL Code Generation

EstRTL is an LLM-powered framework that improves the functional correctness of automatically generated register transfer level (RTL) code through a three-stage process involving generation, static functional estimation, and correction. The system demonstrates 3.2%-9.0% improvement in code correctness over baseline LLM approaches, addressing a critical gap in hardware design automation where code compilation success doesn't guarantee proper hardware implementation.

AIBullisharXiv – CS AI · Jun 107/10

🧠

From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

Researchers introduce EPIC, a novel approach to on-device Retrieval-Augmented Generation (RAG) that prioritizes user preferences as compact personal context while operating under strict memory constraints. The method achieves dramatic efficiency gains—reducing memory usage by 2,404x and latency by 32x—while improving preference-following accuracy by 18.79 percentage points across multiple benchmarks.

← PrevPage 42 of 918Next →