🧠

AI

22,940 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.


AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Internal Data Repetition Destroys Language Models

Researchers demonstrate that data repetition in language model training systematically degrades performance, with peak damage occurring at moderate repetition levels rather than following linear degradation. Using modern scaling laws, they quantify that repeated data consuming just 10% of training compute can waste up to 67% of computational resources, revealing a critical inefficiency in how AI models are currently trained.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Perfect Detection, Failed Control: The Geometry of Knowing vs. Steering in Language Models

Researchers discovered that language models can detect undesirable behaviors like hallucination with near-perfect accuracy, yet the neural directions enabling detection are nearly orthogonal (83 degrees apart) from those controlling the behavior. This fundamental geometric dissociation between knowing and steering persists across multiple models and scales, challenging a core assumption of mechanistic interpretability that detection should enable control.
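To make the 83-degree figure concrete, here is a minimal sketch of how the angle between two direction vectors translates into overlap. The two toy vectors are illustrative assumptions, not directions extracted from any model; only the 83-degree angle comes from the summary above.

```python
import math

def angle_degrees(u, v):
    """Angle between two vectors, in degrees, via the cosine formula."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_sim = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_sim))

# Hypothetical 2-D stand-ins for a "detection" and a "steering" direction,
# constructed so that they sit 83 degrees apart.
detect = [1.0, 0.0]
steer = [math.cos(math.radians(83)), math.sin(math.radians(83))]

angle = angle_degrees(detect, steer)
overlap = math.cos(math.radians(angle))  # cosine similarity of the two directions

print(f"angle = {angle:.1f} degrees, cosine similarity = {overlap:.3f}")
```

A cosine similarity near 0.12 means that a unit step along one direction moves only about 12% as far along the other, which is one way to read "nearly orthogonal".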

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs

Researchers introduce SPARC, a modular framework that decouples visual perception from reasoning in vision-language models to improve test-time scaling efficiency. By separating tasks into explicit visual search and conditional reasoning stages, SPARC achieves significant performance gains on visual reasoning benchmarks while reducing computational token requirements by up to 200×.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

CauScale: Neural Causal Discovery at Scale

CauScale is a neural architecture that advances causal discovery, a critical capability for scientific AI and data analysis, by enabling efficient processing of graphs with up to 1,000 nodes. The system achieves 99.6% accuracy on standard benchmarks while delivering inference 4× to 13,000× faster than existing methods, addressing long-standing computational bottlenecks that previously limited causal discovery to smaller datasets.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Streaming-dLLM: Accelerating Diffusion LLMs via Suffix Pruning and Dynamic Decoding

Researchers introduce Streaming-dLLM, a training-free optimization framework that accelerates Diffusion Language Models by up to 68.2× through spatial suffix pruning and dynamic temporal decoding. The approach maintains generation quality while addressing inherent inefficiencies in block-wise diffusion, making parallel decoding models more computationally practical.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Rational Neural Networks have Expressivity Advantages

Researchers demonstrate that neural networks using trainable rational activation functions achieve exponentially better parameter efficiency and expressivity compared to standard activations like ReLU, Sigmoid, and Tanh. The findings show rational activations require only polylogarithmic overhead to approximate fixed-activation networks, while the reverse requires logarithmic parameters—a theoretical advantage that translates to practical performance gains.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

A Marketplace for AI-Generated Adult Content and Deepfakes

A longitudinal study of Civitai's monetized bounty marketplace reveals that the majority of AI-generated content commissions involve explicit material, with deepfakes of real individuals—disproportionately targeting female celebrities—comprising a significant portion despite platform policies. The findings expose governance and enforcement failures in community-driven generative AI platforms that monetize content creation.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

The Hitchhiker's Guide to Agentic AI: From Foundations to Systems

A comprehensive practitioner's reference guide on agentic AI systems has been announced, covering the complete stack from LLM foundations through production deployment. The work systematizes knowledge across transformer architecture, alignment techniques, retrieval systems, multi-agent coordination, and deployment frameworks—establishing agentic AI as a mature field requiring integrated understanding across all technical layers.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning

Researchers introduce ACT-JEPA, a machine learning architecture that combines imitation learning with self-supervised learning to improve policy representation in AI decision-making systems. The model achieves up to 40% improvement in world model understanding and 10% higher task success rates by jointly predicting action sequences and latent observation sequences, operating in latent space rather than on raw inputs.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

OmegAMP: Targeted AMP Discovery via Biologically Informed Generation

OmegAMP is a deep learning framework that uses diffusion-based generation with biologically informed encoding to design antimicrobial peptides (AMPs) with unprecedented controllability and precision. In wet lab validation, 24 of 25 candidate peptides (96%) demonstrated antimicrobial activity, including against multi-drug resistant strains, potentially accelerating drug discovery for antibiotic-resistant infections.

AI · Neutral · arXiv – CS AI · Jun 25 · 7/10

PVF: Understanding AI Vulnerability Against SDCs

Researchers have developed Parameter Vulnerability Factor (PVF), a quantitative metric to measure how susceptible AI model parameters are to silent data corruptions (SDCs) caused by hardware faults. The framework addresses critical reliability concerns in AI deployment by standardizing vulnerability assessment across different model architectures and has been adopted by Meta in designing their MTIA AI chip.

AI · Neutral · arXiv – CS AI · Jun 25 · 7/10

Position: Reasoning After Perception Means Reasoning Without Vision

Researchers challenge the assumption that language reasoning can compensate for vision-language model weaknesses, arguing that deferring visual reasoning to text collapses spatial information and degrades perception to passive encoding. The study introduces the Turing Eye Test to demonstrate that tasks requiring visual reasoning in pixel space cannot be solved through text-only reasoning, suggesting AI architectures must shift toward reasoning within perception rather than about it.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Skill-MAS: Evolving Meta-Skill for Automatic Multi-Agent Systems

Skill-MAS introduces a novel framework that enhances multi-agent AI systems by evolving meta-skills through a closed optimization loop, achieving significant performance gains while maintaining cost efficiency across diverse LLMs and tasks.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

LLM Performance on a Real, Double-Marked GCSE Benchmark

Researchers tested large language models against human examiners on 32,534 real UK GCSE exam responses, finding that top-performing models achieve higher agreement with examiner consensus than examiners do with each other. The results demonstrate LLMs can reliably grade subjective tasks like essays and handle complex handwritten work, suggesting viable automated marking solutions.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

TriViewBench: Controlled Complexity Scaling for Multi-View Structural Reasoning in MLLMs

Researchers introduce TriViewBench, a controlled benchmark for evaluating multimodal AI models' ability to reason across multiple 3D views of varying complexity. Testing 18 MLLMs reveals a universal capability hierarchy and severe performance degradation on complex tasks, particularly in cross-view spatial reasoning, suggesting fundamental limitations in current AI architectures.

AI · Neutral · arXiv – CS AI · Jun 25 · 7/10

Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

Researchers discovered that language models forget learned rules midway through training despite continued evidence in data—a phenomenon called 'natural ungrokking.' The survival of rules depends predictably on how often they appear in training data, and attempts to restore forgotten rules through data manipulation fail despite successfully destroying them, revealing asymmetric control over model knowledge.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

To Isolate or to Score? Model-Adaptive Assessment for Cost-Efficient Multi-Agent RAG

Researchers demonstrate that multi-agent document assessment for retrieval-augmented generation (RAG) systems can be significantly optimized through model-adaptive routing rather than expensive scoring mechanisms. The study reveals that weaker models benefit primarily from document isolation rather than quality assessment, while MADARA, a proposed adaptive architecture, generalizes zero-shot across different model families and reduces computational overhead.

AI · Neutral · arXiv – CS AI · Jun 25 · 7/10

Heuresis: Search Strategies for Autonomous AI Research Agents Across Quality, Diversity and Novelty

Researchers introduce Heuresis, a framework for autonomous AI research agents that tests six search strategies across quality, diversity, and novelty dimensions. The study reveals that truly novel AI research ideas are exceptionally rare, with no ideas rated as "Original" and novel approaches consistently underperforming established methods—suggesting a fundamental gap between algorithmic exploration and meaningful scientific breakthroughs.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents

Researchers demonstrate that reinforcement learning post-training for large language models can generate effective step-level reward signals without dedicated reward model training. The 'progress advantage' metric—derived from log-probability ratios between trained and reference policies—eliminates annotation overhead while matching or exceeding performance of purpose-built reward models across multiple applications.
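The idea of a reward signal built from log-probability ratios can be sketched roughly as follows. This is an illustrative assumption about the general shape of such a score, not the paper's exact definition: the function name `progress_advantage`, the per-step summation over tokens, and the toy log-probabilities are all hypothetical.

```python
def progress_advantage(logp_trained, logp_reference):
    """Hypothetical step-level score: the sum over a step's tokens of
    log pi_trained(token) - log pi_reference(token)."""
    if len(logp_trained) != len(logp_reference):
        raise ValueError("both policies must score the same tokens")
    return sum(t - r for t, r in zip(logp_trained, logp_reference))

# Toy per-token log-probabilities for two agent steps (made-up numbers).
steps = [
    # The trained policy prefers this step more than the reference does.
    ([-0.2, -0.1], [-0.9, -0.4]),
    # The trained policy prefers this step less than the reference does.
    ([-1.5], [-0.5]),
]

step_scores = [progress_advantage(trained, ref) for trained, ref in steps]
print(step_scores)
```

A positive score marks a step that post-training made more likely relative to the reference policy, and a negative score marks one it made less likely. The actual method may normalize, scale, or aggregate these ratios differently.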

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Weave of Formal Thought

Researchers introduce Weave of Formal Thought (WoFT), a framework that combines rigorous syntactic validation with learned structural representations to improve code generation in large language models. The approach uses constrained decoding with full Tree-sitter compliance and fine-tuning methods that teach models to embed grammar symbols during generation, achieving 14.3% relative cross-entropy reduction on Python code.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Privacy Vulnerabilities of Attention Layers in Tabular Foundation Models and Protection of High-Risk Queries

Researchers demonstrate that transformer-based tabular foundation models leak sensitive information through their attention mechanisms, enabling effective membership inference attacks despite being pre-trained on synthetic data. The study proposes both an attack method (AMIA) and a defense strategy inspired by k-anonymity that reduces privacy leakage by 50% while maintaining model performance.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Long-Term Simulation Exposes Cognitive-Developmental Risks in AI Companions

Researchers propose TSJ, a longitudinal evaluation framework that tests AI companions for developmental risks in children and adolescents through simulated long-term interactions. The study reveals that standard short-session safety tests significantly underestimate risks, with stable risk detection requiring at least 140 interaction turns across multiple developmental stages and vulnerability profiles.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

AutoRelAnnotator: Calibrated Model Cascades for Cost-Efficient Relevance Evaluation in Sponsored Search

Researchers introduced AutoRelAnnotator, a calibrated model cascade system that generates high-quality relevance annotations for search ranking systems at significantly lower cost than human labeling. The approach combines domain-specific fine-tuning, progressive model cascading, and isotonic calibration to achieve production-grade accuracy while reducing compute costs by approximately 50%, with validation across 150M+ annotations in real-world search and advertising systems.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Enhancing Brain MRI Anomaly Detection and Reasoning with ROI Rethink and Synthetic Data

Researchers introduce BrReMark, a framework that enhances brain MRI diagnosis by requiring AI models to explicitly mark and verify abnormal regions before reaching conclusions. The approach dramatically improves diagnostic accuracy and reduces false positives by 45.7% on out-of-distribution data, addressing critical trust and hallucination issues in medical AI systems.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Quantization Inflates Reasoning: Token Inflation as a Hidden Cost of Low-Bit Reasoning Models

Researchers demonstrate that low-bit quantization of reasoning models introduces a hidden cost: quantized models generate significantly longer chains of thought to maintain accuracy, offsetting per-token speedup gains. The study introduces metrics to measure this token inflation and finds quantization-aware training to be the most effective mitigation strategy.
