🧠

AI

22,879 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Streaming-dLLM: Accelerating Diffusion LLMs via Suffix Pruning and Dynamic Decoding

Researchers introduce Streaming-dLLM, a training-free optimization framework that accelerates Diffusion Language Models by up to 68.2X through spatial suffix pruning and dynamic temporal decoding strategies. The approach maintains generation quality while addressing inherent inefficiencies in block-wise diffusion processes, representing a significant advance in making parallel decoding models more computationally practical.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

CauScale: Neural Causal Discovery at Scale

CauScale is a neural architecture that advances causal discovery, a critical capability for scientific AI and data analysis, by enabling efficient processing of graphs with up to 1,000 nodes. The system achieves 99.6% accuracy on standard benchmarks while delivering inference 4x to 13,000x faster than existing methods, easing the computational bottlenecks that previously limited causal discovery to smaller datasets.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

A Marketplace for AI-Generated Adult Content and Deepfakes

A longitudinal study of Civitai's monetized bounty marketplace reveals that the majority of AI-generated content commissions involve explicit material, with deepfakes of real individuals—disproportionately targeting female celebrities—comprising a significant portion despite platform policies. The findings expose governance and enforcement failures in community-driven generative AI platforms that monetize content creation.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

MacroLens: A Multi-Task Benchmark for Contextual Financial Reasoning under Macroeconomic Scenarios

MacroLens is a new financial reasoning benchmark that combines price history, accounting fundamentals, macroeconomic data, and news text to evaluate AI models on seven financial tasks across 4,416 U.S. small- and micro-cap stocks. The dataset addresses critical evaluation challenges unique to finance and tests 19 methods ranging from heuristics to frontier LLMs, providing a standardized tool for developing contextual financial AI systems.

🏢 Hugging Face

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Rational Neural Networks have Expressivity Advantages

Researchers demonstrate that neural networks with trainable rational activation functions achieve exponentially better parameter efficiency and expressivity than networks with standard activations such as ReLU, Sigmoid, and Tanh. Rational networks need only polylogarithmic overhead to approximate fixed-activation networks, whereas approximating rational networks with fixed activations requires a logarithmic number of parameters; the authors report that this theoretical advantage translates into practical performance gains.
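
A rational activation replaces a fixed nonlinearity with a ratio of two polynomials whose coefficients are learned along with the network weights. The sketch below is a minimal illustration, not the paper's parameterization: the function name, the coefficient layout, and the 1 + |Q(x)| denominator (one common way to keep the function free of poles) are all assumptions.

```python
import numpy as np

def rational_activation(x, p_coeffs, q_coeffs):
    """Evaluate P(x) / (1 + |Q(x)|) elementwise (illustrative form, not the paper's).

    p_coeffs: numerator coefficients for x^0, x^1, ... (trainable in practice).
    q_coeffs: denominator coefficients for x^1, x^2, ... (trainable in practice).
    Writing the denominator as 1 + |Q(x)| keeps it >= 1, so there are no poles.
    """
    x = np.asarray(x, dtype=float)
    numerator = sum(c * x**k for k, c in enumerate(p_coeffs))
    q = sum(c * x**(k + 1) for k, c in enumerate(q_coeffs))
    return numerator / (1.0 + np.abs(q))

# With p = [0, 1] and no denominator terms the activation reduces to the identity.
print(rational_activation([-1.0, 0.0, 2.0], [0, 1], []))
```

Because simple maps such as the identity fall out as special cases, a single trainable family can adapt its shape during training, which is the intuition behind the expressivity argument; in a real network the coefficients would be parameters updated by gradient descent.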

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Perfect Detection, Failed Control: The Geometry of Knowing vs. Steering in Language Models

Researchers discovered that language models can detect undesirable behaviors like hallucination with near-perfect accuracy, yet the neural directions enabling detection are nearly orthogonal (83 degrees apart) from those controlling the behavior. This fundamental geometric dissociation between knowing and steering persists across multiple models and scales, challenging a core assumption of mechanistic interpretability that detection should enable control.
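
The 83-degree figure is the angle between two vectors in activation space; for reference, cos 83° ≈ 0.12, so the two directions share very little. As a minimal sketch, assuming the detection direction (for example, a linear probe's weight vector) and the steering direction are both available as arrays, the angle follows from their cosine similarity. The function name and the vectors in the usage line are illustrative.

```python
import numpy as np

def angle_degrees(u, v):
    """Angle between two direction vectors, in degrees."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against floating-point overshoot just outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Hypothetical 2-D directions: orthogonal vectors are 90 degrees apart.
print(round(angle_degrees([1.0, 0.0], [0.0, 1.0]), 6))  # 90.0
```

In a real model the vectors would have the dimensionality of the residual stream, but the computation is the same.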

AI · Neutral · arXiv – CS AI · Jun 25 · 7/10

Learning Non-Vacuous Generalization Bounds from Optimization

Researchers have developed a non-vacuous generalization bound for deep neural networks by analyzing stochastic gradient descent through the lens of fractional Brownian motion, demonstrating theoretical guarantees on networks like ResNet and Vision Transformer trained on ImageNet-1K. This addresses a long-standing gap between theoretical bounds and practical neural network performance.

AI · Neutral · arXiv – CS AI · Jun 25 · 7/10

PVF: Understanding AI Vulnerability Against SDCs

Researchers have developed the Parameter Vulnerability Factor (PVF), a quantitative metric of how susceptible AI model parameters are to silent data corruptions (SDCs) caused by hardware faults. The framework addresses critical reliability concerns in AI deployment by standardizing vulnerability assessment across model architectures, and Meta has adopted it in designing its MTIA AI chip.
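
The summary does not give PVF's formula. Assuming it is estimated like other vulnerability factors, as the fraction of injected parameter faults that change the model's output, a Monte Carlo estimate for a toy linear classifier could look like the sketch below. The single-bit-flip fault model, the linear model, and the helper names are all hypothetical.

```python
import numpy as np

def flip_bit(weights, index, bit):
    """Return a float32 copy of `weights` with one bit flipped in one element."""
    corrupted = weights.astype(np.float32).copy()
    as_bits = corrupted.reshape(-1).view(np.uint32)  # reinterpret the same memory as integers
    as_bits[index] ^= np.uint32(1 << bit)
    return corrupted

def estimate_pvf(weights, inputs, n_trials=1000, seed=0):
    """Fraction of random single-bit flips that change at least one predicted class."""
    rng = np.random.default_rng(seed)
    w = weights.astype(np.float32)
    baseline = np.argmax(inputs @ w, axis=1)
    mismatches = 0
    for _ in range(n_trials):
        index = rng.integers(w.size)
        bit = int(rng.integers(32))
        corrupted = flip_bit(w, index, bit)
        with np.errstate(all="ignore"):  # exponent flips can produce inf/NaN
            preds = np.argmax(inputs @ corrupted, axis=1)
        if not np.array_equal(preds, baseline):
            mismatches += 1
    return mismatches / n_trials

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 3))   # hypothetical 8-feature, 3-class linear model
X = rng.normal(size=(64, 8))  # hypothetical evaluation inputs
print(estimate_pvf(W, X))     # fraction of injected faults that changed a prediction
```

Running the same estimate per layer or per parameter group is one way such a metric could rank which parts of a model most need hardware protection.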

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning

Researchers introduce ACT-JEPA, a machine learning architecture that combines imitation learning with self-supervised learning to improve policy representation in AI decision-making systems. The model achieves up to 40% improvement in world model understanding and 10% higher task success rates by jointly predicting action and latent observation sequences in latent space rather than raw input.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

LLM Performance on a Real, Double-Marked GCSE Benchmark

Researchers tested large language models against human examiners on 32,534 real UK GCSE exam responses, finding that top-performing models achieve higher agreement with examiner consensus than examiners do with each other. The results demonstrate LLMs can reliably grade subjective tasks like essays and handle complex handwritten work, suggesting viable automated marking solutions.
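
The summary does not name the agreement statistic. For ordinal marks, quadratic weighted kappa is a common choice, so the sketch below assumes it. Applying the same function to an LLM's marks versus the examiner consensus, and to examiner A versus examiner B, would mirror the comparison described above. The function name and the sample marks are illustrative.

```python
from typing import Sequence

def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], min_mark: int, max_mark: int) -> float:
    """Chance-corrected agreement between two sets of integer marks on one scale.

    1.0 means perfect agreement, 0.0 chance level, negative values systematic disagreement.
    Disagreements are penalised by the squared distance between the two marks.
    """
    if len(a) != len(b) or not a or max_mark <= min_mark:
        raise ValueError("need equal-length, non-empty mark lists and max_mark > min_mark")
    k = max_mark - min_mark + 1
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - min_mark][y - min_mark] += 1
    n = len(a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * observed[i][j]
            den += w * hist_a[i] * hist_b[j] / n  # expected count if the markers were independent
    if den == 0.0:
        raise ValueError("kappa undefined: both markers gave every response the same mark")
    return 1.0 - num / den

examiner_consensus = [3, 5, 2, 4, 4]  # hypothetical marks on a 0-6 scale
model_marks = [3, 4, 2, 4, 5]
print(quadratic_weighted_kappa(model_marks, examiner_consensus, 0, 6))
```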

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents

Researchers demonstrate that reinforcement learning post-training for large language models can generate effective step-level reward signals without dedicated reward model training. The 'progress advantage' metric—derived from log-probability ratios between trained and reference policies—eliminates annotation overhead while matching or exceeding performance of purpose-built reward models across multiple applications.
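
The summary describes the signal as a log-probability ratio between the post-trained policy and its reference policy. A minimal sketch, assuming per-token log-probabilities from both models are already available and that a step's score is the β-scaled sum of per-token log-ratios over that step's tokens; the step segmentation, β, and the function name are illustrative assumptions, not the paper's exact definition.

```python
from typing import List, Sequence

def step_progress_scores(
    policy_logprobs: Sequence[float],
    reference_logprobs: Sequence[float],
    step_boundaries: Sequence[int],
    beta: float = 1.0,
) -> List[float]:
    """Score each step as beta * sum(log pi_trained - log pi_ref) over its tokens.

    step_boundaries lists the exclusive end index of each step, e.g. [2, 4]
    means tokens 0-1 form step 1 and tokens 2-3 form step 2.
    """
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("log-probability sequences must have equal length")
    scores = []
    start = 0
    for end in step_boundaries:
        ratio = sum(p - r for p, r in zip(policy_logprobs[start:end], reference_logprobs[start:end]))
        scores.append(beta * ratio)
        start = end
    return scores

# Hypothetical log-probs: the trained policy favours step 1 and disfavours step 2.
print(step_progress_scores([-0.25, -0.5, -0.75, -1.0], [-0.5] * 4, [2, 4]))  # [0.25, -0.75]
```

No reward model is trained here: both inputs come from forward passes of models that already exist after post-training, which is the "free lunch" the title refers to.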

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

OmegAMP: Targeted AMP Discovery via Biologically Informed Generation

OmegAMP is a deep learning framework that uses diffusion-based generation with biologically informed encoding to design antimicrobial peptides (AMPs) with unprecedented controllability and precision. In wet lab validation, 24 of 25 candidate peptides (96%) demonstrated antimicrobial activity, including against multi-drug resistant strains, potentially accelerating drug discovery for antibiotic-resistant infections.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

The 4/δ Bound: Designing Predictable LLM-Verifier Systems for Formal Method Guarantee

Researchers have developed the first formal convergence theorem for LLM-Verifier systems, proving that multi-stage software verification pipelines are guaranteed to terminate. The 4/δ bound provides a precise latency prediction model, validated across more than 90,000 empirical trials, that replaces heuristic approaches with mathematically rigorous resource planning for safety-critical applications.

AI · Neutral · arXiv – CS AI · Jun 25 · 7/10

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

Researchers introduce Xcientist, a research harness that makes AI scientific reasoning transparent and auditable by externalizing research synthesis into inspectable artifacts. The system addresses 'claim drift'—where AI-generated mechanisms lose evidential grounding—and demonstrates traceable workflows across three scientific domains, suggesting AI scientists should be evaluated on accountability and reproducibility, not just output.

AI · Neutral · arXiv – CS AI · Jun 25 · 7/10

Position: Reasoning After Perception Means Reasoning Without Vision

Researchers challenge the assumption that language reasoning can compensate for vision-language model weaknesses, arguing that deferring visual reasoning to text collapses spatial information and degrades perception to passive encoding. The study introduces the Turing Eye Test to demonstrate tasks requiring visual reasoning in pixel space cannot be solved through text-only reasoning alone, suggesting AI architectures must shift toward reasoning within perception rather than about it.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

TriViewBench: Controlled Complexity Scaling for Multi-View Structural Reasoning in MLLMs

Researchers introduce TriViewBench, a controlled benchmark for evaluating multimodal AI models' ability to reason across multiple 3D views with varying complexity. Testing 18 MLLMs reveals a universal capability hierarchy and severe performance degradation on complex tasks, particularly in cross-view spatial reasoning, suggesting fundamental limitations in current AI architectures.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Internal Data Repetition Destroys Language Models

Researchers demonstrate that data repetition in language model training systematically degrades performance, with damage peaking at moderate repetition levels instead of growing linearly with the amount of repetition. Using modern scaling laws, they quantify that repeated data consuming just 10% of training compute can waste up to 67% of computational resources, revealing a critical inefficiency in how AI models are currently trained.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Erased, but Not Gone: Output Forgetting Is Not True Forgetting

Researchers demonstrate that machine unlearning methods that appear successful at the output layer, the standard evaluation target, still retain structured residual information in representation space when compared with models genuinely retrained without the forgotten data. This finding reveals a critical gap between apparent and genuine forgetting, suggesting current unlearning evaluations systematically overestimate effectiveness.

AI · Neutral · arXiv – CS AI · Jun 25 · 7/10

Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

Researchers discovered that language models forget learned rules midway through training even though the training data continues to provide evidence for them, a phenomenon they call 'natural ungrokking.' Whether a rule survives depends predictably on how often it appears in the training data, and while data manipulation can successfully destroy rules, it fails to restore forgotten ones, revealing asymmetric control over model knowledge.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Do Thinking Tokens Help with Safety?

Researchers found that, contrary to widespread belief, thinking tokens in advanced reasoning models do not improve safety. The model's decision to refuse or comply is already determined in the first token's representation, before any visible thinking occurs, suggesting safety behavior is largely predetermined rather than genuinely deliberative.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Yuvion VL: A Multimodal Foundation Model for Adversarial Content and AI Safety

Researchers introduce Yuvion VL, a multimodal AI foundation model specifically engineered to detect and understand adversarial content and safety risks across images and text. The model achieves industry-leading safety performance while maintaining general capabilities, addressing a critical gap in AI systems' ability to handle real-world multimodal threats.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

AutoRelAnnotator: Calibrated Model Cascades for Cost-Efficient Relevance Evaluation in Sponsored Search

Researchers introduced AutoRelAnnotator, a calibrated model cascade system that generates high-quality relevance annotations for search ranking systems at significantly lower cost than human labeling. The approach combines domain-specific fine-tuning, progressive model cascading, and isotonic calibration to achieve production-grade accuracy while reducing compute costs by approximately 50%, with validation across 150M+ annotations in real-world search and advertising systems.
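
The combination of a model cascade with isotonic calibration can be sketched in plain Python. This is a minimal illustration, not the AutoRelAnnotator pipeline: it assumes each model returns a raw confidence and a label, fits a pool-adjacent-violators isotonic map from raw confidence to observed correctness on held-out labels, and escalates to the next, more expensive model whenever the calibrated confidence falls below a threshold. The function names, the toy models, and the 0.9 threshold are hypothetical.

```python
import bisect
from typing import Callable, List, Sequence, Tuple

Calibrator = Tuple[List[float], List[float]]  # (block upper score bounds, calibrated values)

def fit_isotonic(scores: Sequence[float], correct: Sequence[int]) -> Calibrator:
    """Pool-adjacent-violators fit of P(label correct | raw score), non-decreasing in score."""
    blocks: List[List[float]] = []  # each block: [sum of outcomes, count, highest score]
    for s, y in sorted(zip(scores, correct)):
        blocks.append([float(y), 1.0, s])
        # Merge backwards while monotonicity is violated (previous mean > current mean).
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]

def calibrate(score: float, calibrator: Calibrator) -> float:
    """Step-function lookup: value of the first block whose upper bound covers the score."""
    uppers, values = calibrator
    i = bisect.bisect_left(uppers, score)
    return values[min(i, len(values) - 1)]

Model = Tuple[Callable[[str], Tuple[float, str]], Calibrator]

def cascade_label(item: str, models: Sequence[Model], threshold: float = 0.9) -> Tuple[str, float]:
    """Query models from cheapest to most expensive; stop at the first confident answer."""
    if not models:
        raise ValueError("need at least one model")
    for predict, calibrator in models:
        score, label = predict(item)
        confidence = calibrate(score, calibrator)
        if confidence >= threshold:
            break
    # If no model clears the threshold, the last (largest) model's answer is returned.
    return label, confidence

small = (lambda q: (0.2, "irrelevant"), fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1]))
large = (lambda q: (0.95, "relevant"), ([1.0], [0.97]))
print(cascade_label("hypothetical query-ad pair", [small, large]))  # ('relevant', 0.97)
```

The cost saving in such a design comes from the cheap model answering every item whose calibrated confidence clears the threshold, so the expensive model only sees the uncertain remainder; calibration matters because raw model scores are rarely good probability estimates.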

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models

Researchers introduce Wan-Streamer, a unified foundation model that handles real-time audio-visual interaction through a single Transformer architecture, eliminating the need for separate modules and achieving approximately 200ms model-side latency. The system enables sub-second duplex communication by integrating perception, reasoning, generation, and response timing within one end-to-end model.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

What Does It Mean to Break a Distillation Defense?

Researchers propose a formal threat model framework for evaluating distillation defenses against black-box LLM attacks, arguing that existing output perturbation defenses lack clear specifications about attacker capabilities. The work demonstrates that defense effectiveness depends heavily on assumed threat parameters, raising concerns about false security claims in deployed systems.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Color Matters: Trigger Color Affects Success in Federated Backdoor Attacks

Researchers demonstrate that trigger color significantly affects the success of backdoor attacks in federated learning systems, with white triggers more effective against blonde-class targets and black triggers more effective against black-class targets. This finding reveals a previously underexplored vulnerability in distributed machine learning systems where poisoned updates can evade detection while maintaining benign performance.