🧠

AI

21,452 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

21452 articles

AINeutralarXiv – CS AI · Mar 36/105

🧠

A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection

Researchers introduced Spoof-SUPERB, a new benchmark for evaluating self-supervised learning models' ability to detect audio deepfakes. The study tested 20 SSL models and found that large-scale discriminative models like XLS-R and WavLM Large consistently outperformed others, especially under acoustic degradations.

AIBullisharXiv – CS AI · Mar 36/107

🧠

Toward Graph-Tokenizing Large Language Models with Reconstructive Graph Instruction Tuning

Researchers have developed RGLM, a new approach to improve how large language models understand and process graph data by incorporating explicit graph supervision alongside text instructions. The method addresses limitations in existing Graph-Tokenizing LLMs that rely too heavily on text supervision, leading to underutilization of graph context.

AIBullisharXiv – CS AI · Mar 37/107

🧠

Causal Neural Probabilistic Circuits

Researchers propose Causal Neural Probabilistic Circuits (CNPC), a new AI model that enhances interpretable machine learning by incorporating causal dependencies between concepts. The model allows domain experts to make corrections that properly propagate through causal relationships, achieving higher accuracy than baseline models across benchmark datasets.

AINeutralarXiv – CS AI · Mar 37/108

🧠

Align and Filter: Improving Performance in Asynchronous On-Policy RL

Researchers propose a new method called total Variation-based Advantage aligned Constrained policy Optimization to address policy lag issues in distributed reinforcement learning systems. The approach aims to improve performance when scaling on-policy learning algorithms by mitigating the mismatch between behavior and learning policies during high-frequency updates.

AIBearisharXiv – CS AI · Mar 37/108

🧠

VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models

Researchers have discovered VidDoS, a new universal attack framework that can severely degrade Video-based Large Language Models by causing extreme computational resource exhaustion. The attack increases token generation by over 205x and inference latency by more than 15x, creating critical safety risks in real-world applications like autonomous driving.

AIBullisharXiv – CS AI · Mar 37/107

🧠

Constructing Synthetic Instruction Datasets for Improving Reasoning in Domain-Specific LLMs: A Case Study in the Japanese Financial Domain

Researchers developed a method for creating synthetic instruction datasets to improve domain-specific LLMs, demonstrating with a 9.5 billion token Japanese financial dataset. The approach enhances both domain expertise and reasoning capabilities, with models and datasets being open-sourced for broader use.

AIBearisharXiv – CS AI · Mar 36/107

🧠

PanCanBench: A Comprehensive Benchmark for Evaluating Large Language Models in Pancreatic Oncology

Researchers created PanCanBench, a comprehensive benchmark evaluating 22 large language models on pancreatic cancer-related patient questions, revealing significant variations in clinical accuracy and high hallucination rates. The study found that even top-performing models like GPT-4o and Gemini-2.5 Pro had hallucination rates of 6%, while newer reasoning-optimized models didn't consistently improve factual accuracy.

AIBullisharXiv – CS AI · Mar 36/109

🧠

Provable and Practical In-Context Policy Optimization for Self-Improvement

Researchers introduce In-Context Policy Optimization (ICPO), a new method that allows AI models to improve their responses during inference through multi-round self-reflection without parameter updates. The practical ME-ICPO algorithm demonstrates competitive performance on mathematical reasoning tasks while maintaining affordable inference costs.

AINeutralarXiv – CS AI · Mar 36/108

🧠

Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models

New theoretical research analyzes how Large Language Models learn during pretraining versus post-training phases, revealing that balanced pretraining data creates latent capabilities activated later, while supervised fine-tuning works best on small, challenging datasets and reinforcement learning requires large-scale data that isn't overly difficult.

AIBullisharXiv – CS AI · Mar 37/106

🧠

Attention Smoothing Is All You Need For Unlearning

Researchers propose Attention Smoothing Unlearning (ASU), a new framework that helps Large Language Models forget sensitive or copyrighted content without losing overall performance. The method uses self-distillation and attention smoothing to erase specific knowledge while maintaining coherent responses, outperforming existing unlearning techniques.

AIBullisharXiv – CS AI · Mar 36/107

🧠

AG-VAS: Anchor-Guided Zero-Shot Visual Anomaly Segmentation with Large Multimodal Models

Researchers introduce AG-VAS, a new AI framework that uses large multimodal models for zero-shot visual anomaly segmentation. The system employs learnable semantic anchor tokens and achieves state-of-the-art performance on industrial and medical benchmarks without requiring training data for specific anomaly types.

AIBullisharXiv – CS AI · Mar 37/106

🧠

Spectral Attention Steering for Prompt Highlighting

Researchers introduce SEKA and AdaSEKA, new training-free methods for attention steering in AI models that work with memory-efficient implementations like FlashAttention. These techniques enable better prompt highlighting by directly editing key embeddings using spectral decomposition, offering significant performance improvements with lower computational overhead.

AIBullisharXiv – CS AI · Mar 37/109

🧠

From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents

Researchers have developed MM-Mem, a new pyramidal multimodal memory architecture that enables AI systems to better understand long-horizon videos by mimicking human cognitive memory processes. The system addresses current limitations in multimodal large language models by creating a hierarchical memory structure that progressively distills detailed visual information into high-level semantic understanding.

AIBullisharXiv – CS AI · Mar 36/106

🧠

MetaState: Persistent Working Memory for Discrete Diffusion Language Models

Researchers introduce MetaState, a recurrent augmentation for discrete diffusion language models (dLLMs) that adds persistent working memory to improve text generation quality. The system addresses the 'Information Island' problem where intermediate representations are discarded between denoising steps, achieving improved accuracy on LLaDA-8B and Dream-7B models with minimal parameter overhead.

AIBullisharXiv – CS AI · Mar 36/106

🧠

GlassMol: Interpretable Molecular Property Prediction with Concept Bottleneck Models

Researchers introduce GlassMol, a new interpretable AI model for molecular property prediction that addresses the black-box problem in drug discovery. The model uses Concept Bottleneck Models with automated concept curation and LLM-guided selection, achieving performance that matches or exceeds traditional black-box models across thirteen benchmarks.

AIBullisharXiv – CS AI · Mar 37/106

🧠

MOSAIC: A Unified Platform for Cross-Paradigm Comparison and Evaluation of Homogeneous and Heterogeneous Multi-Agent RL, LLM, VLM, and Human Decision-Makers

MOSAIC is a new open-source platform that enables cross-paradigm comparison and evaluation of different AI agents including reinforcement learning, large language models, vision-language models, and human decision-makers within the same environment. The platform introduces three key technical contributions: an IPC-based worker protocol, operator abstraction for unified interfaces, and a deterministic evaluation framework for reproducible research.

AINeutralarXiv – CS AI · Mar 37/108

🧠

The MAMA-MIA Challenge: Advancing Generalizability and Fairness in Breast MRI Tumor Segmentation and Treatment Response Prediction

The MAMA-MIA Challenge introduced a large-scale benchmark for AI-powered breast cancer tumor segmentation and treatment response prediction using MRI data from 1,506 US patients for training and 574 European patients for testing. Results from 26 international teams revealed significant performance variability and trade-offs between accuracy and fairness across demographic subgroups when AI models were tested across different institutions and continents.

AIBearisharXiv – CS AI · Mar 37/109

🧠

Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders

A study reveals that safety-aligned large language models exhibit "Defensive Refusal Bias," refusing legitimate cybersecurity defense tasks 2.72x more often when they contain security-sensitive keywords. The research found particularly high refusal rates for critical defensive operations like system hardening (43.8%) and malware analysis (34.3%), suggesting current AI safety measures rely on semantic similarity rather than understanding intent.

AIBullisharXiv – CS AI · Mar 36/106

🧠

TARSE: Test-Time Adaptation via Retrieval of Skills and Experience for Reasoning Agents

Researchers developed TARSE, a new AI system for clinical decision-making that retrieves relevant medical skills and experiences from curated libraries to improve reasoning accuracy. The system performs test-time adaptation to align language models with clinically valid logic, showing improvements over existing medical AI baselines in question-answering benchmarks.

AIBullisharXiv – CS AI · Mar 36/106

🧠

Linking Knowledge to Care: Knowledge Graph-Augmented Medical Follow-Up Question Generation

Researchers developed KG-Followup, a knowledge graph-augmented large language model system that generates medical follow-up questions for pre-diagnostic assessment. The system combines structured medical domain knowledge with LLMs to improve clinical diagnosis efficiency, outperforming existing methods by 5-8% in recall benchmarks.

AINeutralarXiv – CS AI · Mar 36/106

🧠

Self-Anchoring Calibration Drift in Large Language Models: How Multi-Turn Conversations Reshape Model Confidence

Researchers identified Self-Anchoring Calibration Drift (SACD), where large language models show systematic confidence changes when building on their own outputs in multi-turn conversations. Testing Claude Sonnet 4.6, Gemini 3.1 Pro, and GPT-5.2 revealed model-specific patterns, with Claude showing decreasing confidence and significant calibration errors, while GPT-5.2 exhibited opposite behavior in open-ended domains.

$NEAR

AIBullisharXiv – CS AI · Mar 36/106

🧠

Monocular 3D Object Position Estimation with VLMs for Human-Robot Interaction

Researchers developed a Vision-Language Model capable of estimating 3D object positions from monocular RGB images for human-robot interaction. The model achieved a median accuracy of 13mm and can make acceptable predictions for robot interaction in 25% of cases, representing a five-fold improvement over baseline methods.

AIBullisharXiv – CS AI · Mar 36/106

🧠

VisNec: Measuring and Leveraging Visual Necessity for Multimodal Instruction Tuning

Researchers developed VisNec, a framework that identifies which training samples truly require visual reasoning for multimodal AI instruction tuning. The method achieves equivalent performance using only 15% of training data by filtering out visually redundant samples, potentially making multimodal AI training more efficient.

AIBullisharXiv – CS AI · Mar 36/105

🧠

Shape-Interpretable Visual Self-Modeling Enables Geometry-Aware Continuum Robot Control

Researchers developed a shape-interpretable visual self-modeling framework for continuum robots that enables geometry-aware control using Bezier-curve representations and neural ordinary differential equations. The system achieves accurate shape-position regulation with shape errors within 1.56% and end-effector errors within 2% while enabling obstacle avoidance and environmental awareness.

$CRV

AIBullisharXiv – CS AI · Mar 36/107

🧠

Mean-Flow based One-Step Vision-Language-Action

Researchers developed a Mean-Flow based One-Step Vision-Language-Action (VLA) approach that dramatically improves robotic manipulation efficiency by eliminating iterative sampling requirements. The new method achieves 8.7x faster generation than SmolVLA and 83.9x faster than Diffusion Policy in real-world robotic experiments.

← PrevPage 555 of 859Next →