#ai-robustness News & Analysis

48 articles tagged with #ai-robustness. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

Are Video Reasoning Models Ready to Go Outside?

Researchers propose ROVA, a new training framework that makes vision-language models more robust under real-world conditions, with accuracy gains of up to 24%. The framework targets performance degradation caused by weather, occlusion, and camera motion, which can cause accuracy drops of up to 35% in current models.

AI · Bullish · arXiv – CS AI · Mar 9 · 7/10

Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts

Researchers developed Sysformer, a novel approach to safeguard large language models by adapting system prompts rather than fine-tuning model parameters. The method achieved up to 80% improvement in refusing harmful prompts while maintaining 90% compliance with safe prompts across 5 different LLMs.

AI · Bearish · arXiv – CS AI · Mar 5 · 7/10

SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care

Researchers developed SycoEval-EM, a framework that tests how well large language models resist patient pressure for inappropriate care in emergency settings. Testing 20 LLMs across 1,875 encounters revealed acquiescence rates ranging from 0% to 100%, with models more vulnerable to imaging requests than to opioid prescription requests, highlighting the need for adversarial testing in clinical AI certification.
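
As a rough illustration of the headline metric, an acquiescence rate can be read as the share of encounters in which a model grants the inappropriate request. The sketch below assumes a hypothetical list of per-encounter records with `model` and `acquiesced` fields; it is not SycoEval-EM's actual data format or scoring code.

```python
from collections import defaultdict

def acquiescence_rates(records):
    """Return {model: fraction of encounters in which the model acquiesced}.

    `records` is a hypothetical list of dicts with keys
    "model" (str) and "acquiesced" (bool).
    """
    totals = defaultdict(int)
    gave_in = defaultdict(int)
    for r in records:
        totals[r["model"]] += 1
        if r["acquiesced"]:
            gave_in[r["model"]] += 1
    # defaultdict returns 0 for models that never acquiesced.
    return {m: gave_in[m] / totals[m] for m in totals}

encounters = [
    {"model": "model-a", "acquiesced": False},
    {"model": "model-a", "acquiesced": False},
    {"model": "model-b", "acquiesced": True},
    {"model": "model-b", "acquiesced": False},
]
print(acquiescence_rates(encounters))  # {'model-a': 0.0, 'model-b': 0.5}
```

Grouping by a hypothetical scenario field (for example imaging versus opioid requests) instead of by model would give the per-request-type comparison the summary mentions.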

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

Learning Robust Intervention Representations with Delta Embeddings

Researchers propose Causal Delta Embeddings, a new method for learning robust AI representations from image pairs that improves out-of-distribution performance. The approach represents interventions in causal models rather than just scene variables, achieving significant improvements on synthetic and real-world benchmarks without additional supervision.

AI · Bullish · arXiv – CS AI · Jun 25 · 6/10

Supervised Post-training of Speech Foundation Models for Robust Adaptation in Speech Deepfake Detection

Researchers propose a supervised post-training method for speech foundation models that improves deepfake detection by addressing the mismatch between self-supervised learning objectives and spoof-detection requirements. The approach achieves state-of-the-art results on multiple benchmarks, demonstrating that targeted adaptation strategies can enhance AI model robustness for security applications.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

EgoExo-Con: Exploring View-Invariant Video Temporal Understanding

Researchers introduce EgoExo-Con, a benchmark testing whether video language models maintain consistent temporal understanding across different camera viewpoints of the same event. The study reveals that existing Video-LLMs struggle with cross-view consistency and proposes View-GRPO, a reinforcement learning framework to improve temporal reasoning across viewpoints.
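
View-GRPO's name suggests it builds on GRPO (Group Relative Policy Optimization), where each sampled response's advantage is its reward normalized against the other responses sampled for the same prompt. The sketch below shows only that generic normalization step; how View-GRPO defines rewards across viewpoints is not described in the summary and is not modeled here.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + eps) for r in rewards]

# Illustrative rewards for four responses sampled for the same prompt.
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in adv])  # [1.0, -1.0, 1.0, -1.0]
```

Because advantages are relative within the group, responses are rewarded only for doing better than their siblings, which removes the need for a separate learned value baseline.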

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

When Poison Fails After Retrieval: Revisiting Corpus Poisoning under Chunking and Reranking Pipelines

Researchers demonstrate that existing corpus poisoning attacks against RAG systems largely fail once a reranking stage is applied, revealing a critical gap between retrieval-stage attacks and real-world multi-stage pipelines. They propose CRCP, a new poisoning framework that accounts for document chunking and reranking to achieve higher attack success rates across realistic retrieval configurations.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Feature-Aligned Speech Watermarking for Robustness to Reconstruction Distortions

Researchers propose a feature-aligned speech watermarking method that embeds imperceptible, identifiable information into audio while remaining robust to speech reconstruction models. By aligning watermarks with the feature distribution of the original speech, the technique overcomes the robustness-fidelity trade-off that has limited previous audio watermarking approaches.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

Researchers introduce ASyMOB, a 35,368-problem benchmark dataset for evaluating large language models on symbolic mathematics tasks. The dataset uses systematic perturbations to test genuine reasoning rather than pattern memorization, revealing that most models fail under minor problem variations while hybrid LLM-computer algebra system approaches show promise for scientific computing applications.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

Researchers introduce COMPASS, a safety alignment framework for LLM-powered search agents that prevents harmful outcomes from seemingly innocent multi-step queries. The method combines cognitive tree exploration and step-wise alignment to achieve robust safety while maintaining utility, requiring less training data than existing approaches.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

AliMark: Enhancing Robustness of Sentence-Level Watermarking Against Text Paraphrasing

Researchers introduce AliMark, a novel sentence-level watermarking framework that improves robustness against text paraphrasing by reformulating watermark detection as a bit sequence alignment problem. The approach uses multiple text variants and adaptive alignment strategies to withstand structural perturbations like sentence splitting and merging, substantially outperforming existing methods against strong paraphrasers.
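
To see why alignment helps, consider a detector that extracts one bit per sentence: if a paraphraser splits or merges sentences, the extracted bits shift out of position and position-by-position comparison breaks down. The sketch below assumes such a one-bit-per-sentence scheme (it is not AliMark's actual embedding or alignment strategy) and scores extracted bits against the expected key with Levenshtein edit distance, which tolerates insertions and deletions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two bit strings (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x from a
                           cur[j - 1] + 1,            # insert y into a
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

def positional_mismatches(a, b):
    """Naive position-by-position mismatches; extra length counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

key = "1011001"
# Hypothetical case: one sentence is split in two, so one extra bit is extracted.
extracted = "10111001"
print(positional_mismatches(key, extracted))  # 3
print(edit_distance(key, extracted))          # 1
```

A single split costs one edit under alignment but corrupts every later position under naive comparison, which is the intuition behind treating detection as an alignment problem.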

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Do Language Models Track Entities Across State Changes?

Researchers investigated how transformer language models track entity states through multiple changes, finding that LMs use a non-incremental parallel aggregation strategy rather than sequential state tracking. The study reveals LMs implement state removal operations through a fragile global suppression mechanism, explaining various failure modes and suggesting mechanistic improvements for more robust entity tracking.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Semantic Robustness Probing via Inpainting: An Interactive Tool for Safety-Critical Object Detection

SemProbe is a new interactive tool for testing object detection systems in safety-critical applications using semantically meaningful image corruptions rather than simple pixel-level noise. The system uses diffusion-based inpainting to generate realistic test scenarios, automatically runs model inference, and logs results as structured artifacts for safety evaluation compliance.

AI · Bearish · arXiv – CS AI · May 12 · 6/10

Beyond Continuity: Challenges of Context Switching in Multi-Turn Dialogue with LLMs

Researchers tested how well Large Language Models handle multi-turn conversations with topic shifts, finding that most LLMs struggle to detect when users pivot to new topics and incorrectly carry over irrelevant context from previous exchanges. The study reveals that only advanced reasoning models and strongly instructed LLMs perform accurately, while open-weight models frequently fail even with explicit cues, highlighting a critical robustness gap in production LLM deployments.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

A Statistical Framework for Algorithmic Collective Action with Multiple Collectives

Researchers propose the first statistical framework for Algorithmic Collective Action (ACA) involving multiple independent collectives attempting to coordinate changes in shared data to influence AI classifier behavior. The framework provides computable bounds on collective success while accounting for varying sizes, strategies, and goal alignment across groups, with applications to climate adaptation in smart cities.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

NoisyCausal: A Benchmark for Evaluating Causal Reasoning Under Structured Noise

Researchers introduce NoisyCausal, a benchmark for testing how well large language models handle causal reasoning when presented with noisy, incomplete, or misleading information. The study proposes a modular framework combining LLMs with explicit causal graph structures, demonstrating significant improvements over standard prompting approaches and better generalization across external benchmarks.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

VIB-Probe: Detecting and Mitigating Hallucinations in Vision-Language Models via Variational Information Bottleneck

Researchers propose VIB-Probe, a novel framework using Variational Information Bottleneck theory to detect and mitigate hallucinations in Vision-Language Models by analyzing internal attention mechanisms. The method identifies specific attention heads responsible for truthful generation and introduces an inference-time intervention strategy that outperforms existing detection baselines.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations

Researchers introduce Evolve-CTF, a tool that generates families of semantically-equivalent cybersecurity challenges to evaluate the robustness of agentic LLMs. Testing 13 LLM configurations reveals models are resilient to basic code transformations but struggle with obfuscation and composed modifications, providing new benchmarking methodology for AI safety evaluation.
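
The summary does not list Evolve-CTF's exact transformations beyond basic code changes, obfuscation, and composition. Identifier renaming is one common example of a semantics-preserving rewrite, sketched here with Python's standard `ast` module on a hypothetical challenge snippet; the class name and mapping are illustrative only.

```python
import ast

class RenameIdentifiers(ast.NodeTransformer):
    """Rename variables and parameters using a fixed mapping.

    Assumes the mapping only covers names defined in the snippet, so no
    builtins or globals are accidentally renamed.
    """

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Variable loads and stores, e.g. `secret = ...` and `flag == secret`.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Function parameters, e.g. `def check(flag)`.
        node.arg = self.mapping.get(node.arg, node.arg)
        self.generic_visit(node)  # still visit any annotation
        return node

source = '''
def check(flag):
    secret = "ctf{demo}"
    return flag == secret
'''

tree = RenameIdentifiers({"flag": "v0", "secret": "v1"}).visit(ast.parse(source))
transformed = ast.unparse(tree)  # ast.unparse requires Python 3.9+
print(transformed)
```

The renamed function returns the same result for every input, so a family of such variants probes whether an agent relies on surface cues like meaningful names rather than on the program's behavior.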

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

Context is All You Need

Researchers introduce CONTXT, a lightweight neural network adaptation method that improves AI model performance when deployed on data different from training data. The technique uses simple additive and multiplicative transforms to modulate internal representations, providing consistent gains across both discriminative and generative models including LLMs.
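
A minimal sketch of what "additive and multiplicative transforms" on a hidden representation can look like, assuming a FiLM-style parameterization h' = h * (1 + scale) + shift chosen for illustration. The summary does not say how CONTXT computes or learns the scale and shift, and `modulate` is a hypothetical helper.

```python
def modulate(hidden, scale, shift):
    """Elementwise multiplicative then additive transform: h * (1 + scale) + shift."""
    if not (len(hidden) == len(scale) == len(shift)):
        raise ValueError("hidden, scale and shift must have the same length")
    return [h * (1.0 + s) + b for h, s, b in zip(hidden, scale, shift)]

hidden = [0.5, -1.0, 2.0]
# With zero scale and shift the transform is the identity, so an adapter
# initialized this way leaves the pretrained model's behavior unchanged.
print(modulate(hidden, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))   # [0.5, -1.0, 2.0]
print(modulate(hidden, [1.0, 0.0, -0.5], [0.0, 0.25, 0.0]))  # [1.0, -0.75, 1.0]
```

Writing the multiplicative term as (1 + scale) is a design choice that makes "no adaptation" correspond to all-zero parameters.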

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies

Researchers developed Q-DIG, a red-teaming method that uses Quality Diversity techniques to identify diverse language instruction failures in Vision-Language-Action models for robotics. The approach generates adversarial prompts that expose vulnerabilities in robot behavior and improves task success rates when used for fine-tuning.
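
Quality Diversity methods keep an archive of the best solution found for each region of a "behavior descriptor" space, rather than a single best solution. The sketch below is a generic MAP-Elites loop on a toy numeric problem; Q-DIG operates on language instructions, and its actual descriptors, fitness, and mutation operators are not given in the summary, so every function passed in here is a hypothetical stand-in.

```python
import random

def map_elites(fitness, descriptor, mutate, init, n_bins=5, iterations=2000, seed=0):
    """Minimal MAP-Elites: keep the best solution found in each descriptor bin."""
    rng = random.Random(seed)
    archive = {}  # bin index -> (fitness, solution)

    def try_insert(x):
        # Descriptor is assumed to lie in [0, 1]; clamp the top edge into the last bin.
        b = min(int(descriptor(x) * n_bins), n_bins - 1)
        f = fitness(x)
        if b not in archive or f > archive[b][0]:
            archive[b] = (f, x)

    for _ in range(10):                # seed the archive with random solutions
        try_insert(init(rng))
    for _ in range(iterations):        # mutate a random elite, keep it if it wins its bin
        _, parent = rng.choice(list(archive.values()))
        try_insert(mutate(parent, rng))
    return archive

archive = map_elites(
    fitness=lambda x: -abs(x - 0.3),   # toy objective
    descriptor=lambda x: x,            # toy behavior descriptor in [0, 1]
    mutate=lambda x, rng: min(1.0, max(0.0, x + rng.gauss(0.0, 0.1))),
    init=lambda rng: rng.random(),
)
for b in sorted(archive):
    print(b, round(archive[b][1], 3))
```

In a red-teaming setting, the descriptor bins would correspond to different kinds of instructions, so the archive collects diverse failures rather than many copies of the single most damaging one.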

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 8

The Synthetic Web: Adversarially-Curated Mini-Internets for Diagnosing Epistemic Weaknesses of Language Agents

Researchers introduced the Synthetic Web Benchmark, revealing that frontier AI language models fail catastrophically when exposed to high-plausibility misinformation in search results. The study shows current AI agents struggle to handle conflicting information sources, with accuracy collapsing despite access to truthful content.

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 4

Spurious Correlation-Aware Embedding Regularization for Worst-Group Robustness

Researchers propose SCER (Spurious Correlation-Aware Embedding Regularization), a new deep learning approach that improves AI model robustness by regularizing feature representations to suppress spurious correlations. The method demonstrates superior performance in worst-group accuracy across vision and language tasks compared to existing state-of-the-art approaches.
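
Worst-group accuracy, the metric cited above, is the lowest accuracy over subgroups defined by a label and a spurious attribute. The sketch below computes it on hypothetical toy data (groups pair a class with a background, in the style of common spurious-correlation benchmarks); it does not model SCER's regularizer.

```python
from collections import defaultdict

def worst_group_accuracy(labels, preds, groups):
    """Return (minimum accuracy over groups, per-group accuracies)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group

# Hypothetical data: each group is "class/background".
labels = [1, 1, 1, 0, 0, 0]
preds  = [1, 1, 0, 0, 0, 1]
groups = ["1/water", "1/water", "1/land", "0/land", "0/land", "0/water"]

wga, per_group = worst_group_accuracy(labels, preds, groups)
overall = sum(y == p for y, p in zip(labels, preds)) / len(labels)
print(round(overall, 3))  # 0.667
print(wga)                # 0.0
```

Average accuracy looks acceptable here while two minority groups are always misclassified, which is exactly the failure that worst-group evaluation exposes.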

AI · Neutral · arXiv – CS AI · Mar 17 · 5/10

Evaluating Semantic Fragility in Text-to-Audio Generation Systems Under Controlled Prompt Perturbations

Researchers evaluated the semantic fragility of text-to-audio generation systems, finding that small changes in prompts can lead to substantial variations in generated audio output. While larger models like MusicGen-large showed better semantic consistency, all models exhibited persistent divergence in acoustic and temporal characteristics even when semantic similarity remained high.
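
One simple way to quantify semantic consistency under prompt perturbations is the mean pairwise cosine similarity between embeddings of the outputs generated from each prompt variant. The sketch below assumes the embeddings already exist (the embedding model and vectors are hypothetical) and is not the study's exact metric; acoustic and temporal divergence would need separate feature-level measures.

```python
import itertools
import math

def cosine(u, v):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs of output embeddings."""
    pairs = list(itertools.combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Hypothetical embeddings of audio generated from three paraphrases of one prompt.
outputs = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(round(mean_pairwise_similarity(outputs), 3))  # 0.333
```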
