#robustness News & Analysis

55 articles tagged with #robustness. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

55 articles

AINeutralarXiv – CS AI · May 126/10

🧠

Internalizing Safety Understanding in Large Reasoning Models via Verification

Researchers propose Safety Internal (SInternal), a framework that trains large reasoning models to verify the safety of their own outputs rather than relying on external compliance mechanisms. The approach demonstrates that models can internalize safety understanding through verification tasks, significantly improving robustness against adversarial jailbreaks and out-of-domain attacks.

AINeutralarXiv – CS AI · May 126/10

🧠

Don't Click That: Teaching Web Agents to Resist Deceptive Interfaces

Researchers introduce DUDE, a framework that teaches AI web agents to resist deceptive interface elements through hybrid-reward learning and experience summarization. The accompanying RUC benchmark demonstrates the framework reduces susceptibility to deception by 53.8% while preserving task performance, addressing a critical vulnerability in autonomous GUI interaction systems.

AINeutralarXiv – CS AI · May 126/10

🧠

Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising

Researchers present a parameter-free wrapper method (WNE) that enforces Normalization Equivariance—robustness to brightness and contrast shifts—around any neural network backbone without architectural constraints. The approach characterizes NE as a normalize-process-denormalize factorization, enabling compatibility with modern components like transformers and attention mechanisms while avoiding the 1.6x computational overhead of existing methods.

AINeutralarXiv – CS AI · May 126/10

🧠

FragileFlow: Spectral Control of Correct-but-Fragile Predictions for Foundation Model Robustness

FragileFlow introduces a theoretical framework and practical regularizer to detect and mitigate a hidden failure mode in large language models and vision-language models where predictions remain technically correct but confidence margins narrow dangerously. The research provides the first PAC-Bayes bounds for margin-aware error flow, addressing robustness gaps that standard accuracy metrics overlook.

AINeutralarXiv – CS AI · May 126/10

🧠

UFO: A Unified Flow-Oriented Framework for Robust Continual Graph Learning

Researchers introduce UFO, a framework addressing robust continual graph learning by simultaneously tackling catastrophic forgetting and noisy data supervision in evolving graphs. The method uses flow-based generative modeling to mitigate forgetting and instance-level reliability scoring to handle corrupted labels, demonstrating superior performance across benchmark datasets.

AINeutralarXiv – CS AI · May 116/10

🧠

Exposing and Mitigating Temporal Attack in Deepfake Video Detection

Researchers reveal that spatiotemporal deepfake detection models are vulnerable to evasion attacks because they rely on fragile temporal spectrum cues rather than robust semantic understanding. The team proposes SpInShield, a defense framework using learnable spectral adversaries and shortcut suppression to improve detection robustness, achieving 21.30 percentage points better AUC against amplitude spectral attacks.

AIBullisharXiv – CS AI · May 96/10

🧠

Information Theoretic Adversarial Training of Large Language Models

Researchers propose WARDEN, an information-theoretic adversarial training framework that improves Large Language Model robustness against prompt attacks by dynamically reweighting adversarial examples using f-divergence principles. The method achieves comparable computational efficiency to existing approaches while substantially reducing attack success rates, advancing the scalability of AI safety mechanisms.

AINeutralarXiv – CS AI · May 96/10

🧠

Operator-Guided Invariance Learning for Continuous Reinforcement Learning

Researchers propose VPSD-RL, a reinforcement learning framework that discovers value-preserving structures in continuous control tasks using Lie-group operators and diffusion models. The method improves data efficiency and robustness by identifying nonlinear transformations that preserve optimal value functions, addressing brittleness in RL systems under environmental variability.

AINeutralarXiv – CS AI · Apr 136/10

🧠

Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise

Researchers introduce VisPrompt, a framework that improves prompt learning for vision-language models by injecting visual semantic information to enhance robustness against label noise. The approach keeps pre-trained models frozen while adding minimal trainable parameters, demonstrating superior performance across seven benchmark datasets under both synthetic and real-world noisy conditions.

AIBearisharXiv – CS AI · Apr 136/10

🧠

Adversarial Evasion Attacks on Computer Vision using SHAP Values

Researchers demonstrate a white-box adversarial attack on computer vision models using SHAP values to identify and exploit critical input features, showing superior robustness compared to the Fast Gradient Sign Method, particularly when gradient information is obscured or hidden.

AIBearisharXiv – CS AI · Apr 106/10

🧠

Robustness Risk of Conversational Retrieval: Identifying and Mitigating Noise Sensitivity in Qwen3-Embedding Model

Researchers identified a critical robustness vulnerability in Qwen3-embedding models for conversational retrieval, where structured dialogue noise becomes disproportionately retrievable and contaminates search results. The problem remains invisible under standard benchmarks but is significantly more pronounced in Qwen3 than competing models, though lightweight query prompting effectively mitigates it.

AIBullisharXiv – CS AI · Apr 106/10

🧠

Improving Robustness In Sparse Autoencoders via Masked Regularization

Researchers propose a masked regularization technique to improve the robustness and interpretability of Sparse Autoencoders (SAEs) used in large language model analysis. The method addresses feature absorption and out-of-distribution performance failures by randomly replacing tokens during training to disrupt co-occurrence patterns, offering a practical path toward more reliable mechanistic interpretability tools.

AINeutralarXiv – CS AI · Mar 266/10

🧠

Can VLMs Reason Robustly? A Neuro-Symbolic Investigation

Researchers investigated whether Vision-Language Models (VLMs) can reason robustly under distribution shifts and found that fine-tuned VLMs achieve high accuracy in-distribution but fail to generalize. They propose VLC, a neuro-symbolic method combining VLM-based concept recognition with circuit-based symbolic reasoning that demonstrates consistent performance under covariate shifts.

AINeutralarXiv – CS AI · Mar 166/10

🧠

Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models

Researchers propose integrating causal methods into machine learning systems to balance competing objectives like fairness, privacy, robustness, accuracy, and explainability. The paper argues that addressing these principles in isolation leads to conflicts and suboptimal solutions, while causal approaches can help navigate trade-offs in both trustworthy ML and foundation models.

AINeutralarXiv – CS AI · Mar 126/10

🧠

Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model?

Researchers propose Contract And Conquer (CAC), a new method for provably generating adversarial examples against black-box neural networks using knowledge distillation and search space contraction. The approach provides theoretical guarantees for finding adversarial examples within a fixed number of iterations and outperforms existing methods on ImageNet datasets including vision transformers.

AIBullisharXiv – CS AI · Mar 96/10

🧠

Prompt Group-Aware Training for Robust Text-Guided Nuclei Segmentation

Researchers developed a new training method to improve the robustness of AI foundation models like SAM3 for medical image segmentation by reducing sensitivity to prompt variations. The approach groups semantically similar prompts together and uses consistency constraints to ensure more reliable predictions across different prompt formulations.

AIBullisharXiv – CS AI · Mar 36/104

🧠

Pulse-Driven Neural Architecture: Learnable Oscillatory Dynamics for Robust Continuous-Time Sequence Processing

Researchers introduce PDNA (Pulse-Driven Neural Architecture), a new continuous-time neural network that incorporates learnable oscillatory dynamics to improve robustness when input sequences are interrupted. The method shows significant performance improvements on sequential MNIST tasks, with the pulse variant achieving a 4.62 percentage point advantage over baseline models.

AIBullisharXiv – CS AI · Mar 36/103

🧠

Explanation-Guided Adversarial Training for Robust and Interpretable Models

Researchers propose Explanation-Guided Adversarial Training (EGAT), a framework that combines adversarial training with explainable AI to create more robust and interpretable deep neural networks. The method achieves 37% improvement in adversarial accuracy while producing semantically meaningful explanations with only 16% increase in training time.

AIBullisharXiv – CS AI · Mar 27/1020

🧠

Training Generalizable Collaborative Agents via Strategic Risk Aversion

Researchers developed a new multi-agent reinforcement learning algorithm that uses strategic risk aversion to create AI agents that can reliably collaborate with unseen partners. The approach addresses the problem of brittle AI collaboration systems that fail when working with new partners by incorporating robustness against behavioral deviations.

AIBullisharXiv – CS AI · Feb 276/105

🧠

To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning

Researchers introduce AOT (Adversarial Opponent Training), a self-play framework that improves Multimodal Large Language Models' robustness by having an AI attacker generate adversarial image manipulations to train a defender model. The method addresses perceptual fragility in MLLMs when processing visually complex scenes, reducing hallucinations through dynamic adversarial training.

AIBullisharXiv – CS AI · Mar 274/10

🧠

Measuring What Matters -- or What's Convenient?: Robustness of LLM-Based Scoring Systems to Construct-Irrelevant Factors

Researchers tested a dual-architecture LLM-based automated scoring system for educational assessments and found it generally robust to construct-irrelevant factors like meaningless text padding and spelling errors. The study shows promise for LLM-based scoring systems' reliability when properly designed, though off-topic responses were heavily penalized.

AIBullisharXiv – CS AI · Mar 174/10

🧠

FedUAF: Uncertainty-Aware Fusion with Reliability-Guided Aggregation for Multimodal Federated Sentiment Analysis

Researchers propose FedUAF, a new multimodal federated learning framework that addresses challenges in sentiment analysis by using uncertainty-aware fusion and reliability-guided aggregation. The system demonstrates superior performance on benchmark datasets CMU-MOSI and CMU-MOSEI, showing improved robustness against missing modalities and unreliable client updates in federated learning environments.

AINeutralarXiv – CS AI · Mar 174/10

🧠

Circuit Representations of Random Forests with Applications to XAI

Researchers developed a new method for converting random forest classifiers into circuit representations that enables more efficient computation of decision explanations. The approach provides tools for computing robustness metrics and identifying ways to alter classifier decisions, with applications in explainable AI (XAI).

AINeutralarXiv – CS AI · Mar 124/10

🧠

EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution

Researchers introduce EvoSchema, a comprehensive benchmark to test how well text-to-SQL AI models handle database schema changes over time. The study reveals that table-level changes significantly impact model performance more than column-level modifications, and proposes training methods to improve model robustness in dynamic database environments.

AINeutralarXiv – CS AI · Mar 114/10

🧠

Correction of Transformer-Based Models with Smoothing Pseudo-Projector

Researchers have developed a pseudo-projector technique that can be integrated into existing transformer-based language models to improve their robustness and training dynamics without changing core architecture. The method, inspired by multigrid paradigms, acts as a hidden-representation corrector that reduces sensitivity to noise by suppressing directions from label-irrelevant input content.

← PrevPage 2 of 3Next →