36 articles tagged with #robustness. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bearish · arXiv – CS AI · 2d ago · 7/10
🧠Researchers discovered that large reasoning models (LRMs) like DeepSeek R1 and Llama become significantly more vulnerable to adversarial attacks when presented with conflicting objectives or ethical dilemmas. Testing across 1,300+ prompts revealed that safety mechanisms break down when internal alignment values compete, with neural representations of safety and functionality overlapping under conflict.
🧠 Llama
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10
🧠Researchers propose Online Label Refinement (OLR) to improve AI reasoning models' robustness under noisy supervision in Reinforcement Learning with Verifiable Rewards. The method addresses the critical problem of training language models when expert-labeled data contains errors, achieving 3-4% performance gains across mathematical reasoning benchmarks.
AI · Neutral · arXiv – CS AI · Apr 6 · 7/10
🧠Researchers propose a new heuristic algorithm combining server learning with client update filtering and geometric median aggregation to improve federated learning robustness against malicious attacks. The approach maintains model accuracy even when over 50% of clients are malicious and works with non-identical data distributions across clients.
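For readers unfamiliar with geometric median aggregation, the sketch below shows only that one ingredient, using the classic Weiszfeld iteration. It is not the paper's algorithm: the server learning and client update filtering steps are omitted, and the function name, the toy client updates, and the iteration settings are all assumptions made for illustration.

```python
import math

def geometric_median(points, iters=100, eps=1e-9):
    """Approximate the geometric median of equal-length vectors (Weiszfeld iteration)."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    est = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        denom = 0.0
        for p in points:
            # Weight each point by the inverse of its distance to the estimate;
            # the max() guards against division by zero.
            w = 1.0 / max(math.dist(p, est), eps)
            denom += w
            for i in range(dim):
                num[i] += w * p[i]
        new_est = [n / denom for n in num]
        converged = math.dist(new_est, est) < eps
        est = new_est
        if converged:
            break
    return est

# Hypothetical client updates: four honest ones near (1, 1), one malicious outlier.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [100.0, -100.0]]
mean_update = [sum(u[i] for u in updates) / len(updates) for i in range(2)]
median_update = geometric_median(updates)
print("mean:  ", mean_update)
print("median:", median_update)
```

In this toy case the plain mean is dragged far toward the outlier, while the geometric median stays close to the honest cluster, which is the property that makes it attractive as an aggregation rule.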
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers propose OrthoFormer, a new Transformer architecture that addresses causal learning limitations by embedding instrumental variable estimation directly into neural networks. The framework aims to distinguish between spurious correlations and true causal mechanisms, potentially improving AI model robustness and reliability under distribution shifts.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers developed new methods for extracting symbolic formulas from Kolmogorov-Arnold Networks (KANs), addressing a key bottleneck in making AI models more interpretable. The proposed Greedy in-context Symbolic Regression (GSR) and Gated Matching Pursuit (GMP) methods achieved up to 99.8% reduction in test error while improving robustness.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠ADV-0 is a new closed-loop adversarial training framework for autonomous driving that uses min-max optimization to improve robustness against rare but safety-critical scenarios. The system treats the interaction between driving policy and adversarial agents as a zero-sum game, converging to Nash Equilibrium while maximizing real-world performance bounds.
AI · Bullish · arXiv – CS AI · Mar 16 · 7/10
🧠Researchers propose a new theoretical framework explaining why modern machine learning models achieve robust performance using high-dimensional, error-prone data, challenging the traditional 'Garbage In, Garbage Out' principle. The study introduces concepts like 'Informative Collinearity' and 'Proactive Data-Centric AI' to show how data architecture and model capacity work together to overcome noise and structural uncertainty.
AI · Neutral · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers introduced WebRRSBench, a comprehensive benchmark evaluating multimodal large language models' reasoning, robustness, and safety capabilities for web understanding tasks. Testing 11 MLLMs on 3,799 QA pairs from 729 websites revealed significant gaps in compositional reasoning, UI robustness, and safety-critical action recognition.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers developed DMAST, a new training framework that protects multimodal web agents from cross-modal attacks where adversaries inject malicious content into webpages to deceive both visual and text processing channels. The method uses adversarial training through a three-stage pipeline and significantly outperforms existing defenses while doubling task completion efficiency.
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers propose CoIPO (Contrastive Learning-based Inverse Direct Preference Optimization), a new method to improve Large Language Model robustness against noisy or imperfect user prompts. The approach enhances LLMs' intrinsic ability to handle prompt variations without relying on external preprocessing tools, showing significant accuracy improvements on benchmark tests.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduce Adversarially-Aligned Jacobian Regularization (AAJR), a new method to improve the robustness of autonomous AI agent systems by controlling sensitivity along adversarial directions rather than globally. This approach maintains better performance while ensuring stability in multi-agent AI ecosystems compared to existing methods.
AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 3
🧠Research reveals that contrastive steering, a method for adjusting LLM behavior during inference, is moderately robust to data corruption but vulnerable to malicious attacks when significant portions of training data are compromised. The study identifies geometric patterns in corruption types and proposes using robust mean estimators as a safeguard against unwanted effects.
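As a rough illustration of why a robust estimator can help here, the sketch below builds a steering vector as the difference between the centers of two activation sets and swaps the mean for a coordinate-wise median. The median is just one simple robust estimator chosen for the example; the summary does not say which estimators the study uses, and the function name and the toy activations are hypothetical.

```python
from statistics import mean, median

def steering_vector(pos, neg, center=mean):
    """Difference of per-coordinate centers of two activation sets."""
    dim = len(pos[0])
    return [
        center([a[i] for a in pos]) - center([a[i] for a in neg])
        for i in range(dim)
    ]

# Hypothetical 2-D activations; the last "positive" example is corrupted.
pos = [[1.0, 0.0], [1.2, 0.1], [0.8, -0.1], [1.0, 0.0], [-50.0, 50.0]]
neg = [[0.0, 0.0], [0.1, 0.0], [-0.1, 0.0]]

naive = steering_vector(pos, neg)                  # mean-based
robust = steering_vector(pos, neg, center=median)  # coordinate-wise median
print("mean-based:  ", naive)
print("median-based:", robust)
```

With a single corrupted example, the mean-based vector flips direction along the first coordinate, while the median-based vector still points along it.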
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠Researchers propose a dual Randomized Smoothing framework that overcomes limitations of standard neural network robustness certification by using input-dependent noise variances instead of global ones. The method achieves strong performance at both small and large radii with gains of 15-20% on CIFAR-10 and 8-17% on ImageNet, while adding only 60% computational overhead.
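As background, the sketch below computes the certified L2 radius of standard randomized smoothing with a single global noise level, following the well-known bound from Cohen et al. (2019). This is the baseline setting the summary contrasts against, not the dual framework itself; the function name and the example probabilities are assumptions.

```python
from statistics import NormalDist

def certified_radius(p_a_lower: float, sigma: float) -> float:
    """L2 radius certified by standard randomized smoothing.

    p_a_lower: lower confidence bound on the top-class probability under
               Gaussian noise N(0, sigma^2 I); must exceed 0.5 to certify.
    sigma:     the single, global noise standard deviation.
    """
    if not 0.5 < p_a_lower < 1.0:
        return 0.0  # abstain: no radius can be certified
    # R = sigma * Phi^{-1}(p_a_lower), with Phi the standard normal CDF.
    return sigma * NormalDist().inv_cdf(p_a_lower)

# Hypothetical top-class probability bounds at two global noise levels.
for sigma, p_a in [(0.25, 0.99), (1.0, 0.70)]:
    print(f"sigma={sigma}: radius={certified_radius(p_a, sigma):.3f}")
```

Because the radius scales with one global `sigma`, a small value caps the radius that can be certified and a large value tends to lower the top-class probability. That tension is the motivation for letting the noise variance depend on the input.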
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers developed a method to improve foundation models in medical histopathology by introducing robustness losses during training, reducing sensitivity to technical variations while maintaining accuracy. The approach was tested on over 27,000 whole slide images from 6,155 patients across eight popular foundation models, showing improved robustness and prediction accuracy without requiring retraining of the foundation models themselves.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce VisPrompt, a framework that improves prompt learning for vision-language models by injecting visual semantic information to enhance robustness against label noise. The approach keeps pre-trained models frozen while adding minimal trainable parameters, demonstrating superior performance across seven benchmark datasets under both synthetic and real-world noisy conditions.
AI · Bearish · arXiv – CS AI · 3d ago · 6/10
🧠Researchers demonstrate a white-box adversarial attack on computer vision models that uses SHAP values to identify and exploit critical input features. The attack proves more robust than the Fast Gradient Sign Method, particularly when gradient information is obscured or hidden.
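For reference, the Fast Gradient Sign Method used as the baseline here perturbs each input feature by a fixed step in the direction of the sign of the loss gradient. The sketch below applies one such step to a toy logistic-regression model. The model, weights, and input are hypothetical, and the SHAP-based attack from the paper is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(g):
    return (g > 0) - (g < 0)

def fgsm(x, y, w, b, eps):
    """One FGSM step: x_adv = x + eps * sign(dLoss/dx).

    For binary cross-entropy loss on this model, the gradient with
    respect to the input is (p - y) * w.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and a correctly classified input with label 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm(x, y, w, b, eps=0.5)
p_clean = predict(x, w, b)
p_adv = predict(x_adv, w, b)
print(f"clean p={p_clean:.2f}, adversarial p={p_adv:.2f}")
```

The attack depends entirely on the input gradient, which is why methods that do not need it can have an edge when gradients are obscured or hidden.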
AI · Bearish · arXiv – CS AI · 6d ago · 6/10
🧠Researchers identified a critical robustness vulnerability in Qwen3-embedding models for conversational retrieval, where structured dialogue noise becomes disproportionately retrievable and contaminates search results. The problem remains invisible under standard benchmarks but is significantly more pronounced in Qwen3 than competing models, though lightweight query prompting effectively mitigates it.
AI · Bullish · arXiv – CS AI · 6d ago · 6/10
🧠Researchers propose a masked regularization technique to improve the robustness and interpretability of Sparse Autoencoders (SAEs) used in large language model analysis. The method addresses feature absorption and out-of-distribution performance failures by randomly replacing tokens during training to disrupt co-occurrence patterns, offering a practical path toward more reliable mechanistic interpretability tools.
AI · Neutral · arXiv – CS AI · Mar 26 · 6/10
🧠Researchers investigated whether Vision-Language Models (VLMs) can reason robustly under distribution shifts and found that fine-tuned VLMs achieve high accuracy in-distribution but fail to generalize. They propose VLC, a neuro-symbolic method combining VLM-based concept recognition with circuit-based symbolic reasoning that demonstrates consistent performance under covariate shifts.
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers propose integrating causal methods into machine learning systems to balance competing objectives like fairness, privacy, robustness, accuracy, and explainability. The paper argues that addressing these principles in isolation leads to conflicts and suboptimal solutions, while causal approaches can help navigate trade-offs in both trustworthy ML and foundation models.
AI · Neutral · arXiv – CS AI · Mar 12 · 6/10
🧠Researchers propose Contract And Conquer (CAC), a new method for provably generating adversarial examples against black-box neural networks using knowledge distillation and search space contraction. The approach provides theoretical guarantees for finding adversarial examples within a fixed number of iterations and outperforms existing methods on ImageNet datasets including vision transformers.
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
🧠Researchers developed a new training method to improve the robustness of AI foundation models like SAM3 for medical image segmentation by reducing sensitivity to prompt variations. The approach groups semantically similar prompts together and uses consistency constraints to ensure more reliable predictions across different prompt formulations.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠Researchers propose Explanation-Guided Adversarial Training (EGAT), a framework that combines adversarial training with explainable AI to create more robust and interpretable deep neural networks. The method achieves 37% improvement in adversarial accuracy while producing semantically meaningful explanations with only 16% increase in training time.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠Researchers introduce PDNA (Pulse-Driven Neural Architecture), a new continuous-time neural network that incorporates learnable oscillatory dynamics to improve robustness when input sequences are interrupted. The method shows significant performance improvements on sequential MNIST tasks, with the pulse variant achieving a 4.62 percentage point advantage over baseline models.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 20
🧠Researchers developed a new multi-agent reinforcement learning algorithm that uses strategic risk aversion to create AI agents that can reliably collaborate with unseen partners. The approach addresses the problem of brittle AI collaboration systems that fail when working with new partners by incorporating robustness against behavioral deviations.