#adversarial-training News & Analysis

22 articles tagged with #adversarial-training. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

The Attacker in the Mirror: Breaking Self-Consistency in Safety via Anchored Bipolicy Self-Play

Researchers propose Anchored Bipolicy Self-Play, a new safety training method that addresses fundamental limitations in parameter-shared self-play red teaming by using distinct LoRA adapters for attacker and defender roles. The approach achieves 100x greater parameter efficiency and improved safety robustness across multiple language model scales without sacrificing reasoning ability.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

Efficient Adversarial Training via Criticality-Aware Fine-Tuning

Researchers introduce Criticality-Aware Adversarial Training (CAAT), a parameter-efficient method that identifies and fine-tunes only the most robustness-critical parameters in Vision Transformers, achieving 94.3% of standard adversarial training robustness while tuning just 6% of model parameters. The method targets the computational cost that has kept adversarial training from being applied at large scale.
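
As a rough illustration of tuning only a small, high-scoring subset of parameters, the sketch below picks the top fraction of parameter tensors by a per-tensor score. The function name `select_critical`, the parameter names, and the scores are hypothetical; the summary does not say how CAAT computes criticality, so any score (for example, a gradient-based one) is an assumption here.

```python
def select_critical(scores, fraction=0.06):
    """Return the names of the top `fraction` of parameters, ranked by score.

    `scores` maps a parameter name to a criticality score (higher = more critical).
    How the score is computed is left open; this is not the CAAT scoring rule.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one parameter so the fine-tuning set is never empty.
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


# Hypothetical scores for 50 parameter tensors.
scores = {f"block{i}.weight": float(i) for i in range(50)}
trainable = select_critical(scores, fraction=0.06)
```

In a training loop, only the tensors named in `trainable` would receive gradient updates; everything else would stay frozen.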

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

ADV-0: Closed-Loop Min-Max Adversarial Training for Long-Tail Robustness in Autonomous Driving

ADV-0 is a new closed-loop adversarial training framework for autonomous driving that uses min-max optimization to improve robustness against rare but safety-critical scenarios. The system treats the interaction between the driving policy and adversarial agents as a zero-sum game, converging to a Nash equilibrium while maximizing real-world performance bounds.
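
For orientation, a zero-sum game of this kind is commonly written as a max-min problem over the policy and the adversary. The notation below (policy $\pi_\theta$, adversary $\rho_\phi$, discounted return $J$) is assumed for illustration and is not taken from the ADV-0 paper.

```latex
\max_{\theta} \; \min_{\phi} \; J(\pi_\theta, \rho_\phi),
\qquad
J(\pi_\theta, \rho_\phi)
  = \mathbb{E}_{\tau \sim (\pi_\theta, \rho_\phi)}
    \Big[ \sum_{t} \gamma^{t} r_t \Big]

% A Nash equilibrium (\theta^*, \phi^*) is a saddle point of J:
J(\pi_{\theta}, \rho_{\phi^*})
  \;\le\; J(\pi_{\theta^*}, \rho_{\phi^*})
  \;\le\; J(\pi_{\theta^*}, \rho_{\phi})
  \qquad \text{for all } \theta, \phi
```

Writing the same game in terms of a cost $-J$ gives the min-max form named in the title.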

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

Eva-VLA: Evaluating Vision-Language-Action Models' Robustness Under Real-World Physical Variations

Researchers introduced Eva-VLA, the first unified framework to systematically evaluate the robustness of Vision-Language-Action models for robotic manipulation under real-world physical variations. Testing revealed OpenVLA exhibits over 90% failure rates across three physical variations, exposing critical weaknesses in current VLA models when deployed outside laboratory conditions.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

Researchers developed DMAST, a new training framework that protects multimodal web agents from cross-modal attacks where adversaries inject malicious content into webpages to deceive both visual and text processing channels. The method uses adversarial training through a three-stage pipeline and significantly outperforms existing defenses while doubling task completion efficiency.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization

Researchers introduce Adversarially-Aligned Jacobian Regularization (AAJR), a new method to improve the robustness of autonomous AI agent systems by controlling sensitivity along adversarial directions rather than globally. Compared with existing methods, the approach preserves more task performance while keeping multi-agent AI ecosystems stable.
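
To make "sensitivity along a direction rather than globally" concrete, the sketch below estimates the squared norm of the Jacobian-vector product $\lVert J_f(x)\,v \rVert^2$ for a single direction $v$ by finite differences. The helper `directional_jacobian_penalty` and the toy model are hypothetical, and this is a generic directional penalty, not the AAJR objective itself.

```python
import math


def directional_jacobian_penalty(f, x, direction, h=1e-4):
    """Finite-difference estimate of ||J_f(x) v||^2, with v = direction / ||direction||."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    v = [d / norm for d in direction]
    shifted = [xi + h * vi for xi, vi in zip(x, v)]
    fx, fs = f(x), f(shifted)
    # (f(x + h v) - f(x)) / h approximates the Jacobian-vector product J v.
    return sum(((b - a) / h) ** 2 for a, b in zip(fx, fs))


def toy_model(x):
    # Hypothetical linear map: steep along axis 0, shallow along axis 1.
    return [2.0 * x[0], 0.5 * x[1]]


sensitive = directional_jacobian_penalty(toy_model, [1.0, 1.0], [1.0, 0.0])
insensitive = directional_jacobian_penalty(toy_model, [1.0, 1.0], [0.0, 1.0])
```

A global penalty would charge the model for both axes; a directional one charges it only along the direction an attacker is assumed to exploit, which leaves sensitivity elsewhere untouched.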

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

Researchers introduce GAR (Generative Adversarial Reinforcement Learning), a new AI training framework that jointly trains problem generators and solvers in an adversarial loop for formal theorem proving. The method shows significant improvements in mathematical proof capabilities, with models achieving 4.20% average relative improvement on benchmark tests.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Adversarial Training for Robust Coverage Network under Worst-case Facility Losses

Researchers propose a Dual-Agent Deep Reinforcement Learning framework to solve the Maximal Covering Location-Interdiction Problem, a computationally complex bi-level optimization challenge critical for resilient infrastructure planning. The adversarial training approach, where location and interdiction agents compete, achieves superior computational efficiency while maintaining competitive solution quality across synthetic and real-world datasets.

AI · Bullish · arXiv – CS AI · May 9 · 6/10

Information Theoretic Adversarial Training of Large Language Models

Researchers propose WARDEN, an information-theoretic adversarial training framework that improves Large Language Model robustness against prompt attacks by dynamically reweighting adversarial examples using f-divergence principles. The method matches the computational efficiency of existing approaches while substantially reducing attack success rates, advancing the scalability of AI safety mechanisms.
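
One familiar instance of divergence-based reweighting uses the KL divergence, which is an f-divergence: maximizing the weighted loss minus a KL penalty against uniform weights gives weights proportional to `exp(loss / temperature)`. The sketch below shows that special case only. The function names and the `temperature` parameter are assumptions, and nothing here is claimed to be WARDEN's actual weighting rule.

```python
import math


def kl_tilted_weights(losses, temperature=1.0):
    """Weights proportional to exp(loss / temperature), normalized to sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - m) / temperature) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_loss(losses, temperature=1.0):
    """Weighted average of the losses under the tilted weights."""
    weights = kl_tilted_weights(losses, temperature)
    return sum(w * loss for w, loss in zip(weights, losses))


# Hypothetical per-example adversarial losses for one batch.
losses = [0.1, 0.5, 2.0]
weights = kl_tilted_weights(losses, temperature=0.5)
```

Lower temperatures concentrate weight on the hardest adversarial examples; very high temperatures approach a plain average.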

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents

Researchers introduce PyFi, a framework enabling vision language models to understand financial images through progressive reasoning chains, backed by a 600K synthetic dataset organized as a reasoning pyramid. The approach uses adversarial agents to automatically generate training data without human annotation, achieving up to 19.52% accuracy improvements on fine-tuned models.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning

Researchers introduce Generative Adversarial Reasoner, a new training framework that improves LLM mathematical reasoning by using adversarial reinforcement learning between a reasoner and discriminator model. The method achieved significant performance gains on mathematical benchmarks, improving DeepSeek models by 7-10 percentage points on AIME24 tests.

🧠 Llama

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

ThreatFormer-IDS: Robust Transformer Intrusion Detection with Zero-Day Generalization and Explainable Attribution

Researchers developed ThreatFormer-IDS, a Transformer-based intrusion detection system that achieves robust cybersecurity monitoring for IoT and industrial networks. The system demonstrates superior performance in detecting zero-day attacks while providing explainable threat attribution, achieving 99.4% AUC-ROC on benchmark tests.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 5

Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution

Researchers introduce CEMMA, a co-evolutionary framework for improving AI safety alignment in multimodal large language models. The system uses evolving adversarial attacks and adaptive defenses to create more robust AI systems that better resist jailbreak attempts while maintaining functionality.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Explanation-Guided Adversarial Training for Robust and Interpretable Models

Researchers propose Explanation-Guided Adversarial Training (EGAT), a framework that combines adversarial training with explainable AI to create more robust and interpretable deep neural networks. The method achieves 37% improvement in adversarial accuracy while producing semantically meaningful explanations with only 16% increase in training time.

AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 22

Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control

Researchers developed an offline-to-online reinforcement learning framework that improves robot control robustness through adversarial fine-tuning. The method trains policies on clean datasets, then applies action perturbations during fine-tuning to build resilience against actuator faults and environmental uncertainties.
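
A minimal sketch of perturbing actions during fine-tuning follows. It uses bounded random noise as a stand-in for the adversary, which is an assumption; the summary does not describe how the paper chooses its perturbations. The name `perturb_action`, the bound `epsilon`, and the action limits are all hypothetical.

```python
import random


def perturb_action(action, epsilon, low=-1.0, high=1.0, rng=random):
    """Add noise in [-epsilon, epsilon] to each action dimension, then clip to [low, high].

    Random noise stands in for a learned or worst-case adversary here.
    """
    perturbed = []
    for a in action:
        noisy = a + rng.uniform(-epsilon, epsilon)
        perturbed.append(min(high, max(low, noisy)))
    return perturbed


# The policy's intended action is replaced by a perturbed one before the
# environment step, so the policy learns under simulated actuator error.
rng = random.Random(0)
intended = [0.0, 0.95, -0.99]
executed = perturb_action(intended, epsilon=0.1, rng=rng)
```

Clipping keeps the executed action inside the actuator limits, so the perturbation never asks the robot for a command it could not physically receive.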

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5

To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning

Researchers introduce AOT (Adversarial Opponent Training), a self-play framework that improves Multimodal Large Language Models' robustness by having an AI attacker generate adversarial image manipulations to train a defender model. The method addresses perceptual fragility in MLLMs when processing visually complex scenes, reducing hallucinations through dynamic adversarial training.

AI · Neutral · arXiv – CS AI · May 9 · 5/10

Band Together: Untargeted Adversarial Training with Multimodal Coordination against Evasion-based Promotion Attacks

Researchers propose UAT-MC, a new defense mechanism for multimodal recommender systems that addresses cross-modal gradient misalignment in evasion-based promotion attacks. The approach synchronizes visual and textual perturbations through coordinated adversarial training, improving robustness while maintaining recommendation quality.

AI · Neutral · arXiv – CS AI · Mar 26 · 4/10

Perturbation: A simple and efficient adversarial tracer for representation learning in language models

Researchers propose 'perturbation', a method for probing how language models learn representations: a model is fine-tuned on adversarial examples, and the resulting changes are measured as they spread to other examples. The approach reveals that trained language models develop structured linguistic abstractions without geometric assumptions, offering insights into how AI systems generalize language understanding.

AI · Neutral · arXiv – CS AI · Mar 11 · 5/10

Adversarial Latent-State Training for Robust Policies in Partially Observable Domains

Researchers developed a new framework for training robust AI policies in partially observable environments where adversaries can manipulate hidden initial conditions. The study demonstrates improved robustness through targeted exposure to shifted latent distributions, reducing performance gaps in benchmark tests.

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 2

Learning of Population Dynamics: Inverse Optimization Meets JKO Scheme

Researchers introduce iJKOnet, a new method combining the JKO framework with inverse optimization to learn population dynamics from evolutionary snapshots. The approach uses adversarial training without restrictive architectural requirements and demonstrates improved performance over existing JKO-based methods.

AI · Neutral · Hugging Face Blog · Jul 16 · 3/10 · 8

How to train your model dynamically using adversarial data

The article title suggests content about dynamic model training using adversarial data techniques. However, the article body appears to be empty or unavailable, preventing detailed analysis of the methodology or implications.

AI · Neutral · OpenAI News · May 25 · 1/10 · 6

Adversarial training methods for semi-supervised text classification

The article title references adversarial training methods for semi-supervised text classification, but no article body content was provided for analysis.