y0news

#adversarial-training News & Analysis

22 articles tagged with #adversarial-training. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

The Attacker in the Mirror: Breaking Self-Consistency in Safety via Anchored Bipolicy Self-Play

Researchers propose Anchored Bipolicy Self-Play, a new safety training method that addresses fundamental limitations in parameter-shared self-play red teaming by using distinct LoRA adapters for attacker and defender roles. The approach achieves 100x greater parameter efficiency and improved safety robustness across multiple language model scales without sacrificing reasoning ability.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

Efficient Adversarial Training via Criticality-Aware Fine-Tuning

Researchers introduce Criticality-Aware Adversarial Training (CAAT), a parameter-efficient method that identifies and fine-tunes only the most robustness-critical parameters in Vision Transformers. It retains 94.3% of the robustness of standard adversarial training while tuning just 6% of model parameters, addressing the computational bottleneck that prevents large-scale deployment of adversarial training.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

ADV-0: Closed-Loop Min-Max Adversarial Training for Long-Tail Robustness in Autonomous Driving

ADV-0 is a new closed-loop adversarial training framework for autonomous driving that uses min-max optimization to improve robustness against rare but safety-critical scenarios. The system treats the interaction between the driving policy and adversarial agents as a zero-sum game, converging to a Nash equilibrium while maximizing real-world performance bounds.

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

Eva-VLA: Evaluating Vision-Language-Action Models' Robustness Under Real-World Physical Variations

Researchers introduced Eva-VLA, the first unified framework to systematically evaluate the robustness of Vision-Language-Action models for robotic manipulation under real-world physical variations. Testing revealed OpenVLA exhibits over 90% failure rates across three physical variations, exposing critical weaknesses in current VLA models when deployed outside laboratory conditions.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

Researchers developed DMAST, a new training framework that protects multimodal web agents from cross-modal attacks where adversaries inject malicious content into webpages to deceive both visual and text processing channels. The method uses adversarial training through a three-stage pipeline and significantly outperforms existing defenses while doubling task completion efficiency.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization

Researchers introduce Adversarially-Aligned Jacobian Regularization (AAJR), a new method to improve the robustness of autonomous AI agent systems by controlling sensitivity along adversarial directions rather than globally. This approach maintains better performance while ensuring stability in multi-agent AI ecosystems compared to existing methods.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

Researchers introduce GAR (Generative Adversarial Reinforcement Learning), a new AI training framework that jointly trains problem generators and solvers in an adversarial loop for formal theorem proving. The method shows significant improvements in mathematical proof capabilities, with models achieving 4.20% average relative improvement on benchmark tests.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Adversarial Training for Robust Coverage Network under Worst-case Facility Losses

Researchers propose a Dual-Agent Deep Reinforcement Learning framework to solve the Maximal Covering Location-Interdiction Problem, a computationally complex bi-level optimization challenge critical for resilient infrastructure planning. The adversarial training approach, where location and interdiction agents compete, achieves superior computational efficiency while maintaining competitive solution quality across synthetic and real-world datasets.

AI · Bullish · arXiv – CS AI · May 9 · 6/10

Information Theoretic Adversarial Training of Large Language Models

Researchers propose WARDEN, an information-theoretic adversarial training framework that improves Large Language Model robustness against prompt attacks by dynamically reweighting adversarial examples using f-divergence principles. The method achieves comparable computational efficiency to existing approaches while substantially reducing attack success rates, advancing the scalability of AI safety mechanisms.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents

Researchers introduce PyFi, a framework enabling vision language models to understand financial images through progressive reasoning chains, backed by a 600K synthetic dataset organized as a reasoning pyramid. The approach uses adversarial agents to automatically generate training data without human annotation, achieving up to 19.52% accuracy improvements on fine-tuned models.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning

Researchers introduce Generative Adversarial Reasoner, a new training framework that improves LLM mathematical reasoning by using adversarial reinforcement learning between a reasoner and discriminator model. The method achieved significant performance gains on mathematical benchmarks, improving DeepSeek models by 7-10 percentage points on AIME24 tests.

🧠 Llama
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 5

Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution

Researchers introduce CEMMA, a co-evolutionary framework for improving AI safety alignment in multimodal large language models. The system uses evolving adversarial attacks and adaptive defenses to create more robust AI systems that better resist jailbreak attempts while maintaining functionality.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Explanation-Guided Adversarial Training for Robust and Interpretable Models

Researchers propose Explanation-Guided Adversarial Training (EGAT), a framework that combines adversarial training with explainable AI to create more robust and interpretable deep neural networks. The method achieves 37% improvement in adversarial accuracy while producing semantically meaningful explanations with only 16% increase in training time.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5

To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning

Researchers introduce AOT (Adversarial Opponent Training), a self-play framework that improves Multimodal Large Language Models' robustness by having an AI attacker generate adversarial image manipulations to train a defender model. The method addresses perceptual fragility in MLLMs when processing visually complex scenes, reducing hallucinations through dynamic adversarial training.

AI · Neutral · arXiv – CS AI · Mar 26 · 4/10

Perturbation: A simple and efficient adversarial tracer for representation learning in language models

Researchers propose a method called 'perturbation' for studying the representations that language models learn: a model is fine-tuned on adversarial examples, and the resulting changes are measured as they spread to other examples. The approach, which makes no geometric assumptions, reveals that trained language models develop structured linguistic abstractions, offering insights into how AI systems generalize language understanding.

AI · Neutral · arXiv – CS AI · Mar 11 · 5/10

Adversarial Latent-State Training for Robust Policies in Partially Observable Domains

Researchers developed a new framework for training robust AI policies in partially observable environments where adversaries can manipulate hidden initial conditions. The study demonstrates improved robustness through targeted exposure to shifted latent distributions, reducing performance gaps in benchmark tests.

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 2

Learning of Population Dynamics: Inverse Optimization Meets JKO Scheme

Researchers introduce iJKOnet, a new method combining the JKO framework with inverse optimization to learn population dynamics from evolutionary snapshots. The approach uses adversarial training without restrictive architectural requirements and demonstrates improved performance over existing JKO-based methods.

AI · Neutral · Hugging Face Blog · Jul 16 · 3/10 · 8

How to train your model dynamically using adversarial data

The article title suggests content about dynamic model training using adversarial data techniques. However, the article body appears to be empty or unavailable, preventing detailed analysis of the methodology or implications.