#adversarial-security News & Analysis

5 articles tagged with #adversarial-security. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

5 articles

AIBullisharXiv – CS AI · May 277/10

🧠

Curriculum Learning for Safety Alignment

Researchers propose Staged-Competence, a curriculum learning framework that enhances Direct Preference Optimisation (DPO) for AI safety alignment. The method reduces out-of-distribution harmful responses by 16% and jailbreak success rates by 20% while maintaining model capabilities, achieving baseline safety with 25% less training data.

AIBullisharXiv – CS AI · May 127/10

🧠

Trapping Attacker in Dilemma: Examining Internal Correlations and External Influences of Trigger for Defending GNN Backdoors

Researchers introduce PRAETORIAN, a novel defense mechanism against backdoor attacks on Graph Neural Networks that targets the fundamental requirements of effective attacks rather than surface-level indicators. The defense achieves a 99.45% reduction in attack success rates while maintaining minimal accuracy degradation, forcing adversaries into an unfavorable trade-off between attack effectiveness and detectability.

AIBullisharXiv – CS AI · May 97/10

🧠

DeTrigger: A Gradient-Centric Approach to Backdoor Attack Mitigation in Federated Learning

DeTrigger is a new federated learning framework that uses gradient analysis to detect and neutralize backdoor attacks in distributed machine learning systems. The approach achieves 251x faster detection than existing methods while mitigating 98.9% of backdoor attacks with minimal accuracy loss, addressing a critical vulnerability in privacy-preserving collaborative AI training.

AIBearisharXiv – CS AI · May 77/10

🧠

Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

Researchers demonstrate that audio language models can be jailbroken using sparse token optimization rather than dense waveform updates, with Token-Aware Gradient Optimization (TAGO) achieving comparable attack success rates while modifying only 25% of audio tokens. The findings reveal that gradient energy concentrates in specific audio regions, suggesting future AI safety research should account for this heterogeneous token-level structure.

AIBearisharXiv – CS AI · May 46/10

🧠

BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron

Researchers have developed BadSNN, a novel backdoor attack method targeting Spiking Neural Networks by exploiting hyperparameter variations in spiking neurons. The attack demonstrates superior performance compared to existing backdoor methods and shows resistance to current mitigation techniques, raising security concerns for SNNs used in edge computing and neuromorphic applications.