y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#backdoor-detection News & Analysis

6 articles tagged with #backdoor-detection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

6 articles
AINeutralarXiv – CS AI · May 117/10
🧠

Activation Differences Reveal Backdoors: A Comparison of SAE Architectures

Researchers demonstrate that Differential SAEs (Diff-SAE) significantly outperform Crosscoders in detecting backdoor attacks in language models, achieving a 0.40 Backdoor Isolation Score with perfect precision. The study reveals that backdoors manifest as directional activation shifts rather than sparse features, providing critical insights for AI safety monitoring and interpretability tool development.

AIBullisharXiv – CS AI · Mar 127/10
🧠

Repurposing Backdoors for Good: Ephemeral Intrinsic Proofs for Verifiable Aggregation in Cross-silo Federated Learning

Researchers propose a novel lightweight architecture for verifiable aggregation in federated learning that uses backdoor injection as intrinsic proofs instead of expensive cryptographic methods. The approach achieves over 1000x speedup compared to traditional cryptographic baselines while maintaining high detection rates against malicious servers.

AINeutralarXiv – CS AI · May 126/10
🧠

Towards Backdoor-Based Ownership Verification for Vision-Language-Action Models

Researchers introduce GuardVLA, a backdoor-based watermarking framework designed to verify ownership of Vision-Language-Action models used in robotic control systems. The technique embeds hidden triggers during training that remain detectable after model release and adaptation, enabling creators to prove intellectual property rights without compromising model performance.

AINeutralarXiv – CS AI · May 76/10
🧠

Coward: Collision-based OOD Watermarking for Practical Proactive Federated Backdoor Detection

Researchers introduce Coward, a novel proactive backdoor detection method for federated learning that uses collision-based watermarking to identify poisoned model updates from malicious clients. The approach addresses critical limitations in existing detection methods by leveraging multi-backdoor collision effects and regulated OOD data injection, achieving state-of-the-art performance with fewer false positives.

AINeutralarXiv – CS AI · Apr 136/10
🧠

CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion

Researchers introduce CLIP-Inspector, a backdoor detection method for prompt-tuned CLIP models that reconstructs hidden triggers using out-of-distribution images to identify if a model has been maliciously compromised. The technique achieves 94% detection accuracy and enables post-hoc model repair, addressing critical security vulnerabilities in outsourced machine learning services.