#machine-learning-security News & Analysis

29 articles tagged with #machine-learning-security. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Channel Location Constrains the Auditability of Subliminal Learning

Researchers demonstrate that the auditability of hidden trait transfer in machine learning depends critically on the communication channel through which the trait travels, not merely model size or architecture. Pre-training screens like coverage can detect transfer in initialization-dependent channels but fail against convergent vocabulary geometry in language models, requiring fundamentally different detection approaches.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

The Unseen Hand: Manipulating Model Fairness and SHAP with Targeted Identity Re-Association Attacks

Researchers have discovered a new class of attacks called Targeted Identity Re-Association (TIRA) that can manipulate machine learning fairness audits and SHAP explainability tools without leaving detectable traces. The attacks use probabilistic output manipulation techniques to mask the influence of protected features, demonstrating that critical AI accountability mechanisms are vulnerable to sophisticated gaming.

AI · Bearish · arXiv – CS AI · Jun 4 · 7/10

MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models

Researchers introduce MaskForge, a black-box attack method that exploits structural vulnerabilities in diffusion-based large language models (dLLMs) by leveraging their native masking capabilities. The technique achieves 79.3% average success rates across five models and transfers effectively to other benchmarks, demonstrating a significant security gap in an emerging class of language models distinct from standard autoregressive architectures.

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

Erased but Not Forgotten: How Backdoors Compromise Concept Erasure

Researchers have discovered a critical vulnerability called Erasure Evasion Backdoor (EEB) that allows adversaries to bypass concept erasure methods in text-to-image diffusion models by binding malicious triggers to concepts marked for removal. The backdoor survives the erasure process across six state-of-the-art methods, achieving up to 94% success rates in exposing harmful content, revealing fundamental weaknesses in current AI safety approaches.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

Researchers propose the Adversarial Prompt Disentanglement (APD) framework, a defense mechanism that identifies and neutralizes malicious components in LLM inputs before processing. The system combines semantic decomposition, graph-based intent classification, and transformer-based detection to reduce harmful outputs by over 85% while maintaining model performance.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

DeTrigger: A Gradient-Centric Approach to Backdoor Attack Mitigation in Federated Learning

DeTrigger is a new federated learning framework that uses gradient analysis to detect and neutralize backdoor attacks in distributed machine learning systems. The approach achieves 251x faster detection than existing methods while mitigating 98.9% of backdoor attacks with minimal accuracy loss, addressing a critical vulnerability in privacy-preserving collaborative AI training.

AI · Bearish · arXiv – CS AI · May 7 · 7/10

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

Researchers demonstrate that the shuffling defense mechanism used to protect Transformer model weights during secure inference can be broken through an alignment attack, allowing adversaries to recover weights with minimal cost. The attack exploits multiple shuffled activations by finding a common permutation, undermining a key security assumption in privacy-preserving machine learning.

AI · Bearish · arXiv – CS AI · Apr 20 · 7/10

Power to the Clients: Federated Learning in a Dictatorship Setting

Researchers identify a critical vulnerability in federated learning systems where malicious 'dictator clients' can erase other participants' contributions while preserving their own, compromising the collaborative training process. The study provides theoretical and empirical analysis of single and multiple dictator scenarios, revealing fundamental security weaknesses in decentralized machine learning architectures.

AI · Bearish · arXiv – CS AI · Apr 13 · 7/10

XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers

Researchers have developed XFED, a novel model poisoning attack that compromises federated learning systems without requiring attackers to communicate or coordinate with each other. The attack successfully bypasses eight state-of-the-art defenses, revealing fundamental security vulnerabilities in FL deployments that were previously underestimated.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Researchers have developed CoopGuard, a new defense framework that uses cooperative AI agents to protect Large Language Models from sophisticated multi-round adversarial attacks. The system employs three specialized agents coordinated by a central system that maintains defense state across interactions, achieving a 78.9% reduction in attack success rates compared to existing defenses.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Reliability-Guided Adaptive Ensembling for Robust Test-Time Adaptation

Researchers propose SAFER, a training-free framework that enhances the robustness of test-time adaptation (TTA) methods against adversarial attacks on contaminated data streams. The method uses stochastic augmentation and reliability-guided prediction pooling to maintain performance while mitigating domain shift without requiring source data access.
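
As a rough illustration of what reliability-guided prediction pooling could look like, the sketch below weights each augmented view's class probabilities by one minus its normalized entropy before averaging. The entropy-based reliability score, the function names, and the example numbers are assumptions made for illustration; the summary does not say how SAFER scores reliability.

```python
import math


def reliability(probs):
    """Hypothetical reliability score: 1 minus normalized entropy, in [0, 1]."""
    if len(probs) < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return max(0.0, 1.0 - entropy / math.log(len(probs)))


def pool_predictions(views):
    """Average per-view class probabilities, weighted by reliability."""
    weights = [reliability(v) for v in views]
    total = sum(weights)
    if total == 0.0:
        # Every view is maximally uncertain: fall back to a plain average.
        weights, total = [1.0] * len(views), float(len(views))
    n_classes = len(views[0])
    return [
        sum(w * v[c] for w, v in zip(weights, views)) / total
        for c in range(n_classes)
    ]


# Predictions for one input under three stochastic augmentations (made-up numbers).
views = [[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]]
pooled = pool_predictions(views)
```

In this toy setup the uniform view `[0.5, 0.5]` receives zero weight, so the pooled prediction is driven by the two confident views.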

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

From CVE to CWE: Syscall-Based HIDS Generalisation

Researchers empirically test whether host intrusion detection systems trained on syscall traces can generalize across different CVE exploits within the same Common Weakness Enumeration class. Results show CWE-level generalization works for some weakness families (achieving F1=0.6976 for authentication flaws) but fails for others, with cross-CVE transfer heavily dependent on source profile breadth rather than weakness classification.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Toward Trustworthy AI: Multi-Target Adversarial Attacks and Robust Defenses for Continuous Data Summarization

Researchers propose methods to attack and defend continuous data summarization systems by exploiting vulnerabilities in similarity-based perturbations through DR-submodular optimization. The work demonstrates that adversarial attacks on upstream data processing can compromise trustworthy AI pipelines and proposes defense mechanisms with theoretical guarantees.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

SHIELD-IDS: Structurally Heterogeneous Ensemble with Integrated Layered Defense for Intrusion Detection Systems

Researchers introduce IDS-Anta++, an enhanced machine learning framework that defends intrusion detection systems against adversarial attacks through ensemble learning and multi-layer defensive mechanisms. The system achieves over 99% detection accuracy on clean data while demonstrating improved robustness against sophisticated attacks like FGSM and ZOO on standard cybersecurity datasets.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

CausShield: Sample Reconstruction-Resilient Vertical FL via Causal Representation Learning

CausShield is a new defense mechanism for vertical federated learning that uses causal representation learning to protect against sample reconstruction attacks while maintaining model performance. The approach decomposes shared representations into task-relevant and task-irrelevant components, achieving better privacy-utility tradeoffs than existing defenses through unsupervised learning rather than supervised training.

AI · Bullish · arXiv – CS AI · Jun 4 · 6/10

TITAN-FedAnil+: Trust-Based Adaptive Blockchain Federated Learning for Resource-Constrained Intelligent Enterprises

TITAN-FedAnil+ presents a blockchain-based federated learning framework designed to address data privacy and security challenges in resource-constrained enterprise environments. The system uses adaptive clustering and GPU acceleration to filter malicious updates while reducing memory overhead by up to 81%, making secure distributed learning more practical for edge devices.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Token Rankings are Unforgeable Language Model Signatures

Researchers demonstrate that token ranking signatures from language model APIs are mathematically unforgeable—each model produces unique top-k token orderings that cannot be replicated by other models. While rankings leak less information than raw logits, they still enable approximate parameter theft, though APIs can mitigate this risk by restricting k to sufficiently small values.
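
To make the notion of a top-k token ranking concrete, here is a minimal sketch that extracts the ordering from a logit vector and compares two orderings. It does not demonstrate the unforgeability result itself; the function names and the logit values are made up for illustration.

```python
def top_k_ranking(logits, k):
    """Return the indices of the k largest logits, highest first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def same_signature(logits_a, logits_b, k):
    """True when two logit vectors yield the same top-k token ordering."""
    return top_k_ranking(logits_a, k) == top_k_ranking(logits_b, k)


# Made-up next-token logits from two hypothetical models over a 5-token vocabulary.
model_a = [2.1, 0.3, 1.7, -0.5, 1.2]
model_b = [2.0, 0.4, 1.1, -0.2, 1.6]

agree_k1 = same_signature(model_a, model_b, k=1)
agree_k3 = same_signature(model_a, model_b, k=3)
```

With `k=1` the two toy models are indistinguishable, while `k=3` already separates them, which is the intuition behind restricting k to limit what an API exposes.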

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

SORA: Free Second-Order Attacks in Fast Adversarial Training

Researchers introduce SORA, a new adversarial training method that addresses catastrophic overfitting in fast neural network defense systems. By leveraging perturbation variability and a novel gradient alignment metric, SORA achieves state-of-the-art robustness against adversarial attacks while maintaining higher clean accuracy with improved computational efficiency.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

GJDNet: Robust Graph Neural Networks via Joint Disentangled Learning Against Adversarial Attacks

Researchers propose GJDNet, a robust Graph Neural Network defense framework that protects against adversarial attacks by jointly disentangling node representations and decision spaces. The approach addresses vulnerabilities in GNNs caused by adversarial perturbations that invert graph connectivity patterns, achieving improved robustness across different graph types.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Quantum-Enhanced Adversarial Robustness in Artificial Intelligence

Researchers present a comprehensive framework exploring how quantum computing techniques can enhance artificial intelligence's resilience against adversarial attacks. The work addresses a critical vulnerability in modern AI systems—their susceptibility to carefully crafted perturbations—by proposing quantum-enhanced defense mechanisms through optimization, feature mapping, and hybrid architectures.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Mind the Gap: Mixtures of Gaussians in Approximate Differential Privacy

Researchers introduce mixture mechanisms for differential privacy that combine multiple Gaussian distributions to reduce noise in data queries while maintaining privacy guarantees. These mechanisms substantially outperform existing analytic Gaussian approaches in low-privacy regimes, approaching theoretical optimality with significantly lower noise amplitudes and variances.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Practical Anonymous Two-Party Gradient Boosting Decision Tree

Researchers introduce an anonymous gradient-boosted decision tree (GBDT) protocol enabling secure training on vertically partitioned data between two parties while hiding record identifiers. The approach uses dual circuit-PSI and oblivious pseudorandom functions to eliminate ID exposure risks inherent in standard private set intersection methods, while achieving computational efficiency comparable to non-private approaches.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Assessing Per-Sample Membership Inference Vulnerability without Retraining

Researchers propose a novel method to assess individual training data vulnerability to membership inference attacks without requiring shadow models. The approach combines theoretical analysis in linear settings with a practical surrogate score for deep networks, using only geometry and loss information from a single trained model.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

A Robust Out-of-Distribution Detection Framework via Synergistic Smoothing

Researchers introduce ROSS, a robust out-of-distribution detection framework that combines median smoothing with instability quantification to defend machine learning systems against adversarial attacks. The method achieves state-of-the-art performance by leveraging the observation that OOD samples exhibit higher instability under perturbations, outperforming prior defenses by up to 40 AUROC points.
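
The general idea of pairing median smoothing with an instability measure can be sketched as follows: score many noisy copies of an input, take the median as the smoothed score, and use the spread of those scores as an instability signal. The Gaussian noise model, the interquartile range as the instability measure, and `toy_score` are assumptions for illustration, not ROSS's actual formulation.

```python
import random
import statistics


def smoothed_score(score_fn, x, sigma=0.1, n_samples=101, seed=0):
    """Return (median score, interquartile range) under Gaussian input noise."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        scores.append(score_fn(noisy))
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return statistics.median(scores), q3 - q1


def toy_score(x):
    """Hypothetical stand-in for a detector's confidence score."""
    return sum(x) / len(x)


median, instability = smoothed_score(toy_score, [0.2, 0.4, 0.6])
```

A detector built along these lines would treat inputs with a large `instability` value as more likely to be out-of-distribution, in line with the observation quoted in the summary.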

AI · Neutral · arXiv – CS AI · May 11 · 6/10

A Statistical Framework for Algorithmic Collective Action with Multiple Collectives

Researchers propose the first statistical framework for Algorithmic Collective Action (ACA) involving multiple independent collectives attempting to coordinate changes in shared data to influence AI classifier behavior. The framework provides computable bounds on collective success while accounting for varying sizes, strategies, and goal alignment across groups, with applications to climate adaptation in smart cities.
