AI · Bearish · arXiv – CS AI · 6d ago · 7/10
🧠Researchers introduce MaskForge, a black-box attack that exploits structural vulnerabilities in diffusion-based large language models (dLLMs) by leveraging their native masking capabilities. The technique achieves a 79.3% average success rate across five models and transfers effectively to other benchmarks, demonstrating a significant security gap in an emerging class of language models distinct from standard autoregressive architectures.
AI · Bearish · arXiv – CS AI · Jun 2 · 7/10
🧠Researchers have discovered a critical vulnerability called Erasure Evasion Backdoor (EEB) that allows adversaries to bypass concept erasure methods in text-to-image diffusion models by binding malicious triggers to concepts marked for removal. The backdoor survives the erasure process across six state-of-the-art methods, achieving success rates of up to 94% in exposing harmful content and revealing fundamental weaknesses in current AI safety approaches.
AI · Bullish · arXiv – CS AI · May 28 · 7/10
🧠Researchers propose the Adversarial Prompt Disentanglement (APD) framework, a defense mechanism that identifies and neutralizes malicious components in LLM inputs before processing. The system combines semantic decomposition, graph-based intent classification, and transformer-based detection to reduce harmful outputs by over 85% while maintaining model performance.
AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠DeTrigger is a new federated learning framework that uses gradient analysis to detect and neutralize backdoor attacks in distributed machine learning systems. The approach achieves 251x faster detection than existing methods while mitigating 98.9% of backdoor attacks with minimal accuracy loss, addressing a critical vulnerability in privacy-preserving collaborative AI training.
AI · Bearish · arXiv – CS AI · May 7 · 7/10
🧠Researchers demonstrate that the shuffling defense mechanism used to protect Transformer model weights during secure inference can be broken through an alignment attack, allowing adversaries to recover weights with minimal cost. The attack exploits multiple shuffled activations by finding a common permutation, undermining a key security assumption in privacy-preserving machine learning.
AI · Bearish · arXiv – CS AI · Apr 20 · 7/10
🧠Researchers identify a critical vulnerability in federated learning systems where malicious 'dictator clients' can erase other participants' contributions while preserving their own, compromising the collaborative training process. The study provides theoretical and empirical analysis of single and multiple dictator scenarios, revealing fundamental security weaknesses in decentralized machine learning architectures.
AI · Bearish · arXiv – CS AI · Apr 13 · 7/10
🧠Researchers have developed XFED, a novel model poisoning attack that compromises federated learning systems without requiring attackers to communicate or coordinate with each other. The attack successfully bypasses eight state-of-the-art defenses, revealing fundamental security vulnerabilities in FL deployments that were previously underestimated.
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10
🧠Researchers have developed CoopGuard, a new defense framework that uses cooperative AI agents to protect Large Language Models from sophisticated multi-round adversarial attacks. The system employs three specialized agents coordinated by a central system that maintains defense state across interactions, achieving a 78.9% reduction in attack success rates compared to existing defenses.
AI · Bullish · arXiv – CS AI · 6d ago · 6/10
🧠TITAN-FedAnil+ presents a blockchain-based federated learning framework designed to address data privacy and security challenges in resource-constrained enterprise environments. The system uses adaptive clustering and GPU acceleration to filter malicious updates while reducing memory overhead by up to 81%, making secure distributed learning more practical for edge devices.
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠Researchers demonstrate that token ranking signatures from language model APIs are mathematically unforgeable—each model produces unique top-k token orderings that cannot be replicated by other models. While rankings leak less information than raw logits, they still enable approximate parameter theft, though APIs can mitigate this risk by restricting k to sufficiently small values.
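As a rough illustration of what a ranking-only API exposes, the sketch below returns just the ordering of the top-k tokens and discards the logit values. The function name `top_k_ranking` and the toy logits are hypothetical and are not taken from the paper.

```python
def top_k_ranking(logits, k):
    """Return the indices of the k largest logits, highest first.

    Only the ordering is returned; the logit values themselves are hidden.
    """
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


# Hypothetical logits from two different models over a 6-token vocabulary.
logits_a = [2.1, -0.3, 0.7, 1.5, -1.2, 0.2]
logits_b = [2.0, -0.3, 1.6, 1.5, -1.2, 0.2]

rank_a = top_k_ranking(logits_a, k=3)  # [0, 3, 2]
rank_b = top_k_ranking(logits_b, k=3)  # [0, 2, 3]

# With k=1 the two toy models return the same ranking, so less is revealed.
same_at_k1 = top_k_ranking(logits_a, 1) == top_k_ranking(logits_b, 1)
```

In this toy case the two rankings differ at k=3 but coincide at k=1, which is the intuition behind restricting k: a shorter ranking carries less information about the underlying model.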
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers introduce SORA, a new adversarial training method that addresses catastrophic overfitting in fast neural network defense systems. By leveraging perturbation variability and a novel gradient alignment metric, SORA achieves state-of-the-art robustness against adversarial attacks while maintaining higher clean accuracy with improved computational efficiency.
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers propose GJDNet, a robust Graph Neural Network defense framework that protects against adversarial attacks by jointly disentangling node representations and decision spaces. The approach addresses vulnerabilities in GNNs caused by adversarial perturbations that invert graph connectivity patterns, achieving improved robustness across different graph types.
AI · Neutral · arXiv – CS AI · May 29 · 6/10
🧠Researchers present a comprehensive framework exploring how quantum computing techniques can enhance artificial intelligence's resilience against adversarial attacks. The work addresses a critical vulnerability in modern AI systems—their susceptibility to carefully crafted perturbations—by proposing quantum-enhanced defense mechanisms through optimization, feature mapping, and hybrid architectures.
AI · Neutral · arXiv – CS AI · May 28 · 6/10
🧠Researchers introduce mixture mechanisms for differential privacy that combine multiple Gaussian distributions to reduce noise in data queries while maintaining privacy guarantees. These mechanisms substantially outperform existing analytic Gaussian approaches in low-privacy regimes, approaching theoretical optimality with significantly lower noise amplitudes and variances.
AI · Neutral · arXiv – CS AI · May 27 · 6/10
🧠Researchers introduce an anonymous gradient-boosted decision tree (GBDT) protocol enabling secure training on vertically partitioned data between two parties while hiding record identifiers. The approach uses dual circuit-PSI and oblivious pseudorandom functions to eliminate ID exposure risks inherent in standard private set intersection methods, while achieving computational efficiency comparable to non-private approaches.
AI · Neutral · arXiv – CS AI · May 27 · 6/10
🧠Researchers propose a novel method to assess individual training data vulnerability to membership inference attacks without requiring shadow models. The approach combines theoretical analysis in linear settings with a practical surrogate score for deep networks, using only geometry and loss information from a single trained model.
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce ROSS, a robust out-of-distribution detection framework that combines median smoothing with instability quantification to defend machine learning systems against adversarial attacks. The method achieves state-of-the-art performance by leveraging the observation that OOD samples exhibit higher instability under perturbations, outperforming prior defenses by up to 40 AUROC points.
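The general idea of pairing median smoothing with an instability measure can be sketched as follows. This is not the ROSS implementation: the Gaussian noise model, the interquartile range as the instability measure, and the names `smoothed_score_and_instability`, `flat_score`, and `steep_score` are all assumptions made for illustration.

```python
import random
import statistics


def smoothed_score_and_instability(score_fn, x, sigma=0.1, n_samples=101, seed=0):
    """Score noisy copies of x; return (median score, interquartile range).

    The median is the smoothed score. The interquartile range stands in for
    "instability" here, which is an assumption, not the paper's definition.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        scores.append(score_fn(noisy))
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return statistics.median(scores), q3 - q1


# Toy score functions: one ignores its input, one reacts strongly to it.
def flat_score(v):
    return 1.0


def steep_score(v):
    return 10.0 * v[0]


flat_median, flat_instability = smoothed_score_and_instability(flat_score, [0.0, 0.0])
steep_median, steep_instability = smoothed_score_and_instability(steep_score, [0.0, 0.0])
```

A score that does not react to the noise has zero spread, while a score that reacts strongly has a large one. A detector built along these lines would treat a large spread as evidence that the input is out of distribution.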
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers propose the first statistical framework for Algorithmic Collective Action (ACA) involving multiple independent collectives attempting to coordinate changes in shared data to influence AI classifier behavior. The framework provides computable bounds on collective success while accounting for varying sizes, strategies, and goal alignment across groups, with applications to climate adaptation in smart cities.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers present the first theoretical framework for differentially private reinforcement learning with general function approximation, achieving regret bounds of Õ(K^{3/5}) that match linear-case performance. This breakthrough extends privacy guarantees beyond tabular and linear settings, combining batched policy updates with the exponential mechanism for improved privacy-utility tradeoffs in online RL systems.
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠Researchers propose AdaBFL, a Byzantine-robust federated learning method that uses adaptive multi-layer defense mechanisms to protect distributed machine learning systems from poisoning attacks by malicious clients. The approach balances defense against multiple attack types without requiring server-side dataset access, with proven convergence properties on non-IID data.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce QShield, a hybrid quantum-classical neural network architecture that combines traditional CNNs with quantum processing modules to defend deep learning models against adversarial attacks. Testing on MNIST, OrganAMNIST, and CIFAR-10 datasets shows the hybrid approach maintains accuracy while substantially reducing attack success rates and increasing computational costs for adversaries.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce CLIP-Inspector, a backdoor detection method for prompt-tuned CLIP models that reconstructs hidden triggers using out-of-distribution images to identify if a model has been maliciously compromised. The technique achieves 94% detection accuracy and enables post-hoc model repair, addressing critical security vulnerabilities in outsourced machine learning services.