y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#model-security News & Analysis

18 articles tagged with #model-security. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

18 articles
AIBearisharXiv – CS AI Β· 6d ago7/10
🧠

SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems

Researchers have identified SkillTrojan, a novel backdoor attack targeting skill-based agent systems by embedding malicious logic within reusable skills rather than model parameters. The attack leverages skill composition to execute attacker-defined payloads with up to 97.2% success rates while maintaining clean task performance, revealing critical security gaps in AI agent architectures.

🧠 GPT-5
AIBullisharXiv – CS AI Β· Mar 277/10
🧠

Model2Kernel: Model-Aware Symbolic Execution For Safe CUDA Kernels

Researchers developed Model2Kernel, a system that automatically detects memory safety bugs in CUDA kernels used for large language model inference. The system discovered 353 previously unknown bugs across popular platforms like vLLM and Hugging Face with only nine false positives.

🏒 Hugging Face
AINeutralarXiv – CS AI Β· Mar 277/10
🧠

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Researchers propose a unified framework for AI security threats that categorizes attacks based on four directional interactions between data and models. The comprehensive taxonomy addresses vulnerabilities in foundation models through four categories: data-to-data, data-to-model, model-to-data, and model-to-model attacks.

AINeutralarXiv – CS AI Β· Mar 177/10
🧠

Membership Inference for Contrastive Pre-training Models with Text-only PII Queries

Researchers developed UMID, a new text-only auditing framework to detect if personally identifiable information was memorized during training of multimodal AI models like CLIP and CLAP. The method significantly improves efficiency and effectiveness of membership inference attacks while maintaining privacy constraints.

AIBullisharXiv – CS AI Β· Mar 167/10
🧠

Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights

Researchers discovered that privacy vulnerabilities in neural networks exist in only a small fraction of weights, but these same weights are critical for model performance. They developed a new approach that preserves privacy by rewinding and fine-tuning only these critical weights instead of retraining entire networks, maintaining utility while defending against membership inference attacks.

AINeutralarXiv – CS AI Β· 2d ago6/10
🧠

Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model

Researchers demonstrate that deliberative alignmentβ€”a method for improving LLM safety by distilling reasoning from stronger modelsβ€”still allows unsafe behaviors from base models to persist despite learning safer reasoning patterns. They propose a Best-of-N sampling technique that reduces attack success rates by 28-35% across multiple benchmarks while maintaining utility.

AINeutralarXiv – CS AI Β· 6d ago6/10
🧠

AdaProb: Efficient Machine Unlearning via Adaptive Probability

Researchers propose AdaProb, a machine unlearning method that enables trained AI models to efficiently forget specific data while preserving privacy and complying with regulations like GDPR. The approach uses adaptive probability distributions and demonstrates 20% improvement in forgetting effectiveness with 50% less computational overhead compared to existing methods.

AINeutralarXiv – CS AI Β· Mar 176/10
🧠

Protecting Deep Neural Network Intellectual Property with Chaos-Based White-Box Watermarking

Researchers have developed a new white-box watermarking framework that uses chaotic sequences to embed ownership information into deep neural network parameters for intellectual property protection. The method uses logistic maps and genetic algorithms to verify model ownership without degrading performance, showing effectiveness on MNIST and CIFAR-10 datasets.

AIBearisharXiv – CS AI Β· Mar 166/10
🧠

Prompt Injection as Role Confusion

Researchers have identified 'role confusion' as the fundamental mechanism behind prompt injection attacks on language models, where models assign authority based on how text is written rather than its source. The study achieved 60-61% attack success rates across multiple models and found that internal role confusion strongly predicts attack success before generation begins.

AIBearisharXiv – CS AI Β· Mar 37/106
🧠

Turning Black Box into White Box: Dataset Distillation Leaks

Researchers discovered that dataset distillation, a technique for compressing large datasets into smaller synthetic ones, has serious privacy vulnerabilities. The study introduces an Information Revelation Attack (IRA) that can extract sensitive information from synthetic datasets, including predicting the distillation algorithm, model architecture, and recovering original training samples.

AIBullisharXiv – CS AI Β· Mar 37/106
🧠

Token-level Data Selection for Safe LLM Fine-tuning

Researchers have developed TOSS, a new framework for safely fine-tuning large language models that operates at the token level rather than sample level. The method identifies and removes unsafe tokens while preserving task-specific information, demonstrating superior performance compared to existing sample-level defense methods in maintaining both safety and utility.

AIBullisharXiv – CS AI Β· Mar 27/1016
🧠

MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models

Researchers have developed MPU, a privacy-preserving framework that enables machine unlearning for large language models without requiring servers to share parameters or clients to share data. The framework uses perturbed model copies and harmonic denoising to achieve comparable performance to non-private methods, with most algorithms showing less than 1% performance degradation.

AINeutralOpenAI News Β· Jan 225/105
🧠

Trading inference-time compute for adversarial robustness

The article discusses research on trading computational resources during inference time to improve adversarial robustness in AI systems. This approach explores how allocating more compute power at inference can enhance model security against adversarial attacks.

AINeutralOpenAI News Β· Sep 195/104
🧠

OpenAI Red Teaming Network

OpenAI has announced an open call for experts to join their Red Teaming Network, focusing on improving AI model safety. The initiative seeks domain experts to help identify vulnerabilities and enhance security measures for OpenAI's AI systems.

AINeutralHugging Face Blog Β· Apr 144/105
🧠

4M Models Scanned: Protect AI + Hugging Face 6 Months In

The article title suggests a 6-month collaboration between Protect AI and Hugging Face has resulted in scanning 4 million AI models. However, the article body appears to be empty, preventing detailed analysis of the partnership's findings or implications.

AINeutralOpenAI News Β· May 34/106
🧠

Transfer of adversarial robustness between perturbation types

The article discusses research on adversarial robustness transfer between different types of perturbations in machine learning models. This research examines how defensive techniques developed for one type of attack may provide protection against other types of adversarial examples.