y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#ai-security News & Analysis

216 articles tagged with #ai-security. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

216 articles
AIBullisharXiv – CS AI · Mar 37/107
🧠

ROKA: Robust Knowledge Unlearning against Adversaries

Researchers introduce ROKA, a new machine unlearning method that prevents knowledge contamination and indirect attacks on AI models. The approach uses 'Neural Healing' to preserve important knowledge while forgetting targeted data, providing theoretical guarantees for knowledge preservation during unlearning.

AIBearisharXiv – CS AI · Mar 36/108
🧠

Atomicity for Agents: Exposing, Exploiting, and Mitigating TOCTOU Vulnerabilities in Browser-Use Agents

Researchers identified widespread TOCTOU (time of check to time of use) vulnerabilities in browser-use agents, where web pages change between planning and execution phases, potentially causing unintended actions. A study of 10 popular open-source agents revealed these security flaws are common, prompting development of a lightweight mitigation strategy based on pre-execution validation.

AIBearisharXiv – CS AI · Mar 37/107
🧠

CaptionFool: Universal Image Captioning Model Attacks

Researchers have developed CaptionFool, a universal adversarial attack that can manipulate AI image captioning models by modifying just 1.2% of image patches. The attack achieves 94-96% success rates in forcing models to generate arbitrary captions, including offensive content that can bypass content moderation systems.

AIBearisharXiv – CS AI · Mar 37/108
🧠

MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs

Researchers have developed MIDAS, a new jailbreaking framework that successfully bypasses safety mechanisms in Multimodal Large Language Models by dispersing harmful content across multiple images. The technique achieved an 81.46% average attack success rate against four closed-source MLLMs by extending reasoning chains and reducing security attention.

$LINK
AIBearisharXiv – CS AI · Mar 36/107
🧠

Hide&Seek: Remove Image Watermarks with Negligible Cost via Pixel-wise Reconstruction

Researchers have developed HIDE&SEEK (HS), a new attack method that can effectively remove watermarks from machine-generated images while maintaining visual quality. This research exposes vulnerabilities in current state-of-the-art proactive image watermarking defenses, highlighting the ongoing arms race between watermarking protection and removal techniques.

AIBullisharXiv – CS AI · Mar 37/106
🧠

Towards Privacy-Preserving LLM Inference via Collaborative Obfuscation (Technical Report)

Researchers have developed AloePri, the first privacy-preserving LLM inference method designed for industrial applications. The system uses collaborative obfuscation to protect input/output data while maintaining 96.5-100% accuracy and resisting state-of-the-art attacks, successfully tested on a 671B parameter model.

AIBearisharXiv – CS AI · Mar 37/108
🧠

Extracting Training Dialogue Data from Large Language Model based Task Bots

Researchers have identified significant privacy risks in Large Language Model-based Task-Oriented Dialogue Systems, demonstrating that these AI systems can memorize and leak sensitive training data including phone numbers and complete dialogue exchanges. The study proposes new attack methods that can extract thousands of training dialogue states with over 70% precision in best-case scenarios.

$RNDR
AINeutralarXiv – CS AI · Mar 36/104
🧠

Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment

Researchers have developed AQUA, the first watermarking framework designed to protect image copyright in Multimodal Retrieval-Augmented Generation (RAG) systems. The framework addresses a critical gap in protecting visual content within RAG-as-a-Service platforms by embedding semantic signals into synthetic images that survive the retrieval-to-generation process.

AI × CryptoBearishCoinTelegraph – AI · Mar 37/107
🤖

OpenZeppelin finds data contamination in OpenAI’s EVMbench

OpenZeppelin discovered significant flaws in OpenAI's EVMbench dataset, including data contamination from training leaks and at least four incorrectly classified high-severity vulnerabilities. This finding raises concerns about the reliability of AI tools used for blockchain security auditing.

OpenZeppelin finds data contamination in OpenAI’s EVMbench
CryptoNeutralNewsBTC · Mar 26/104
⛓️

Crypto’s Quietest Month In Nearly A Year — But Hackers Haven’t Gone Away

February 2026 saw crypto hack losses drop to just $26.5 million across 15 incidents, representing a 69% decline from January and the lowest monthly figure in 11 months. Two major attacks on YieldBlox ($10M) and IoTeX ($9M) accounted for over 70% of total losses, while improved security standards and AI-powered monitoring tools are helping reduce vulnerabilities.

Crypto’s Quietest Month In Nearly A Year — But Hackers Haven’t Gone Away
$BTC$XRP
AIBullisharXiv – CS AI · Mar 26/1012
🧠

Enhancing Continual Learning for Software Vulnerability Prediction: Addressing Catastrophic Forgetting via Hybrid-Confidence-Aware Selective Replay for Temporal LLM Fine-Tuning

Researchers developed Hybrid Class-Aware Selective Replay (Hybrid-CASR), a continual learning method that improves AI-based software vulnerability detection by addressing catastrophic forgetting in temporal scenarios. The method achieved 0.667 Macro-F1 score while reducing training time by 17% compared to baseline approaches on CVE data from 2018-2024.

AINeutralarXiv – CS AI · Mar 26/1014
🧠

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Researchers introduce Jailbreak Foundry (JBF), a system that automatically converts AI jailbreak research papers into executable code modules for standardized testing. The system successfully reproduced 30 attacks with high accuracy and reduces implementation code by nearly half while enabling consistent evaluation across multiple AI models.

AINeutralarXiv – CS AI · Mar 27/1010
🧠

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Researchers introduce Veritas, a multi-modal large language model designed for deepfake detection that uses pattern-aware reasoning to mimic human forensic processes. The system addresses real-world challenges through the HydraFake dataset and achieves significant improvements in detecting unseen forgeries across different domains.

AIBearishOpenAI News · Feb 256/106
🧠

Disrupting malicious uses of AI | February 2026

A new threat report analyzes how malicious actors are combining AI models with websites and social platforms to carry out attacks. The report examines the implications of these AI-powered threats for detection and defense systems.

AINeutralOpenAI News · Jan 286/105
🧠

Keeping your data safe when an AI agent clicks a link

OpenAI has implemented safeguards to protect user data when AI agents interact with external links, addressing potential security vulnerabilities. The measures focus on preventing URL-based data exfiltration and prompt injection attacks that could compromise user information.

$LINK
AIBearishIEEE Spectrum – AI · Jan 216/105
🧠

Why AI Keeps Falling for Prompt Injection Attacks

Large language models (LLMs) remain highly vulnerable to prompt injection attacks where specific phrasing can override safety guardrails, causing AI systems to perform forbidden actions or reveal sensitive information. Unlike humans who use contextual judgment and layered defenses, current LLMs lack the ability to assess situational appropriateness and cannot universally prevent such attacks.

AIBearishOpenAI News · Nov 266/104
🧠

Mixpanel security incident: what OpenAI users need to know

OpenAI disclosed a security incident involving their analytics partner Mixpanel that exposed limited API analytics data. The company confirmed that no API content, user credentials, or payment information was compromised in the breach.

AIBullishHugging Face Blog · Oct 226/105
🧠

Hugging Face and VirusTotal collaborate to strengthen AI security

Hugging Face has partnered with VirusTotal to enhance AI model security by integrating malware scanning capabilities. This collaboration aims to protect the AI ecosystem from malicious models and strengthen security protocols across AI platforms.

AIBullishGoogle DeepMind Blog · May 206/105
🧠

Advancing Gemini's security safeguards

Google has announced that Gemini 2.5 is their most secure AI model family to date, highlighting enhanced security safeguards. The announcement suggests continued improvements in AI safety and security measures for their flagship language model.

AINeutralOpenAI News · Mar 266/107
🧠

Security on the path to AGI

OpenAI is implementing comprehensive security measures directly into their infrastructure and models as they progress toward artificial general intelligence (AGI). The company emphasizes proactive adaptation to address security challenges on the path to AGI development.

AINeutralOpenAI News · Jan 225/105
🧠

Trading inference-time compute for adversarial robustness

The article discusses research on trading computational resources during inference time to improve adversarial robustness in AI systems. This approach explores how allocating more compute power at inference can enhance model security against adversarial attacks.

AINeutralOpenAI News · Nov 215/102
🧠

Advancing red teaming with people and AI

The article discusses advancements in red teaming methodologies that combine human expertise with artificial intelligence capabilities. This represents a significant development in cybersecurity practices and AI safety testing approaches.

AINeutralOpenAI News · May 286/105
🧠

OpenAI Board Forms Safety and Security Committee

OpenAI has established a new Safety and Security Committee as part of its board structure. This move comes as the AI company continues to scale its operations and address growing concerns about AI safety and security governance.

← PrevPage 8 of 9Next →