y0news

#ai-security News & Analysis

Recent coverage of #ai-security remains predominantly skeptical, with nearly half of articles in the past month taking a bearish stance. The 250 indexed articles reflect sustained concern about vulnerabilities and risks as artificial intelligence systems become more prevalent. Anthropic and its Claude model dominate discussions alongside emerging systems like GPT-5, while research from arXiv–CS AI forms the bulk of technical analysis. Sentiment has held relatively stable over the past 90 days, suggesting these security concerns represent ongoing rather than newly emerged challenges. Coverage frequently intersects with #cybersecurity, #machine-learning, #ai-safety, and #adversarial-attacks, indicating security issues span multiple technical domains. Browse the articles below to understand the specific threats and defensive approaches currently under scrutiny.

sentiment · last 30d (86 articles)
Top sources: arXiv – CS AI (147) · Crypto Briefing (10) · Blockonomi (8) · Fortune Crypto (7) · The Register – AI (7)
Most-discussed entities: Anthropic (19) · Claude (8) · GPT-5 (7) · OpenAI (6) · Llama (4)
330 articles
AI × Crypto · Bearish · CoinTelegraph · Apr 15 · 🔥 8/10

North Korean hackers used AI-enabled social engineering in Zerion attack

North Korean hackers executed a sophisticated attack on Zerion using AI-enabled social engineering tactics, marking the second major long-term social engineering campaign this month following the $280 million Drift Protocol exploit. The incident demonstrates how threat actors are leveraging artificial intelligence to enhance the effectiveness and scale of credential compromise attacks against cryptocurrency platforms.

AI × Crypto · Bearish · CoinDesk · Apr 13 · 7/10

AI agents are set to power crypto payments, but a hidden flaw could expose wallets

Researchers have identified a critical vulnerability in AI infrastructure layers used for cryptocurrency payments, where intermediary systems can intercept sensitive wallet data. The flaw has reportedly enabled credential theft and at least one $500,000 wallet drain, exposing a significant security gap as AI agents become more integrated into crypto transaction systems.

AI · Bearish · Fortune Crypto · Apr 10 · 🔥 8/10

The AI that found 27-year-old vulnerabilities no human ever caught before just forced an emergency meeting with every major Wall Street CEO

Anthropic's latest AI model discovered 27-year-old security vulnerabilities that human researchers missed, prompting Treasury Secretary Scott Bessent and Fed Chair Jerome Powell to convene an emergency meeting with major Wall Street CEOs. The incident highlights critical gaps in legacy system security and raises questions about AI's expanding role in identifying financial infrastructure risks.

🏢 Anthropic
AI · Bearish · CoinDesk · Apr 10 · 7/10

Mythos AI threat prompts Bessent, Powell to convene bank CEOs for urgent talks

Treasury Secretary Bessent and Federal Reserve Chair Powell are convening bank CEOs for urgent discussions following concerns about Mythos, an AI system capable of rapidly identifying software vulnerabilities and developing sophisticated exploits. The meeting addresses fears that such AI capabilities could pose systemic risks to financial institutions and banking infrastructure.

AI · Bearish · Daily Hodl · 1d ago · 7/10

Pennsylvania Bank Issues Urgent Alert After AI Application Triggers Data Breach, Exposing Sensitive Customer Info

Community Bank, a Pennsylvania-based financial institution, disclosed a data breach caused by an AI application that exposed customer names, social security numbers, and dates of birth. The breach, reported to the SEC, highlights emerging cybersecurity vulnerabilities in AI-powered banking systems and raises concerns about enterprise AI security practices across the financial sector.

AI · Bearish · Decrypt – AI · 1d ago · 7/10

What Is an AI Prompt Injection Attack? The Hidden Threat Hijacking Your Chatbots

Prompt injection attacks allow hackers to manipulate AI chatbots like ChatGPT, Claude, and Gemini through adversarial text inputs, potentially hijacking their behavior and outputs. OpenAI has indicated this vulnerability may be inherent to large language models and difficult to fully eliminate, raising significant security concerns for enterprises and individual users relying on these systems.

🏢 OpenAI · 🧠 ChatGPT · 🧠 Claude
AI × Crypto · Bearish · Fortune Crypto · 2d ago · 7/10

The AI arms race in cybersecurity has started. Most companies aren’t ready

An emerging AI arms race in cybersecurity has begun, with threat actors leveraging artificial intelligence for sophisticated attacks while most organizations lack adequate defensive measures. Coinbase's security leadership highlights the urgency for companies to adopt AI-powered security strategies to counter evolving threats.

AI · Bearish · arXiv – CS AI · 2d ago · 7/10

Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation

Researchers have developed a comprehensive taxonomy of jailbreak attacks and defenses for Large Audio Language Models (LALMs), identifying vulnerabilities across semantic, acoustic, signal, and embedding layers. The study reveals that current defenses create tradeoffs between robustness and usability, highlighting the need for cost-aware safety evaluation beyond simple success-rate metrics.

AI · Bearish · arXiv – CS AI · 2d ago · 7/10

Token Inflation: How Dishonest Providers Can Overcharge for Large Language Model Usage

Researchers demonstrate that LLM providers can systematically inflate token counts billed to users, with hidden reasoning tokens inflatable by up to 1,469% without detection. The core issue is an audit paradox: providers control both the tokenizer and execution, so users cannot verify billed token counts without independent mechanisms such as trusted execution attestation or cryptographic proofs.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Researchers introduce AgentDoG 1.5, a lightweight AI safety framework designed to protect open-world agents like OpenClaw from emerging security risks. The framework uses only ~1k training samples to create efficient models (0.8B-8B parameters) that match closed-source alternatives while reducing deployment overhead by 100x, with all resources released openly.

🧠 GPT-5
AI · Bearish · arXiv – CS AI · 2d ago · 7/10

Finding DoRI: Discovery of Retained Images in Diffusion Models

Researchers challenge the assumption that memorization in text-to-image diffusion models can be localized to specific weights, demonstrating that pruning efforts can be bypassed through minor text embedding perturbations. The study reveals memorization is distributed throughout embedding space, suggesting that current mitigation strategies are fundamentally fragile and that new approaches are needed to protect training data privacy.

AI · Bearish · arXiv – CS AI · 2d ago · 7/10

Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach

Researchers have established the first comprehensive evaluation framework for dataset watermarking in fine-tuned diffusion models, revealing significant vulnerabilities in existing protection methods. While current watermarking techniques show promise in universality and transmissibility, the study demonstrates practical watermark removal methods that can eliminate these protections without degrading model performance, exposing critical gaps in copyright and security safeguards.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

KYA: A Framework-Agnostic Trust Layer for Autonomous Systems with Verifiable Provenance and Hierarchical Policy Composition

KYA (Know Your Agents) is an open-source trust and governance framework for autonomous systems that enables verifiable authorization, policy compliance, and post-hoc auditability across multi-agent environments. The system demonstrates strong security performance, detecting 89% of adversarial attacks while maintaining sub-millisecond latency and supporting 15+ agent frameworks.

AI × Crypto · Neutral · arXiv – CS AI · 2d ago · 7/10

Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents

Researchers introduced Agora, a multi-agent LLM framework designed to detect deep logic bugs in consensus protocols used by blockchains and distributed systems. The system discovered 15 previously unknown protocol-level bugs in major implementations (Raft, EPaxos, HotStuff, BullShark) that existing LLM approaches failed to identify, demonstrating the effectiveness of domain-aware collaborative AI for protocol verification.

AI · Bearish · Ars Technica – AI · 2d ago · 7/10

Fed up with vibe coders, dev sneaks data-nuking prompt injection into their code

A developer embedded a prompt injection attack into the jqwik library that instructed AI coding agents to delete application output, highlighting vulnerabilities in AI-assisted development tools. The incident reveals how malicious actors can compromise open-source projects to target AI systems, creating risks for developers relying on autonomous coding agents.

AI · Neutral · arXiv – CS AI · 3d ago · 7/10

I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors

Researchers conducted a study with 47 participants to evaluate how humans detect synthetic speech, testing detection accuracy across authentic, fully synthetic, and partially synthetic utterances under various trust manipulation conditions. The findings reveal that humans perform poorly at detecting fully synthetic speech (below-chance levels) and that trust cues like instructional framing and provenance labeling do not significantly improve detection, though they influence detection behavior.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

Technical Report: Exploring the Emerging Threats of the Agent Skill Ecosystem

Researchers identified 76 confirmed malicious AI agent skills across major marketplaces, with 13.4% of 3,984 analyzed skills containing critical security vulnerabilities. The findings highlight urgent risks as AI agents gain access to sensitive credentials and systems, with malicious payloads still publicly available on platforms like clawhub.ai.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation

A research position paper argues the AI/ML community should abandon the "positive backdoor" terminology and instead rigorously evaluate trigger-activated hidden behaviors as "Secret Alignment." Researchers found that existing implementations show significant brittleness in security properties, particularly in confidentiality, integrity, and availability—revealing that protective claims lack standardized evaluation frameworks.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking

Researchers have discovered SeedHijack, a supply-chain attack that compromises LLM watermarking schemes by hijacking the pseudo-random number generator (PRNG) used in watermark implementation. The attack amplifies watermark signals while remaining undetectable by current defense mechanisms, exposing a critical vulnerability in cryptographic content-provenance systems that assumed PRNG trustworthiness.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks

Researchers present MM-PoisonRAG, a framework demonstrating critical vulnerabilities in multimodal RAG systems where adversaries can inject poisoned content into knowledge bases to manipulate AI outputs. Two attack strategies—localized poisoning targeting specific queries and globalized poisoning affecting all queries—achieve high success rates and bypass existing defenses, exposing fundamental security gaps in RAG-augmented language models.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024

Researchers introduce Deepfake-Eval-2024, a new benchmark dataset of real-world deepfakes collected from social media in 2024, revealing that state-of-the-art detection models experience dramatic performance drops of 45-50% compared to academic benchmarks. The findings underscore a critical gap between laboratory-validated deepfake detectors and their effectiveness against actual manipulated content in circulation.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

LLM Watermark Evasion via Bias Inversion

Researchers demonstrate a practical attack called Bias-Inversion Rewriting Attack (BIRA) that defeats LLM watermarking schemes with over 99% success rate while maintaining semantic quality. The findings expose fundamental vulnerabilities in current watermarking detection methods, which are widely considered essential for identifying AI-generated content.

AI · Neutral · arXiv – CS AI · 4d ago · 7/10

Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal

Researchers demonstrate that chain-of-thought reasoning in large language models like DeepSeek-R1 fundamentally changes how refusal mechanisms operate, requiring multi-stage interventions rather than simple activation steering. Unlike traditional LLMs where refusal exists in a single directional subspace, reasoning models jointly encode refusal across both residual activations and reasoning chains, making them more robust to direct attacks but potentially vulnerable to CoT-level manipulations.

AI · Bearish · arXiv – CS AI · 4d ago · 7/10

Furina: Fragmented Uncertainty-Driven Refusal Instability Attack

Researchers have discovered that safety mechanisms in large language models operate within an instability region where small input variations cause unpredictable refusal behaviors rather than consistent outputs. The Furina jailbreak attack exploits this vulnerability by using fragmented prompts to amplify uncertainty, outperforming existing attacks on safety benchmarks and highlighting a fundamental weakness in current AI safety defenses.
