MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
Researchers present MM-PoisonRAG, a framework demonstrating critical vulnerabilities in multimodal RAG systems where adversaries can inject poisoned content into knowledge bases to manipulate AI outputs. Two attack strategies—localized poisoning targeting specific queries and globalized poisoning affecting all queries—achieve high success rates and bypass existing defenses, exposing fundamental security gaps in RAG-augmented language models.
The emergence of retrieval-augmented generation in multimodal large language models represents a significant advancement in reducing AI hallucinations, yet this research exposes a critical architectural vulnerability: the poisoning of external knowledge sources. MM-PoisonRAG systematically demonstrates that defenders have underestimated adversaries' capacity to corrupt RAG pipelines. Localized attacks achieve success rates of up to 56% even under restricted access conditions, and a single globalized injection can reduce accuracy to 0% across all queries.
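To make the two attack modes concrete, here is a minimal toy sketch of how an injected entry can win retrieval. It is not the paper's method: the three-dimensional vectors, the entry names, and the `retrieve` helper are all hypothetical, and the third vector dimension is an assumed stand-in for clean entries being imperfectly aligned with queries. A localized poison is placed next to one target query, while a global poison sits between all queries and outranks every clean entry.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, knowledge_base, k=1):
    """Return the k entries most similar to the query (hypothetical retriever)."""
    ranked = sorted(knowledge_base,
                    key=lambda entry: cosine(query_vec, entry["vec"]),
                    reverse=True)
    return ranked[:k]

# Hypothetical toy embeddings. The third dimension is an assumption that
# models clean entries being only partly aligned with the queries.
clean_kb = [
    {"id": "clean-cat", "vec": [0.7, 0.1, 0.7]},
    {"id": "clean-dog", "vec": [0.1, 0.7, 0.7]},
]
queries = {
    "cat-query": [1.0, 0.2, 0.0],
    "dog-query": [0.2, 1.0, 0.0],
}

def top_ids(knowledge_base):
    """Map each query name to the id of its top-1 retrieved entry."""
    return {name: retrieve(vec, knowledge_base)[0]["id"]
            for name, vec in queries.items()}

# Localized poisoning: one entry crafted to match a single target query.
local_poison = {"id": "poison-local", "vec": [1.0, 0.2, 0.0]}

# Globalized poisoning: one entry placed between all queries so that it
# outranks every clean entry regardless of which query is asked.
global_poison = {"id": "poison-global", "vec": [0.6, 0.6, 0.0]}

clean_result = top_ids(clean_kb)
local_result = top_ids(clean_kb + [local_poison])
global_result = top_ids(clean_kb + [global_poison])

print("clean: ", clean_result)
print("local: ", local_result)
print("global:", global_result)
```

In this toy setup the localized entry hijacks only the query it was crafted for, while the single global entry becomes the top result for both queries. Real multimodal retrievers work in far higher dimensions, so this shows the geometry of the problem rather than an actual attack.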
This work builds on growing concerns about AI system robustness as these models become embedded in critical applications. The transition from text-only to multimodal systems added complexity that security frameworks haven't adequately addressed. Prior research focused primarily on model robustness rather than supply-chain vulnerabilities in retrieval infrastructure, leaving a substantial gap that malicious actors can exploit.
The implications extend across multiple stakeholder groups. Developers deploying RAG systems face immediate pressure to architect more resilient retrieval mechanisms, while enterprises using MLLMs for decision-making must reconsider trust assumptions about their knowledge bases. Because the attacks transfer across four different retrievers without re-optimization, defenses must operate at the architectural level rather than targeting a specific component.
Future research will likely focus on poisoning-resistant retrieval mechanisms, adversarial content detection, and verification layers between retrieval and generation stages. The finding that existing defenses fail comprehensively indicates this remains an open problem requiring fundamental innovation rather than incremental hardening. Organizations should prioritize access controls on knowledge base modifications and implement detection systems for anomalous content patterns.
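As a rough illustration of the access-control and verification ideas above, the following sketch wraps a knowledge base so that only allowlisted writers can add entries, and so that retrieved entries are re-checked against a digest recorded at write time before they reach the generator. Everything here is an assumption made for illustration: the `GuardedKnowledgeBase` class, its method names, and the writer labels are hypothetical and do not come from the paper. Since the research reports that existing defenses fail, this should be read as basic hygiene, not as a validated countermeasure.

```python
import hashlib
from dataclasses import dataclass, field

def sha256_hex(text):
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

@dataclass
class GuardedKnowledgeBase:
    """Hypothetical wrapper: allowlisted writes plus integrity checks on read."""
    trusted_writers: set
    entries: dict = field(default_factory=dict)    # entry_id -> content
    digests: dict = field(default_factory=dict)    # entry_id -> digest at write time
    audit_log: list = field(default_factory=list)  # (writer, entry_id, allowed)

    def add(self, writer, entry_id, content):
        """Store content only if the writer is allowlisted; log every attempt."""
        allowed = writer in self.trusted_writers
        self.audit_log.append((writer, entry_id, allowed))
        if not allowed:
            return False
        self.entries[entry_id] = content
        self.digests[entry_id] = sha256_hex(content)
        return True

    def verified(self, entry_ids):
        """Keep only retrieved ids whose content still matches its recorded digest."""
        kept = []
        for entry_id in entry_ids:
            content = self.entries.get(entry_id)
            if content is None:
                continue  # never accepted into the knowledge base
            if sha256_hex(content) == self.digests.get(entry_id):
                kept.append(entry_id)
        return kept

kb = GuardedKnowledgeBase(trusted_writers={"curator"})
accepted = kb.add("curator", "doc-1", "A cat sits on a mat.")
rejected = kb.add("anonymous", "doc-2", "Misleading caption.")
kb.add("curator", "doc-3", "A dog runs in a park.")

# Simulate out-of-band tampering that bypasses add().
kb.entries["doc-1"] = "Tampered caption."

# Verification layer between retrieval and generation.
safe_ids = kb.verified(["doc-1", "doc-2", "doc-3"])
print(accepted, rejected, safe_ids)
```

This kind of check only catches unauthorized or tampered entries. It does nothing against poisoned content submitted through a trusted writer, which is one reason content-level detection remains an open problem.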
- Localized poisoning attacks achieve up to 56% success rates in manipulating specific model outputs even with restricted attacker access.
- Single globalized poisoning injections can reduce model accuracy to 0% across all queries, demonstrating catastrophic failure modes.
- Attack success transfers effectively across four different retriever systems without requiring adversary re-optimization.
- Existing defense mechanisms consistently fail against both poisoning strategies, indicating architectural rather than tactical vulnerabilities.
- Knowledge base security represents a critical but underaddressed component of multimodal AI system robustness.