Mental Damage: Caption Poisoning Attacks on Retrieval-Augmented Text-to-Music Generation
Researchers demonstrate a novel poisoning attack on retrieval-augmented text-to-music systems where attackers inject malicious captions into music databases to manipulate generation outputs toward attacker-chosen targets while maintaining alignment with original user prompts. The attack reveals a critical integrity vulnerability in AI systems that depend on external knowledge bases for prompt augmentation.
This research exposes a fundamental architectural weakness in retrieval-augmented generative AI systems. Rather than attacking the generator or retriever directly, the researchers poison the underlying knowledge database with carefully crafted captions that are still retrieved for relevant queries while biasing downstream generation. The dual-layer strategy preserves high-level semantic anchors that keep retrieved captions plausible while injecting low-level acoustic descriptors that steer outputs toward malicious objectives. This represents a sophisticated supply-chain attack on creative AI infrastructure.
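The retrieval step can be illustrated with a toy sketch. It assumes a bag-of-words cosine retriever as a stand-in for the dense embeddings a production system would likely use; the database, the captions, and the helper names (`retrieve`, `augment_prompt`) are hypothetical and are not taken from the research. The point is only to show how a caption that reuses the user's high-level anchors can outrank a legitimate caption and carry injected descriptors into the augmented prompt.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a caption and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, database: list[str], k: int = 1) -> list[str]:
    """Return the k captions most similar to the query (toy retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(database, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def augment_prompt(query: str, database: list[str], k: int = 1) -> str:
    """Append retrieved captions to the user prompt before it reaches the generator."""
    return " | ".join([query, *retrieve(query, database, k)])


# Hypothetical caption database (legitimate entries).
database = [
    "calm acoustic guitar ballad with soft vocals",
    "energetic electronic dance track with heavy bass",
]

# Hypothetical poisoned entry: reuses the high-level anchors a user is likely
# to type, then appends low-level descriptors chosen by the attacker.
poisoned = "calm acoustic guitar ballad, distorted noise"
database.append(poisoned)

query = "calm acoustic guitar ballad"
print(augment_prompt(query, database))
```

With this toy scoring, the poisoned caption (cosine ≈ 0.82) outranks the legitimate ballad caption (≈ 0.76) because it shares the same four anchor terms while being shorter, so the script prints `calm acoustic guitar ballad | calm acoustic guitar ballad, distorted noise`. The user's prompt is untouched; only the retrieved context changes.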
The vulnerability becomes more consequential as retrieval-augmented systems spread across generative AI applications. These architectures gained prominence as a way to ground language models and improve generation quality by leveraging external knowledge, but the same design introduces new attack surfaces. Unlike prompt injection, which modifies user input, caption poisoning operates silently within the system's infrastructure, which likely makes it considerably harder to detect.
For the AI industry, this finding has substantial implications for production systems that depend on crowd-sourced or third-party datasets. Music generation companies, cloud providers offering text-to-music (TTM) services, and enterprises deploying retrieval-augmented systems face stricter security requirements around database integrity. The attack shows that defensive measures must extend beyond model robustness to encompass data validation, dataset monitoring, and access controls on knowledge bases.
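One form of data validation can be sketched as an ingestion-time consistency check. It assumes a hand-built lexicon (`CONFLICTS`, hypothetical) that maps high-level anchors to low-level acoustic descriptors that would normally contradict them, and it flags captions containing both, which is the kind of mismatch a dual-layer poisoned caption may exhibit. This is a heuristic illustration, not a defense described by the researchers; an attacker who picks descriptors outside the lexicon would evade it.

```python
import re

# Hypothetical lexicon: high-level anchors mapped to low-level acoustic
# descriptors that would usually contradict them in an honest caption.
CONFLICTS: dict[str, set[str]] = {
    "calm": {"harsh", "distorted", "aggressive", "noise"},
    "ballad": {"fast", "frantic"},
    "acoustic": {"distorted", "synthesized"},
}


def find_conflicts(caption: str) -> list[tuple[str, str]]:
    """Return (anchor, descriptor) pairs that co-occur in one caption."""
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    pairs: list[tuple[str, str]] = []
    for anchor, clashes in CONFLICTS.items():
        if anchor in tokens:
            # Sort the clashing terms so the output order is deterministic.
            pairs.extend((anchor, term) for term in sorted(clashes & tokens))
    return pairs


def validate_caption(caption: str) -> bool:
    """Accept a caption only if it contains no anchor/descriptor contradiction."""
    return not find_conflicts(caption)


for caption in [
    "calm acoustic guitar ballad with soft vocals",
    "calm acoustic guitar ballad, distorted noise",
]:
    print(validate_caption(caption), find_conflicts(caption))
```

A lexicon check like this is cheap enough to run on every new or modified entry, but it only catches contradictions someone anticipated; statistical monitoring of the dataset would be needed to catch the rest.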
Organizations using similar architectures should cryptographically verify database entries, run anomaly detection on retrieved captions, and maintain audit trails for database modifications. The research suggests that as generative AI systems become more complex and modular, security models must evolve to protect every component, including those previously considered static or low-risk infrastructure.
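Entry verification and audit trails can be sketched with the Python standard library. The sketch assumes an HMAC-SHA256 tag computed over each caption at ingestion and a hash-chained, append-only log of modifications; the key, the entry IDs, and the `AuditLog` class are hypothetical. Note the limits: a tag only proves an entry has not changed since it was signed, so a poisoned caption that was signed at ingestion still passes, and the chain only detects tampering if its latest hash is anchored somewhere the attacker cannot rewrite.

```python
import hashlib
import hmac
import json

# Hypothetical key; in practice this would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def sign_entry(entry_id: str, caption: str, key: bytes = SECRET_KEY) -> str:
    """Compute an HMAC-SHA256 tag over a canonical encoding of the entry."""
    message = json.dumps({"id": entry_id, "caption": caption}, sort_keys=True).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_entry(entry_id: str, caption: str, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare in constant time before using the entry."""
    return hmac.compare_digest(sign_entry(entry_id, caption, key), tag)


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log where each record commits to the hash of the previous one."""

    def __init__(self) -> None:
        self.records: list[dict[str, str]] = []

    def append(self, actor: str, action: str, entry_id: str, caption: str) -> None:
        prev = self.records[-1]["hash"] if self.records else "0" * 64
        body = {"actor": actor, "action": action, "id": entry_id,
                "caption": caption, "prev": prev}
        self.records.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Return False if any record was edited, removed, or reordered."""
        prev = "0" * 64
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```

For example, a tag produced by `sign_entry("track-001", "calm acoustic guitar ballad")` stops verifying if the caption is later edited to append extra descriptors, and editing any earlier `AuditLog` record makes `verify()` return `False`.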
- Attackers can poison music caption databases with crafted entries that bias text-to-music generation without modifying user prompts or the generator itself
- The dual-layer poisoning strategy preserves retrieval relevance while injecting hidden malicious descriptors, making poisoned captions appear legitimate
- Retrieval-augmented AI systems introduce a critical integrity dependency on external knowledge bases that standard security models often overlook
- Database poisoning attacks represent a new threat class for generative AI systems and are likely harder to detect than prompt injection or attacks on the model itself
- Developers must implement data validation, dataset monitoring, and cryptographic verification to protect retrieval-augmented systems from integrity attacks