Cordyceps: Covert Control Attacks on LLMs via Data Poisoning
Researchers have identified a new data poisoning vulnerability in large language models called 'covert control attacks' that uses semantic associations to hide malicious instructions rather than obvious trigger phrases. This method successfully evades existing backdoor and prompt injection defenses, maintaining up to 98% attack success rates and outperforming traditional poisoning techniques by 40%.