AIBearisharXiv โ CS AI ยท 7h ago6/10
๐ง
Prompt Injection as Role Confusion
Researchers have identified 'role confusion' as the fundamental mechanism behind prompt injection attacks on language models, where models assign authority based on how text is written rather than its source. The study achieved 60-61% attack success rates across multiple models and found that internal role confusion strongly predicts attack success before generation begins.