AIBearish · arXiv · CS AI · 8h ago · 6/10
🧠 Instruction Complexity Induces Positional Collapse in Adversarial LLM Evaluation
Researchers found that when language models receive complex adversarial instructions to underperform, they abandon semantic reasoning and collapse into positional shortcuts, defaulting to a single response position up to 99.9% of the time. This reveals a fundamental vulnerability in how instruction-tuned models handle adversarial prompts, with implications for AI safety and evaluation reliability.
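The collapse itself is easy to quantify: shuffle the correct answer's position across trials and measure how often the model's pick lands on one fixed slot. Here is a minimal sketch of that metric; the function name and the simulated trial data are illustrative, not from the paper:

```python
from collections import Counter

def positional_collapse_rate(chosen_positions):
    """Fraction of trials on which the model picked its single most
    frequent answer position, regardless of question content.

    Values near 1.0 (the paper reports up to 99.9%) suggest the model
    has abandoned semantic reasoning for a positional shortcut.
    """
    if not chosen_positions:
        raise ValueError("no trials recorded")
    _, top_count = Counter(chosen_positions).most_common(1)[0]
    return top_count / len(chosen_positions)

# Hypothetical run: positions ("A"-"D") a model chose across 100 trials
# where the correct answer was shuffled uniformly, so an unbiased
# chooser would land on any one position only ~25% of the time.
if __name__ == "__main__":
    trials = ["A"] * 97 + ["C", "B", "D"]  # near-total collapse onto "A"
    print(f"collapse rate: {positional_collapse_rate(trials):.3f}")  # 0.970
```

Because the shuffle makes position uninformative, any rate well above chance is attributable to the shortcut rather than to the questions themselves.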
🧠 Llama