AINeutralarXiv – CS AI · 18h ago6/10
🧠
Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones
Researchers discovered that language models fail at balanced parentheses tasks not due to fundamental limitations, but because faulty internal mechanisms override sound ones. They developed RASteer, a steering method that amplifies reliable components, improving accuracy from 0% to nearly 100% on these tasks while maintaining general coding ability.