AI · Bullish · arXiv – CS AI · 15h ago · 7/10
Mitigating Content Effects on Reasoning in Language Models through Fine-Grained Activation Steering
Researchers have used activation steering to reduce reasoning biases in large language models, particularly the tendency to conflate content plausibility with logical validity. Their novel K-CAST method improved formal reasoning accuracy by up to 15% while remaining robust across different tasks and languages.