y0news
#statistical-bias · 1 article
AI · Neutral · arXiv – CS AI · 7h ago · 6/10

On the Step Length Confounding in LLM Reasoning Data Selection

Researchers identify a critical flaw in naturalness-based data selection methods for large language model (LLM) reasoning datasets: the selection algorithms systematically favor samples with longer reasoning steps rather than samples with higher-quality reasoning. The study proposes two corrective methods, ASLEC-DROP and ASLEC-CASL, which the authors report mitigate this 'step length confounding' bias across multiple LLM benchmarks.
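The summary does not say how the naturalness score is computed, so the sketch below rests on stated assumptions and only illustrates how a length confound can arise. It assumes the score is the mean per-token log-probability of a sample, and it uses a toy model in which the first token of every reasoning step is hard to predict (the hypothetical constant `FIRST_TOKEN_LOGPROB`) while the remaining tokens are easier. Under that model, longer steps dilute the first-token penalty, so the raw score prefers long-step samples even when their reasoning is less fluent. As a correction it swaps in a generic technique, a least-squares residualization of the score on mean step length. This is not ASLEC-DROP or ASLEC-CASL. All sample names and numbers are made up.

```python
from statistics import fmean

# Hypothetical toy assumption: the first token of each reasoning step is hard to
# predict (low log-prob); the other tokens in the step share one easier log-prob.
FIRST_TOKEN_LOGPROB = -3.0


def sample_logprobs(step_lengths, rest_logprob):
    """Concatenate per-token log-probs for a sample built from steps of the given lengths."""
    logprobs = []
    for n in step_lengths:
        logprobs.append(FIRST_TOKEN_LOGPROB)
        logprobs.extend([rest_logprob] * (n - 1))
    return logprobs


def naturalness(logprobs):
    """Assumed naturalness score: mean per-token log-probability."""
    return fmean(logprobs)


def residualize(scores, lengths):
    """Subtract the least-squares linear fit of score on mean step length."""
    mx, my = fmean(lengths), fmean(scores)
    sxx = sum((x - mx) ** 2 for x in lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lengths, scores))
    slope = sxy / sxx if sxx else 0.0
    return [y - (my + slope * (x - mx)) for x, y in zip(lengths, scores)]


def top_k(names, scores, k):
    """Return the names of the k highest-scoring samples."""
    ranked = sorted(zip(scores, names), reverse=True)
    return [name for _, name in ranked[:k]]


# Four hypothetical 20-token samples: (step lengths, log-prob of non-first tokens).
# A higher non-first-token log-prob stands in for "more fluent" reasoning.
samples = {
    "long_steps": ([10, 10], -0.5),
    "long_clumsy": ([10, 10], -0.7),
    "short_fluent": ([4] * 5, -0.3),
    "short_steps": ([4] * 5, -0.5),
}
names = list(samples)
raw = [naturalness(sample_logprobs(steps, rest)) for steps, rest in samples.values()]
lengths = [fmean(steps) for steps, _ in samples.values()]
adjusted = residualize(raw, lengths)

print(top_k(names, raw, 2))       # ['long_steps', 'long_clumsy']
print(top_k(names, adjusted, 2))  # ['long_steps', 'short_fluent']
```

With raw scores, both selected samples have long steps, including the less fluent `long_clumsy`. After residualization, the top pick from each length group is kept. One design caveat is that a linear adjustment also removes any genuine quality signal that happens to correlate with step length, which is one reason a dedicated correction may be preferable in practice.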