#benchmark-optimization News & Analysis

2 articles tagged with #benchmark-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles

AIBullisharXiv – CS AI · Jun 97/10

🧠

Item Response Scaling Laws: A Measurement Theory Approach for Efficient and Generalizable Neural Scaling Estimation

Researchers introduce Item Response Scaling Laws (IRSL), a framework that dramatically reduces computational costs for estimating language model performance by decomposing the problem into model ability and question difficulty components. The approach achieves 99.9% reduction in required evaluation samples while maintaining or exceeding accuracy of traditional scaling law methods.

AINeutralarXiv – CS AI · Jun 96/10

🧠

Post-training is (Massive) Supervised Learning

A new arXiv paper argues that current LLM post-training methods (SFT and RL) function primarily as distribution-fitting mechanisms rather than developing general capabilities, reverting to pre-BERT era approaches. The authors demonstrate that randomly initialized models achieve non-trivial performance when fine-tuned on modern benchmarks, suggesting the field should shift toward training systems that learn how to learn rather than optimizing for specific tasks.