y0news

#reliable-change-index News & Analysis

1 article tagged with #reliable-change-index. AI-curated summaries, with sentiment analysis and key takeaways, drawn from 50+ sources.

AI · Neutral · arXiv – CS AI · 8h ago · 6/10

Beyond the Mean: Within-Model Reliable Change Detection for LLM Evaluation

Researchers adapted the Reliable Change Index (RCI) from clinical psychology to evaluate LLM performance across model versions, finding that aggregate accuracy gains mask substantial item-level volatility. Comparisons of Llama 3→3.1 and Qwen 2.5→3 showed changes in both directions with large effect sizes: improvements in low-accuracy domains offset deteriorations in high-accuracy ones. The authors suggest that current evaluation methods underestimate model instability.

🧠 Llama
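For context on the index the summary mentions, here is a minimal Python sketch of the classical Jacobson-Truax Reliable Change Index: RCI = (x2 - x1) / S_diff, where S_diff = sqrt(2) * SE and SE = SD * sqrt(1 - r), with |RCI| > 1.96 treated as reliable change at a two-tailed 95% criterion. This is the textbook clinical formula, not the paper's within-model adaptation; how the authors estimate variability and reliability for an LLM's items is not described here and may differ. The function names and example numbers are hypothetical.

```python
import math


def reliable_change_index(score_before: float, score_after: float,
                          sd_baseline: float, reliability: float) -> float:
    """Classical Jacobson-Truax RCI for one pair of scores.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if sd_baseline <= 0:
        raise ValueError("sd_baseline must be positive")
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    # Standard error of measurement: SE = SD * sqrt(1 - r).
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    # Standard error of the difference between two measurements: sqrt(2) * SE.
    s_diff = math.sqrt(2.0) * sem
    return (score_after - score_before) / s_diff


def classify_change(score_before: float, score_after: float,
                    sd_baseline: float, reliability: float,
                    z_crit: float = 1.96) -> str:
    """Label a change as a reliable improvement, a reliable decline, or neither.

    z_crit = 1.96 corresponds to a two-tailed 95% criterion.
    """
    rci = reliable_change_index(score_before, score_after, sd_baseline, reliability)
    if rci > z_crit:
        return "improved"
    if rci < -z_crit:
        return "deteriorated"
    return "no reliable change"


if __name__ == "__main__":
    # Hypothetical scores: a baseline SD of 5 and reliability of 0.8 give S_diff of about 3.16.
    for before, after in [(10, 20), (10, 16), (10, 2)]:
        print(before, after, classify_change(before, after, 5.0, 0.8))
```

With these made-up numbers, a 6-point gain stays under the 1.96 threshold, while a 10-point gain and an 8-point drop both count as reliable changes, one in each direction.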