y0news

#reliable-change-index News & Analysis

1 article tagged with #reliable-change-index. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · May 16/10

Beyond the Mean: Within-Model Reliable Change Detection for LLM Evaluation

Researchers adapted the Reliable Change Index from clinical psychology to evaluate LLM performance across model versions, finding that aggregate accuracy gains mask substantial item-level volatility. Tests on Llama 3→3.1 and Qwen 2.5→3 showed changes in both directions with large effect sizes: improvements in low-accuracy domains offset deteriorations in high-accuracy ones, which suggests that current evaluation methods underestimate model instability.
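
For readers unfamiliar with the Reliable Change Index, the sketch below shows the classical Jacobson-Truax form from clinical psychology: the change in a score divided by the standard error of the difference. The summary does not describe how the paper adapts it within a model, so everything here is an assumption for illustration only: the function names, the per-item accuracies, the baseline standard deviation, and the reliability value are all hypothetical and are not taken from the paper.

```python
import math

def reliable_change_index(score_before, score_after, sd_baseline, reliability):
    """Classical Jacobson-Truax RCI: observed change / standard error of the difference."""
    sem = sd_baseline * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0) * sem                     # standard error of a difference score
    return (score_after - score_before) / s_diff

def classify(rci, threshold=1.96):
    """Label a change as reliable only if |RCI| exceeds the threshold (1.96 ~ 95% level)."""
    if rci > threshold:
        return "improved"
    if rci < -threshold:
        return "deteriorated"
    return "unchanged"

# Hypothetical per-item accuracies (old model version, new model version).
items = {
    "item_a": (0.20, 0.80),  # large gain
    "item_b": (0.90, 0.35),  # large loss
    "item_c": (0.50, 0.55),  # small, unreliable change
}

# Hypothetical measurement properties, assumed identical for every item.
SD_BASELINE = 0.25
RELIABILITY = 0.90

labels = {
    name: classify(reliable_change_index(before, after, SD_BASELINE, RELIABILITY))
    for name, (before, after) in items.items()
}

# The aggregate view: the mean change across items is small even though
# two of the three items moved reliably, in opposite directions.
mean_change = sum(after - before for before, after in items.values()) / len(items)

print(labels)
print(f"mean change: {mean_change:+.3f}")
```

In this toy data the mean accuracy moves by only a few points while one item reliably improves and another reliably deteriorates, which is the kind of masking the summary attributes to aggregate accuracy.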
