y0news

#item-response-theory News & Analysis

3 articles tagged with #item-response-theory. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · May 9 · 7/10

Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models

Researchers propose Dynamic Boundary Evaluation (DBE), a methodology for assessing large language models that adapts to each model's capability level rather than applying fixed benchmarks. The approach identifies performance boundaries, the points where a model achieves ~50% accuracy, and calibrates them on a unified difficulty scale. This addresses the ceiling and floor effects of traditional fixed-benchmark evaluation, which mask true capability gaps.
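
To illustrate why ~50% accuracy is a natural boundary, here is a minimal sketch using the standard one-parameter (Rasch) item response model, in which the probability of a correct answer is exactly 0.5 when item difficulty equals model ability. The summary does not say which response model DBE uses, so the Rasch model, the function names, and the bisection search are all assumptions made for illustration.

```python
import math


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (1PL) probability of a correct answer (assumed model, not from the paper)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def boundary_difficulty(ability: float, lo: float = -10.0, hi: float = 10.0,
                        target: float = 0.5, tol: float = 1e-9) -> float:
    """Bisection search for the difficulty at which expected accuracy equals `target`."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if p_correct(ability, mid) > target:
            lo = mid  # item is still too easy, so the boundary lies higher
        else:
            hi = mid  # item is too hard, so the boundary lies lower
    return (lo + hi) / 2.0


# Hypothetical ability value on a shared difficulty scale.
boundary = boundary_difficulty(1.3)
# A very easy item for a strong model sits near the ceiling and tells us little.
ceiling_case = p_correct(5.0, -5.0)
```

Under this assumed model the recovered boundary coincides with the ability value, while items far from the boundary yield accuracies near 0 or 1, which is the ceiling and floor effect the summary refers to.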

AI · Neutral · arXiv – CS AI · Apr 15 · 7/10

Beyond Scores: Diagnostic LLM Evaluation via Fine-Grained Abilities

Researchers propose a cognitive diagnostic framework that evaluates large language models across fine-grained ability dimensions rather than aggregate scores, enabling targeted model improvement and task-specific selection. The approach uses multidimensional Item Response Theory to estimate abilities across 35 dimensions for mathematics and generalizes to physics, chemistry, and computer science with strong predictive accuracy.
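
As a rough illustration of what multidimensional Item Response Theory computes, the sketch below uses the common compensatory form, where the log-odds of a correct answer are a weighted sum of ability dimensions plus an item intercept. The summary does not specify the paper's exact parameterization, so this form, the three example dimensions (the paper reports 35 for mathematics), and all numeric values are hypothetical.

```python
import math
from typing import Sequence


def mirt_p_correct(abilities: Sequence[float], loadings: Sequence[float],
                   intercept: float) -> float:
    """Compensatory multidimensional 2PL: sigmoid(loadings . abilities + intercept)."""
    if len(abilities) != len(loadings):
        raise ValueError("abilities and loadings must have the same length")
    logit = sum(a * t for a, t in zip(loadings, abilities)) + intercept
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical model profile over three made-up dimensions:
# (algebra, geometry, probability).
model_abilities = [1.5, -0.5, 0.0]

# Hypothetical items that each load on a single dimension.
p_algebra_item = mirt_p_correct(model_abilities, [1.0, 0.0, 0.0], 0.0)
p_geometry_item = mirt_p_correct(model_abilities, [0.0, 1.0, 0.0], 0.0)
```

Two models with the same aggregate score can have different ability vectors, so per-dimension predictions like these are what would make targeted improvement and task-specific selection possible.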

AI · Neutral · arXiv – CS AI · May 11 · 6/10

An Interpretable and Scalable Framework for Evaluating Large Language Models

Researchers introduce a scalable framework for evaluating large language models using Item Response Theory and majorization-minimization algorithms, achieving orders-of-magnitude speedups while improving interpretability. The method addresses computational limitations of traditional benchmarking approaches and provides insights into model abilities and benchmark item characteristics.
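
For readers unfamiliar with majorization-minimization (MM), here is a minimal sketch of the idea applied to a toy IRT problem: estimating a single model's ability under a Rasch model with known item difficulties. It uses the classic quadratic bound on the logistic log-likelihood (curvature at most 1/4 per item), which gives a closed-form update that never decreases the likelihood. This is a textbook MM step chosen for illustration; the summary does not describe the paper's actual algorithm, and the data below are made up.

```python
import math
from typing import Sequence


def log_likelihood(theta: float, difficulties: Sequence[float],
                   responses: Sequence[int]) -> float:
    """Rasch log-likelihood of binary responses given ability `theta`."""
    return sum(y * (theta - b) - math.log1p(math.exp(theta - b))
               for b, y in zip(difficulties, responses))


def mm_ability(difficulties: Sequence[float], responses: Sequence[int],
               theta: float = 0.0, iters: int = 200) -> float:
    """MM ability estimate: maximize a quadratic lower bound of the log-likelihood."""
    n = len(difficulties)
    for _ in range(iters):
        # Gradient of the log-likelihood: sum of (observed - predicted) over items.
        grad = sum(y - 1.0 / (1.0 + math.exp(-(theta - b)))
                   for b, y in zip(difficulties, responses))
        # The second derivative is bounded below by -n/4, so this step
        # maximizes a minorizing quadratic and cannot decrease the likelihood.
        theta += 4.0 * grad / n
    return theta


# Hypothetical benchmark items (difficulties) and one model's right/wrong answers.
item_difficulties = [-1.0, 0.0, 1.0, 2.0]
item_responses = [1, 1, 0, 0]
estimated_ability = mm_ability(item_difficulties, item_responses)
```

Each update needs only a gradient and a fixed step size, with no matrix inversion or line search. Cheap iterations of this kind are one reason MM schemes can scale well, though this toy does not reproduce the speedups the article reports.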