y0news

#adaptive-testing News & Analysis

2 articles tagged with #adaptive-testing. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 3d ago · 7/10

Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models

Researchers propose Dynamic Boundary Evaluation (DBE), a new methodology for assessing large language models that adapts to each model's capability level rather than applying fixed benchmarks. The approach identifies performance boundaries where models achieve ~50% accuracy and calibrates them on a unified difficulty scale, addressing limitations in traditional evaluation that produce ceiling and floor effects masking true capability gaps.
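The central idea, locating the difficulty at which a model is right about half the time, can be illustrated with a Rasch (one-parameter logistic) item-response model. This is a minimal sketch under that assumption, not the paper's procedure: the summary does not say which model DBE fits, and `p_correct` and `estimate_boundary` are hypothetical names. Item difficulties are assumed to be already calibrated on one shared scale, which is what makes boundaries from different models comparable.

```python
import math
from typing import Sequence


def p_correct(theta: float, difficulty: float) -> float:
    """Rasch (1PL) probability that a model of ability theta answers an item correctly.
    Assumption: DBE's scale behaves like a Rasch scale; the paper may use another model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_boundary(difficulties: Sequence[float], correct: Sequence[int],
                      lo: float = -10.0, hi: float = 10.0, tol: float = 1e-9) -> float:
    """Maximum-likelihood ability estimate. Under the Rasch model this equals the
    difficulty at which predicted accuracy is exactly 50%, i.e. the boundary.

    Solves the score equation sum(y_i - p_i(theta)) = 0 by bisection. The score is
    strictly decreasing in theta, so the root is unique if it lies in [lo, hi].
    """
    if len(difficulties) != len(correct) or not difficulties:
        raise ValueError("need equal-length, non-empty inputs")
    if all(correct) or not any(correct):
        # All-correct or all-wrong responses push the MLE to +/- infinity
        # (the ceiling and floor effects a fixed benchmark runs into).
        raise ValueError("MLE diverges when responses are all correct or all wrong")

    def score(theta: float) -> float:
        return sum(y - p_correct(theta, b) for b, y in zip(difficulties, correct))

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:  # more correct answers than predicted: boundary lies higher
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical responses: easy items answered correctly, hard items missed.
    print(estimate_boundary([1.0, 2.0, 4.0, 5.0], [1, 1, 0, 0]))  # close to 3.0
```

Because the fitted ability and the item difficulties share one scale, two models' boundaries can be compared directly. An adaptive test in this setting would pick the next item with difficulty near the current estimate, since under the Rasch model an item's Fisher information p(1 − p) peaks where predicted accuracy is 50%.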

AI · Neutral · arXiv – CS AI · Mar 26 · 6/10

DepthCharge: A Domain-Agnostic Framework for Measuring Depth-Dependent Knowledge in Large Language Models

Researchers developed DepthCharge, a domain-agnostic framework for measuring how deeply large language models can sustain accurate responses when questioned about domain-specific knowledge. Testing across four domains revealed substantial variation in depth: no single model led in every domain, and more expensive models did not always perform better.
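One way to make "depth" concrete is to score each chain of progressively deeper follow-up questions by how many answers stay correct before the first error, then average per domain. The summary does not describe DepthCharge's actual scoring, so this is a hypothetical illustration: `depth_reached`, `mean_depth_by_domain`, the domain names, and the stop-at-first-error rule are all assumptions.

```python
from statistics import mean
from typing import Dict, List, Sequence


def depth_reached(chain: Sequence[bool]) -> int:
    """Hypothetical metric: consecutive correct answers, starting from the shallowest
    question, until the first error. A fully correct chain scores its full length."""
    depth = 0
    for ok in chain:
        if not ok:
            break
        depth += 1
    return depth


def mean_depth_by_domain(results: Dict[str, List[List[bool]]]) -> Dict[str, float]:
    """Average depth_reached over all question chains in each domain."""
    return {domain: mean(depth_reached(chain) for chain in chains)
            for domain, chains in results.items()}


if __name__ == "__main__":
    # Hypothetical correctness records for one model, one list per question chain.
    results = {
        "law": [[True, True, False], [True, False, True]],
        "medicine": [[True, True, True]],
    }
    print(mean_depth_by_domain(results))
```

Computing these per-domain averages for several models side by side is one simple way to check whether any model leads in every domain, which is the kind of comparison the summary reports.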