Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models
Researchers propose Dynamic Boundary Evaluation (DBE), a new methodology for assessing large language models that adapts to each model's capability level rather than applying fixed benchmarks. The approach locates the performance boundary where a model's accuracy falls to roughly 50% and places that boundary on a unified difficulty scale, addressing a limitation of traditional evaluation: fixed benchmarks produce ceiling and floor effects that mask true capability gaps.
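To make the boundary idea concrete, here is a minimal sketch of one way such a search could work: bisecting a calibrated difficulty scale for the level at which a model's measured accuracy drops to about 50%. The function and parameter names (`find_boundary`, `accuracy_at`, the synthetic logistic curve) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: binary-search a unified difficulty scale for the
# point where a model's accuracy falls to roughly 50%. Names are
# illustrative and not taken from the DBE paper.
from typing import Callable

def find_boundary(
    accuracy_at: Callable[[float], float],  # measured accuracy at a given difficulty
    lo: float = 0.0,                        # easiest point on the difficulty scale
    hi: float = 1.0,                        # hardest point on the scale
    target: float = 0.5,                    # boundary criterion: ~50% accuracy
    tol: float = 0.01,
    max_iter: int = 20,
) -> float:
    """Bisect the difficulty scale, assuming accuracy decreases with difficulty."""
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        acc = accuracy_at(mid)
        if abs(acc - target) < tol:
            return mid
        if acc > target:      # still above 50%: boundary lies at harder items
            lo = mid
        else:                 # below 50%: boundary lies at easier items
            hi = mid
    return (lo + hi) / 2.0

if __name__ == "__main__":
    import math
    # Synthetic accuracy curve standing in for real model evaluations.
    synthetic = lambda d: 1.0 / (1.0 + math.exp(10 * (d - 0.6)))
    print(f"estimated boundary: {find_boundary(synthetic):.3f}")
```

In practice `accuracy_at` would involve sampling test items at the requested difficulty and scoring the model on them, and monotonicity of accuracy in difficulty is an assumption of this sketch rather than a claim from the paper.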