y0news
#agent-centric1 article
1 articles
AINeutralarXiv โ€“ CS AI ยท 4h ago2
๐Ÿง 

From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning

Researchers propose a dynamic agent-centric benchmarking system for evaluating large language models that replaces static datasets with autonomous agents that generate, validate, and solve problems iteratively. The protocol uses teacher, orchestrator, and student agents to create progressively challenging text anomaly detection tasks that expose reasoning errors missed by conventional benchmarks.