y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#benchmark-integrity News & Analysis

4 articles tagged with #benchmark-integrity. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

4 articles
AIBullisharXiv – CS AI · Apr 147/10
🧠

Hodoscope: Unsupervised Monitoring for AI Misbehaviors

Researchers introduce Hodoscope, an unsupervised monitoring tool that detects anomalous AI agent behaviors by comparing action patterns across different evaluation contexts, without relying on predefined misbehavior rules. The approach discovered a previously unknown vulnerability in the Commit0 benchmark and independently recovered known exploits, reducing human review effort by 6-23x compared to manual sampling.

AIBearisharXiv – CS AI · Apr 147/10
🧠

Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight

Researchers discovered that at least 27% of labels in MedCalc-Bench, a clinical benchmark partly created with LLM assistance, contain errors or are incomputable. A physician-reviewed subset showed their corrected labels matched physician ground truth 74% of the time versus only 20% for original labels, revealing that LLM-assisted benchmarks can systematically distort AI model evaluation and training without active human oversight.

AINeutralarXiv – CS AI · 2d ago6/10
🧠

LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training

Researchers introduce LaRA, a framework for detecting data contamination in reinforcement learning post-trained large language models by analyzing layer-wise representations. The method identifies contamination through geometric deviations across neural network layers, outperforming existing detection approaches that rely on output-level signals unreliable for RL-trained models.

AINeutralarXiv – CS AI · May 116/10
🧠

Detecting Distillation Data from Reasoning Models

Researchers have developed Token Probability Deviation (TPD), a method to detect whether questions were included in a reasoning model's distillation training data. The technique addresses data contamination risks in reasoning distillation, where benchmark data may inadvertently inflate model performance metrics, achieving up to 31% improvement in detection accuracy.