AIBearisharXiv โ CS AI ยท 4h ago7/10
๐ง
AISafetyBenchExplorer: A Metric-Aware Catalogue of AI Safety Benchmarks Reveals Fragmented Measurement and Weak Benchmark Governance
Researchers have catalogued 195 AI safety benchmarks released since 2018, revealing that rapid proliferation of evaluation tools has outpaced standardization efforts. The study identifies critical fragmentation: inconsistent metric definitions, limited language coverage, poor repository maintenance, and lack of shared measurement standards across the field.
๐ข Hugging Face