y0news
#regulatory-tech · 1 article
AI · Neutral · arXiv – CS AI · 3d ago · 7/10

AI Act Evaluation Benchmark: An Open, Transparent, and Reproducible Evaluation Dataset for NLP and RAG Systems

Researchers have released an open-source benchmark dataset for evaluating whether NLP and retrieval-augmented generation (RAG) systems comply with the EU AI Act. The dataset supports automated assessment of three tasks: risk classification, retrieval of the relevant AI Act articles, and question answering. The reported F1 scores are 0.87 for prohibited scenarios and 0.85 for high-risk scenarios.
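The F1 scores reported for risk classification combine precision and recall for each risk category. As a hedged illustration (not the benchmark's own scoring code), the sketch below computes one-vs-rest F1 per label from gold and predicted labels. The function name `per_class_f1`, the label strings, and the toy data are all hypothetical.

```python
from collections import Counter


def per_class_f1(gold, pred, labels):
    """One-vs-rest F1 for each label in `labels`.

    gold and pred are parallel sequences of label strings
    (hypothetical risk tiers such as "prohibited" or "high-risk").
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for class g
        else:
            fp[p] += 1  # predicted p, but the true class was different
            fn[g] += 1  # the true class g was missed

    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical toy annotations, not data from the benchmark.
    gold = ["prohibited", "prohibited", "high-risk", "high-risk", "minimal", "limited"]
    pred = ["prohibited", "high-risk", "high-risk", "high-risk", "minimal", "minimal"]
    f1 = per_class_f1(gold, pred, ["prohibited", "high-risk"])
    print({k: round(v, 2) for k, v in f1.items()})
    # prints {'prohibited': 0.67, 'high-risk': 0.8}
```

The summary does not say how the benchmark aggregates scores across scenarios, so this sketch only shows the per-class computation; a macro or weighted average would be a separate design choice.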