y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#ai-metrics News & Analysis

3 articles tagged with #ai-metrics. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

3 articles
AINeutralarXiv – CS AI · Apr 156/10
🧠

Beyond Output Correctness: Benchmarking and Evaluating Large Language Model Reasoning in Coding Tasks

Researchers introduce CodeRQ-Bench, the first benchmark for evaluating LLM reasoning quality across coding tasks including generation, summarization, and classification. They propose VERA, a two-stage evaluator combining evidence-grounded verification with ambiguity-aware score correction, achieving significant performance improvements over existing methods.

AIBearisharXiv – CS AI · Mar 176/10
🧠

Do Metrics for Counterfactual Explanations Align with User Perception?

A new study reveals that standard algorithmic metrics used to evaluate AI counterfactual explanations poorly correlate with human perceptions of explanation quality. The research found weak and dataset-dependent relationships between technical metrics and user judgments, highlighting fundamental limitations in current AI explainability evaluation methods.