y0news

#reasoning-assessment News & Analysis

2 articles tagged with #reasoning-assessment. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation

Researchers introduce TRACE, a novel metric for evaluating the reasoning quality of large language models' Chain-of-Thought outputs by analyzing argument structure rather than just final answers. The method combines Toulmin's argumentation theory with metacognitive frameworks and demonstrates strong correlation with benchmark accuracy while improving reinforcement learning performance.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Beyond Output Correctness: Benchmarking and Evaluating Large Language Model Reasoning in Coding Tasks

Researchers introduce CodeRQ-Bench, the first benchmark for evaluating LLM reasoning quality across coding tasks including generation, summarization, and classification. They propose VERA, a two-stage evaluator combining evidence-grounded verification with ambiguity-aware score correction, achieving significant performance improvements over existing methods.