y0news
#reasoning-capabilities
2 articles
AI · Neutral · arXiv – CS AI · 5h ago

Classroom Final Exam: An Instructor-Tested Reasoning Benchmark

Researchers introduce CFE-Bench, a new multimodal benchmark that evaluates AI reasoning across 20+ STEM domains using authentic university exam problems. The best-performing model, Gemini-3.1-pro-preview, achieved only 59.69% accuracy, highlighting significant gaps in AI reasoning capabilities, particularly in keeping intermediate states correct across multi-step solutions.

AI · Bullish · arXiv – CS AI · 5h ago

LEDOM: Reverse Language Model

Researchers have developed LEDOM, an open-source reverse autoregressive language model trained to predict text right-to-left rather than in the conventional left-to-right order. The model shows distinctive capabilities such as abductive inference and question synthesis. When combined with forward models through 'Reverse Reward' scoring, it achieves performance gains of up to 15% on mathematical reasoning tasks.
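To make "right-to-left" concrete, here is a minimal sketch under stated assumptions; it is not LEDOM's code. Reversing a token sequence turns reverse modeling into ordinary next-token prediction, so each training example predicts a token from the tokens that follow it in the original text. The summary does not give the Reverse Reward formula, so `rerank`, its `weight` parameter, and the weighted sum of a forward score log p(answer | question) and a reverse score log p(question | answer) are hypothetical illustrations. The scorer callables stand in for real models.

```python
from typing import Callable, List, Sequence, Tuple


def right_to_left_pairs(tokens: Sequence[str]) -> List[Tuple[Tuple[str, ...], str]]:
    """Build (context, target) pairs for a reverse autoregressive model.

    Each target token is predicted from the tokens that come AFTER it in
    the original text. Reversing the sequence lets a standard next-token
    objective do this.
    """
    rev = list(reversed(tokens))
    # The first pair has an empty context (predicting the final token).
    return [(tuple(rev[:i]), rev[i]) for i in range(len(rev))]


def rerank(
    question: str,
    candidates: List[str],
    forward_logp: Callable[[str, str], float],  # hypothetical: log p(answer | question)
    reverse_logp: Callable[[str, str], float],  # hypothetical: log p(question | answer)
    weight: float = 0.5,
) -> List[str]:
    """Hypothetical reranker that mixes forward and reverse scores.

    The weighted sum is an assumption for illustration only; it is not
    taken from the LEDOM paper.
    """

    def score(answer: str) -> float:
        return (1 - weight) * forward_logp(question, answer) + weight * reverse_logp(answer, question)

    # Highest combined score first.
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    print(right_to_left_pairs(["a", "b", "c"]))

    # Toy scorers standing in for real forward and reverse models.
    fwd = {"4": -1.0, "5": -0.5}
    rev = {"4": -0.1, "5": -2.0}
    print(rerank("2+2?", ["4", "5"], lambda q, a: fwd[a], lambda a, q: rev[a]))
```

With these toy numbers, the forward score alone prefers "5". Adding the reverse score, which asks how well each answer "explains" the question, moves "4" to the top. This is the intuition behind pairing a reverse model with a forward one for answer selection.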