y0news

#mmlu News & Analysis

4 articles tagged with #mmlu. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI × Crypto · Bullish · Blockonomi · Mar 14 · 7/10

Bittensor’s Subnet 3 Trains 72B AI Model on Decentralized Network

Bittensor's Subnet 3 trained Covenant-72B, a 72-billion-parameter AI model, on a decentralized network. The model scored 67.1 on MMLU, ahead of LLaMA-2-70B's 65.6. Training used SparseLoCo to cut communication overhead by 146x and tracked contributions on a blockchain. The result drove the TAO token up 14% to $236.

$TAO
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

The Shape of Wisdom: Decision Trajectories in Language Models

Researchers analyzed how language models make decisions by tracing answer scores across neural network layers in 9,000 MMLU trajectories, finding that correct answers are often unstable and that attention mechanisms better preserve correctness than MLP layers. The study reveals decision-making is a distributed process rather than a final-layer phenomenon, with implications for understanding model reliability and interpretability.

🧠 Llama
AI · Neutral · arXiv – CS AI · May 11 · 6/10

Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas

Researchers evaluated metacognitive monitoring across 33 frontier LLMs using 47,151 MMLU benchmark items and found significant domain-level variation that aggregate performance scores mask. Applied/Professional knowledge domains showed consistently strong self-monitoring (AUROC 0.742), while Formal Reasoning and Natural Science proved most challenging. The findings have implications for targeted model deployment.

🏢 OpenAI · 🏢 Anthropic · 🧠 Gemini
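
For readers unfamiliar with the AUROC figure quoted in the summary above: in this setting it is usually read as the probability that a model assigns higher confidence to an item it answered correctly than to one it answered incorrectly. The sketch below is a minimal illustration under that reading, not the paper's code. The function names, the record layout, and the toy data are all hypothetical.

```python
from collections import defaultdict


def auroc(confidences, correct):
    """Chance that a correct item outranks an incorrect one by confidence.

    Ties count as half a win. 0.5 means confidence carries no signal.
    """
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_by_domain(records):
    """Group (domain, confidence, is_correct) records and score each domain."""
    groups = defaultdict(lambda: ([], []))
    for domain, conf, ok in records:
        groups[domain][0].append(conf)
        groups[domain][1].append(ok)
    return {d: auroc(confs, oks) for d, (confs, oks) in groups.items()}


# Hypothetical toy records, not data from the study.
records = [
    ("applied", 0.9, True), ("applied", 0.8, True), ("applied", 0.3, False),
    ("formal", 0.6, True), ("formal", 0.6, False),
]
print(auroc_by_domain(records))
```

Scoring each domain separately is what lets per-domain differences show up that a single pooled score would average away.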
AI · Neutral · arXiv – CS AI · Mar 27 · 6/10

Efficient Detection of Bad Benchmark Items with Novel Scalability Coefficients

Researchers introduce a new nonparametric method called signed isotonic R² for efficiently detecting problematic items in AI benchmarks and assessments. The method outperforms traditional diagnostic techniques across major AI datasets including GSM8K and MMLU, offering a lightweight solution for improving evaluation quality.