y0news

#mid-training News & Analysis

2 articles tagged with #mid-training. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

Researchers introduce MIRA, a framework for selecting data during the mid-training of large language models. It dynamically discovers source-specific evaluation rubrics and applies them to filter each source. Across 21 diverse data sources, the approach matches the performance of full-corpus training on code-oriented tasks while using 50% fewer tokens.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

Thoth: Mid-Training Bridges LLMs to Time Series Understanding

Researchers have developed Thoth, which they describe as the first family of large language models (LLMs) designed specifically to understand and reason about time series data, using a mid-training approach. The models are mid-trained on a specialized corpus called Book-of-Thoth, which bridges temporal data and natural language, and significantly outperform existing LLMs on time series analysis tasks.