76 articles tagged with #model-training. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv · CS AI · Apr 7 · 7/10
🧠 Researchers found that large language models align with human brain activity during creative thinking tasks, with alignment increasing with model size and idea originality. Different post-training approaches selectively reshape how LLMs align with creative versus analytical neural patterns in humans.
🧠 Llama
AI · Bullish · arXiv · CS AI · Apr 7 · 7/10
🧠 Researchers propose Continuous Softened Retracing reSampling (CSRS) to improve the self-evolution of Multimodal Large Language Models by addressing biases in feedback mechanisms. The method uses continuous reward signals instead of binary rewards and achieves state-of-the-art results on mathematical reasoning benchmarks like MathVision using Qwen2.5-VL-7B.
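The summary does not give CSRS's exact reward function, but the binary-versus-continuous distinction it describes can be sketched as below. The function names and the distance-based soft reward are illustrative assumptions, not the paper's formulation:

```python
import math

def binary_reward(pred: float, target: float, tol: float = 1e-6) -> float:
    """Binary reward: full credit only on an exact match, zero otherwise."""
    return 1.0 if abs(pred - target) < tol else 0.0

def soft_reward(pred: float, target: float, temperature: float = 2.0) -> float:
    """Continuous reward: decays smoothly with the error, so near-misses
    still contribute a learning signal during resampling."""
    return math.exp(-abs(pred - target) / temperature)

# A near-miss gets zero credit under the binary reward but partial
# credit under the continuous one.
print(binary_reward(9.9, 10.0))              # 0.0
print(round(soft_reward(9.9, 10.0), 3))
```

The point of a continuous signal is that almost-correct rollouts are no longer indistinguishable from completely wrong ones, which is one plausible way to reduce feedback bias in self-evolution loops.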
AI · Neutral · arXiv · CS AI · Apr 6 · 7/10
🧠 Researchers developed Debiasing-DPO, a new training method that reduces harmful biases in large language models by 84% while improving accuracy by 52%. The study found that LLMs can shift predictions by up to 1.48 points when exposed to irrelevant contextual information like demographics, highlighting critical risks for high-stakes AI applications.
🧠 Llama
AI · Bullish · arXiv · CS AI · Mar 26 · 7/10
🧠 Researchers demonstrate that PLDR-LLMs trained at self-organized criticality exhibit enhanced reasoning capabilities at inference time. The study shows that reasoning ability can be quantified using an order parameter derived from global model statistics, with models performing better when this parameter approaches zero at criticality.
AI · Neutral · arXiv · CS AI · Mar 26 · 7/10
🧠 Researchers have developed techniques to mitigate many-shot jailbreaking (MSJ) attacks on large language models, where attackers use numerous examples to override safety training. Combined fine-tuning and input sanitization approaches significantly reduce MSJ effectiveness while maintaining normal model performance.
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers identified that repetitive safety training data causes large language models to develop false refusals, where benign queries are incorrectly declined. They developed FlowLens, a PCA-based analysis tool, and proposed Variance Concentration Loss (VCL) as a regularization technique that reduces false refusals by over 35 percentage points while maintaining performance.
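The summary does not spell out the VCL formula, but a PCA-based regularizer plausibly targets how much of the hidden-state variance collapses onto a single direction. A minimal sketch of that quantity, with the function name and the synthetic data being my own illustration:

```python
import numpy as np

def variance_concentration(hidden: np.ndarray) -> float:
    """Fraction of total variance captured by the top principal
    component of a batch of hidden states, shape (batch, dim)."""
    centered = hidden - hidden.mean(axis=0)
    # Eigenvalues of the covariance matrix are the PCA variances.
    cov = centered.T @ centered / (len(hidden) - 1)
    eigvals = np.linalg.eigvalsh(cov)
    return float(eigvals[-1] / eigvals.sum())

rng = np.random.default_rng(0)
iso = rng.normal(size=(256, 16))                  # isotropic activations
collapsed = np.outer(rng.normal(size=256), rng.normal(size=16))
collapsed += 0.01 * rng.normal(size=(256, 16))    # nearly rank-1

print(variance_concentration(iso) < variance_concentration(collapsed))  # True
```

Penalizing a quantity like this during safety fine-tuning would discourage representations of benign and harmful prompts from collapsing onto one "refuse" axis, which is consistent with the false-refusal mechanism the paper describes.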
AI · Neutral · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers identify a fundamental flaw in large language models called 'Rung Collapse' where AI systems achieve correct answers through flawed causal reasoning that fails under distribution shifts. They propose Epistemic Regret Minimization (ERM) as a solution that penalizes incorrect reasoning processes independently of task success, showing 53-59% recovery of reasoning errors in experiments across six frontier LLMs.
🧠 GPT-5
AI · Bullish · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers have developed a novel method to enhance large language model reasoning capabilities using supervision from weaker models, achieving 94% of expensive reinforcement learning gains at a fraction of the cost. This weak-to-strong supervision paradigm offers a promising alternative to costly traditional methods for improving LLM reasoning performance.
AI · Neutral · arXiv · CS AI · Mar 17 · 7/10
🧠 Researchers discovered that AI language models hallucinate not from failing to detect uncertainty, but from inability to integrate uncertainty signals into output generation. The study shows models can identify uncertain inputs internally, but these signals become geometrically amplified yet functionally silent due to weak coupling with output layers.
AI · Neutral · arXiv · CS AI · Mar 16 · 7/10
🧠 Researchers propose the Superficial Safety Alignment Hypothesis (SSAH), suggesting that AI safety alignment in large language models can be understood as a binary classification task of fulfilling or refusing user requests. The study identifies four types of critical components at the neuron level that establish safety guardrails, enabling models to retain safety attributes while adapting to new tasks.
AI · Bullish · arXiv · CS AI · Mar 12 · 7/10
🧠 Researchers propose ROVA, a new training framework that improves vision-language models' robustness in real-world conditions by up to 24% accuracy gains. The framework addresses performance degradation from weather, occlusion, and camera motion that can cause up to 35% accuracy drops in current models.
AI · Bullish · arXiv · CS AI · Mar 9 · 7/10
🧠 Researchers propose a three-stage pipeline to train Large Language Models to efficiently provide calibrated uncertainty estimates for their responses. The method uses entropy-based scoring, Platt scaling calibration, and reinforcement learning to enable models to reason about uncertainty without computationally expensive post-hoc methods.
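Two of the named ingredients are standard and easy to sketch: entropy of the model's token distribution as a raw uncertainty score, and Platt scaling (a fitted logistic function) to turn that score into a calibrated probability. The parameter values below are hypothetical; in practice a and b would be fit on held-out data:

```python
import math

def entropy(probs):
    """Shannon entropy of a token distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def platt(score, a, b):
    """Platt scaling: map a raw score to a calibrated probability
    of correctness via a fitted logistic function."""
    return 1.0 / (1.0 + math.exp(a * score + b))

# A peaked distribution has low entropy (the model is confident) ...
confident = entropy([0.97, 0.01, 0.01, 0.01])
# ... a flat one has high entropy (the model is uncertain).
uncertain = entropy([0.25, 0.25, 0.25, 0.25])
print(confident < uncertain)  # True

# With illustrative fitted parameters, lower entropy maps to a
# higher calibrated probability of being correct.
print(platt(confident, a=2.0, b=-1.0) > platt(uncertain, a=2.0, b=-1.0))  # True
```

The pipeline's novelty per the summary is doing this inside training (via RL) rather than as a post-hoc calibration pass; the sketch only shows the scoring and calibration primitives.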
AI · Bullish · arXiv · CS AI · Mar 9 · 7/10
🧠 Researchers propose FLoRG, a new federated learning framework for efficiently fine-tuning large language models that reduces communication overhead by up to 2041x while improving accuracy. The method uses Gram matrix aggregation and Procrustes alignment to solve aggregation errors and decomposition drift issues in distributed AI training.
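The summary does not detail FLoRG's aggregation rule, but the Procrustes step it names is a standard primitive: find the orthogonal rotation that best maps one client's low-rank factor onto another's basis before averaging, so factors that differ only by a rotation are not destructively summed. A minimal sketch under that assumption:

```python
import numpy as np

def procrustes_align(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes: the rotation R minimizing ||A @ R - B||_F,
    computed from the SVD of A^T B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(1)
B = rng.normal(size=(64, 8))
# Simulate two clients whose factors differ only by an unknown rotation.
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
A = B @ Q.T

R = procrustes_align(A, B)
print(np.allclose(A @ R, B))  # True
```

After alignment, averaging A @ R with B is meaningful, whereas averaging A with B directly would mix incompatible bases; this is one plausible reading of the "decomposition drift" problem the paper targets.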
AI · Bullish · arXiv · CS AI · Mar 6 · 6/10
🧠 Researchers propose VISA (Value Injection via Shielded Adaptation), a new framework for aligning Large Language Models with human values while avoiding the 'alignment tax' that causes knowledge drift and hallucinations. The system uses a closed-loop architecture with value detection, translation, and rewriting components, demonstrating superior performance over standard fine-tuning methods and GPT-4o in maintaining factual consistency.
🧠 GPT-4
AI · Bullish · arXiv · CS AI · Mar 5 · 7/10
🧠 Researchers developed a new AI training method using knowledge graphs as reward models to improve compositional reasoning in specialized domains. The approach enables smaller 14B parameter models to outperform much larger frontier systems like GPT-5.2 and Gemini 3 Pro on complex multi-hop reasoning tasks in medicine.
🧠 Gemini
AI · Bearish · arXiv · CS AI · Mar 5 · 6/10
🧠 Researchers have identified 'preference leakage,' a contamination problem in LLM-as-a-judge systems where evaluator models show bias toward related data generator models. The study found this bias occurs when judge and generator LLMs share relationships like being the same model, having inheritance connections, or belonging to the same model family.
AI · Bullish · arXiv · CS AI · Mar 5 · 7/10
🧠 Researchers introduce Visual Attention Score (VAS) to analyze multimodal reasoning models, discovering that higher visual attention correlates strongly with better performance (r=0.9616). They propose the AVAR framework, which achieves 7% performance gains on Qwen2.5-VL-7B across multimodal reasoning benchmarks.
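The exact VAS definition is not in the summary; a natural reading is the share of attention mass that text tokens place on visual tokens, which could then be correlated with task accuracy. A sketch under that assumption, with synthetic attention weights:

```python
import numpy as np

def visual_attention_score(attn: np.ndarray, visual_idx: np.ndarray) -> float:
    """Share of total attention mass placed on visual tokens.
    attn: (num_queries, num_keys) attention weights, rows sum to 1."""
    return float(attn[:, visual_idx].sum() / attn.sum())

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
visual_idx = np.arange(6)  # suppose keys 0-5 are image-patch tokens

vas = visual_attention_score(attn, visual_idx)
print(0.0 < vas < 1.0)  # True
```

Correlating per-example scores like this against benchmark accuracy (e.g. with `np.corrcoef`) is the kind of analysis that would yield the reported r=0.9616.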
AI · Bullish · arXiv · CS AI · Mar 5 · 7/10
🧠 Researchers introduce DCR (Discernment via Contrastive Refinement), a new method to reduce over-refusal in safety-aligned large language models. The approach helps LLMs better distinguish between genuinely toxic and seemingly toxic prompts, maintaining safety while improving helpfulness without degrading general capabilities.
AI · Neutral · arXiv · CS AI · Mar 5 · 6/10
🧠 Researchers have identified Order-to-Space Bias (OTS) in modern image generation models, where the order in which entities are mentioned in text prompts incorrectly determines spatial layout and role assignments. The study introduces OTS-Bench to measure this bias and demonstrates that targeted fine-tuning and early-stage interventions can reduce the problem while maintaining generation quality.
AI · Bullish · arXiv · CS AI · Mar 4 · 7/10
🧠 Researchers introduce Paramβ, a novel method for transferring post-training capabilities to updated language models without additional training costs. The technique achieves 95% of the performance of traditional post-training by computing weight differences between base and post-trained models, offering significant cost savings for AI model development.
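The weight-difference idea the summary describes can be sketched concretely: subtract the old base weights from the old post-trained weights to get a "post-training delta," then add that delta to the new base model. Function and tensor names below are illustrative, not from the paper:

```python
import numpy as np

def transfer_post_training(base_old, post_old, base_new):
    """Apply the post-training weight delta of an old model release to
    a freshly updated base model, parameter tensor by tensor."""
    return {name: base_new[name] + (post_old[name] - base_old[name])
            for name in base_new}

# Toy 'models': dicts mapping parameter names to weight tensors.
base_old = {"w": np.ones((2, 2))}
post_old = {"w": np.ones((2, 2)) * 1.5}   # post-training added +0.5
base_new = {"w": np.ones((2, 2)) * 2.0}   # updated pretraining

merged = transfer_post_training(base_old, post_old, base_new)
print(merged["w"][0, 0])  # 2.5
```

This only works when old and new releases share an architecture and parameter naming, which is the typical situation for incremental base-model updates.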
AI · Bearish · arXiv · CS AI · Mar 4 · 6/10
🧠 New research reveals that current large language models struggle with collaborative reasoning, showing that 'stronger' models are often more fragile when distracted by misleading information. The study of 15 LLMs found they fail to effectively leverage guidance from other models, with success rates below 9.2% on challenging problems.
AI · Bullish · arXiv · CS AI · Mar 4 · 7/10
🧠 Researchers introduce Skywork-Reward-V2, a suite of AI reward models trained on SynPref-40M, a massive 40-million preference pair dataset created through human-AI collaboration. The models achieve state-of-the-art performance across seven major benchmarks by combining human annotation quality with AI scalability for better preference learning.
AI · Neutral · arXiv · CS AI · Mar 3 · 7/10
🧠 Researchers identified that fine-tuning non-robust pretrained AI models with robust objectives can lead to poor performance, termed 'suboptimal transfer.' They propose Epsilon-Scheduling, a novel training technique that adjusts perturbation strength during training to improve both task adaptation and adversarial robustness.
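The summary says only that perturbation strength is adjusted over training; one simple instantiation of such a schedule is a linear warmup of the adversarial budget ε. The shape and parameters below are an assumption for illustration, not the paper's schedule:

```python
def epsilon_schedule(step: int, total_steps: int, eps_max: float,
                     warmup_frac: float = 0.5) -> float:
    """Ramp the adversarial perturbation budget linearly from 0 to
    eps_max over the first warmup_frac of training, then hold it."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return eps_max * step / warmup_steps
    return eps_max

# Early fine-tuning sees weak perturbations (favoring task adaptation);
# later steps see the full budget (favoring robustness).
print(epsilon_schedule(0, 1000, eps_max=8 / 255))               # 0.0
print(epsilon_schedule(250, 1000, eps_max=8 / 255) < 8 / 255)   # True
print(epsilon_schedule(900, 1000, eps_max=8 / 255) == 8 / 255)  # True
```

The intuition matches the stated problem: hitting a non-robust checkpoint with full-strength perturbations from step one disrupts task transfer, while ramping ε lets the model adapt first.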
AI · Bearish · arXiv · CS AI · Mar 3 · 7/10
🧠 New research reveals that benchmark contamination in language reasoning models (LRMs) is extremely difficult to detect, allowing developers to easily inflate performance scores on public leaderboards. The study shows that reinforcement learning methods like GRPO and PPO can effectively conceal contamination signals, undermining the integrity of AI model evaluations.
$NEAR
AI · Neutral · arXiv · CS AI · Mar 3 · 7/10
🧠 Researchers discovered that the traditional cross-entropy scaling law for large language models breaks down at very large scales because only one component (error-entropy) actually follows power-law scaling, while other components remain constant. This finding explains why model performance improvements become less predictable as models grow larger and establishes a new error-entropy scaling law for better understanding LLM development.
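The described decomposition can be illustrated numerically: model a loss curve as a constant term plus a power-law error term, L(N) = E0 + A·N^(−α). Only the second term shrinks with scale, so L itself bends away from a straight line on a log-log plot while L − E0 stays perfectly linear. The constants here are illustrative, not fitted to real models:

```python
import numpy as np

E0, A, alpha = 1.7, 40.0, 0.3          # illustrative constants
N = np.array([1e8, 1e9, 1e10, 1e11])   # parameter counts
L = E0 + A * N ** (-alpha)             # synthetic loss curve

# log(L - E0) = log(A) - alpha * log(N): exactly linear, slope -alpha.
slope = np.polyfit(np.log(N), np.log(L - E0), 1)[0]
print(round(slope, 3))  # -0.3
```

Fitting a single power law to L directly would look fine at small N but drift as the constant term starts to dominate, which is the breakdown the paper attributes to the traditional scaling law.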