y0news

#test-time-scaling News & Analysis

10 articles tagged with #test-time-scaling. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Researchers introduce RL^V, a reinforcement learning method that unifies LLM reasoners with generative verifiers to improve test-time compute scaling. The approach achieves over 20% accuracy gains on MATH benchmarks and enables 8-32x more efficient test-time scaling compared to existing RL methods by preserving and leveraging learned value functions.
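
RL^V's key move, retaining a verifier/value signal alongside the policy, enables verifier-weighted selection over sampled solutions at test time. A minimal sketch of that aggregation step; the `samples` and `verify` stand-ins are illustrative assumptions, not the paper's interface:

```python
import collections

def verifier_weighted_vote(samples, verify):
    """Sum verifier scores per distinct answer and return the answer with
    the highest total (weighted voting over sampled solutions)."""
    totals = collections.defaultdict(float)
    for answer in samples:
        totals[answer] += verify(answer)
    return max(totals, key=totals.get)

# Toy stand-ins: candidate answers from a reasoner, and a verifier that
# returns a pseudo-probability of correctness for each answer.
samples = ["42", "42", "17", "42", "17"]
verify = lambda ans: 0.9 if ans == "42" else 0.4
print(verifier_weighted_vote(samples, verify))  # 42
```

Unlike plain majority voting, this lets a confident verifier override a count: an answer sampled once but scored highly can beat an answer sampled often but scored poorly.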

AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models

Researchers discovered that in Large Reasoning Models like DeepSeek-R1, the first solution is often the best, with alternative solutions being detrimental due to error accumulation. They propose RED, a new framework that achieves up to 19% performance gains while reducing token consumption by 37.7-70.4%.
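
The reported token savings follow from not generating alternatives that error accumulation would make worse. A hedged sketch of that idea as a confidence-gated early stop; the function names and gating rule are my assumptions, not the paper's RED framework:

```python
def first_solution_gate(generate, confidence, threshold=0.7, max_extra=2):
    """Accept the first sampled solution when its confidence clears the
    threshold; otherwise sample a few alternatives and keep the most
    confident one. Skipping alternatives is where tokens are saved."""
    best = generate()
    best_conf = confidence(best)
    if best_conf >= threshold:
        return best  # first solution accepted; no extra generation
    for _ in range(max_extra):
        cand = generate()
        c = confidence(cand)
        if c > best_conf:
            best, best_conf = cand, c
    return best

# Toy stand-ins: a scripted generator and a fixed confidence table.
answers = iter(["first", "alt1", "alt2"])
conf = {"first": 0.9, "alt1": 0.5, "alt2": 0.6}
print(first_solution_gate(lambda: next(answers), conf.get))  # first
```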

AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

Reward Is Enough: LLMs Are In-Context Reinforcement Learners

Researchers demonstrate that large language models can perform reinforcement learning during inference through a new 'in-context RL' prompting framework. The method shows LLMs can optimize scalar reward signals to improve response quality across multiple rounds, achieving significant improvements on complex tasks like mathematical competitions and creative writing.
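
The multi-round loop can be sketched as follows; `model` and `reward` are toy stand-ins for the LLM call and the scalar reward signal, and the prompt wording is illustrative, not the paper's template:

```python
def in_context_rl(model, reward, task, rounds=3):
    """Append prior attempts and their scalar rewards to the prompt so
    the model can condition on them and improve; return the best attempt."""
    history = []
    best, best_r = None, float("-inf")
    for _ in range(rounds):
        prompt = task + "".join(
            f"\nAttempt: {a}\nReward: {r:.2f}" for a, r in history
        ) + "\nProduce an improved attempt."
        attempt = model(prompt)
        r = reward(attempt)
        history.append((attempt, r))
        if r > best_r:
            best, best_r = attempt, r
    return best

# Toy stand-ins: a 'model' that emits longer outputs as history grows,
# and a reward equal to output length.
model = lambda p: "x" * (p.count("Attempt:") + 1)
print(in_context_rl(model, len, "task"))  # xxx
```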

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Expressive Power of Implicit Models: Rich Equilibria and Test-Time Scaling

Researchers provide mathematical proof that implicit models can achieve greater expressive power through increased test-time computation, explaining how these memory-efficient architectures can match larger explicit networks. The study validates this scaling property across image reconstruction, scientific computing, operations research, and LLM reasoning domains.
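
The scaling knob the paper studies can be shown with a toy equilibrium model: the output is a fixed point z* = f(z*, x), and spending more solver iterations at test time buys a more accurate equilibrium. The contraction map below is illustrative, not from the paper:

```python
def solve_equilibrium(f, x, z0=0.0, iters=50):
    """Implicit-model sketch: iterate z <- f(z, x) toward the fixed point
    z* = f(z*, x). Test-time compute scales with `iters`."""
    z = z0
    for _ in range(iters):
        z = f(z, x)
    return z

# Toy contraction: f(z, x) = 0.5*z + x has fixed point z* = 2x.
f = lambda z, x: 0.5 * z + x
coarse = solve_equilibrium(f, 1.0, iters=3)
fine = solve_equilibrium(f, 1.0, iters=50)
print(abs(fine - 2.0) < abs(coarse - 2.0))  # True: more iterations, closer
```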

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Variation in Verification: Understanding Verification Dynamics in Large Language Models

Researchers analyzed how LLM verifiers assess solution correctness in test-time scaling scenarios, revealing that verification effectiveness varies significantly with problem difficulty, generator strength, and verifier capability. The study demonstrates that weak generators can nearly match stronger ones post-verification and that verifier scaling alone cannot solve fundamental verification challenges.

🧠 GPT-4
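
The generator-verifier dynamic the study measures can be simulated in a few lines of best-of-N selection. Everything here is a toy stand-in, not the paper's experimental setup:

```python
import random

def best_of_n(generator, verifier, n=16):
    """Best-of-N with an external verifier: sample n candidates and
    return the one the verifier scores highest."""
    candidates = [generator() for _ in range(n)]
    return max(candidates, key=verifier)

random.seed(0)
# Weak generator: correct answer "1" only 20% of the time; the verifier
# scores correctness perfectly. Verification recovers most of the gap.
weak = lambda: "1" if random.random() < 0.2 else "0"
verifier = lambda ans: 1.0 if ans == "1" else 0.0
hits = sum(best_of_n(weak, verifier, n=16) == "1" for _ in range(200))
print(hits / 200)  # close to 1 - 0.8**16, roughly 0.97
```

This illustrates the "weak generators nearly match strong ones post-verification" finding: a 20%-accurate generator plus a reliable verifier behaves almost like a near-perfect one, while an unreliable verifier would cap the gain no matter how large n grows.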
AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models

Researchers introduce S³ (Stratified Scaling Search), a test-time scaling method for diffusion language models that improves output quality by reallocating compute during the denoising process rather than simple best-of-K sampling. The technique uses a lightweight verifier to evaluate and selectively resample candidate trajectories at each step, demonstrating consistent performance gains across mathematical reasoning and knowledge tasks without requiring model retraining.
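
Stepwise resampling of candidate trajectories can be sketched as follows; `step` and `score` stand in for a denoiser update and a lightweight verifier, and the clone-the-better-half rule is my simplification, not the paper's stratified search:

```python
def stratified_resample(candidates, step, score, n_steps):
    """Verifier-guided trajectory search sketch: at each step, advance
    every candidate, then replace the worse-scoring half with clones of
    the better half, reallocating compute toward promising trajectories."""
    cands = list(candidates)
    for _ in range(n_steps):
        cands = [step(c) for c in cands]
        cands.sort(key=score, reverse=True)
        half = len(cands) // 2
        cands = cands[:half] + cands[:half]  # clone the better half
    return max(cands, key=score)

# Toy denoising: states decay toward 0, and the 'verifier' prefers
# states closer to 0.
best = stratified_resample([-0.8, 0.4, 0.9, -0.1],
                           step=lambda x: 0.5 * x,
                           score=lambda x: -abs(x),
                           n_steps=3)
print(best)  # -0.0125
```

The contrast with best-of-K is that weak trajectories are pruned mid-denoising instead of being run to completion and discarded at the end.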

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

Understanding the Role of Training Data in Test-Time Scaling

This paper analyzes test-time scaling in large language models, revealing that longer reasoning chains (CoTs) can reduce training-data requirements but may harm performance if the relevant skills are absent from the training data. The study provides a theoretical framework showing that diverse, relevant, and challenging training tasks optimize test-time scaling performance.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10

Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences

Researchers introduce Duel-Evolve, a new optimization algorithm that improves LLM performance at test time without requiring external rewards or labels. The method uses self-generated pairwise comparisons and achieved 20 percentage points higher accuracy on MathBench and 12 percentage points improvement on LiveCodeBench.
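
Reward-free selection via pairwise self-preference can be sketched as a simple tournament; `prefer(a, b)` stands in for prompting the LLM to judge two of its own outputs, and the single-elimination structure is my assumption, not Duel-Evolve's actual algorithm:

```python
def duel_select(candidates, prefer):
    """Single-elimination tournament over candidates using only pairwise
    preferences, with no scalar reward or label. Odd candidates get a bye."""
    pool = list(candidates)
    while len(pool) > 1:
        nxt = []
        for i in range(0, len(pool) - 1, 2):
            nxt.append(prefer(pool[i], pool[i + 1]))  # winner advances
        if len(pool) % 2:
            nxt.append(pool[-1])  # bye for the unpaired candidate
        pool = nxt
    return pool[0]

# Toy preference: judge the longer answer as better.
print(duel_select(["a", "abc", "ab", "abcd"],
                  lambda a, b: max(a, b, key=len)))  # abcd
```

The appeal of this shape is that it only needs n-1 comparisons to pick a winner from n candidates, and comparisons are often easier to elicit reliably from a model than absolute scores.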

AI · Bullish · arXiv – CS AI · Mar 3 · 5/10

From Scale to Speed: Adaptive Test-Time Scaling for Image Editing

Researchers introduce ADE-CoT (Adaptive Edit-CoT), a new test-time scaling framework that improves image editing efficiency by 2x while maintaining superior performance. The system uses dynamic resource allocation, edit-specific verification, and opportunistic stopping to optimize the image editing process compared to traditional methods.
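
The "opportunistic stopping" ingredient can be sketched as sampling only while a verifier is unsatisfied; `attempt` and `verify` are stand-ins, and this minimal loop is my reading of the idea, not ADE-CoT's actual components:

```python
def opportunistic_scaling(attempt, verify, budget=8, accept=0.9):
    """Keep sampling candidates only while the verifier is unsatisfied;
    stop as soon as one clears the acceptance threshold, saving the
    remainder of the compute budget."""
    best, best_score, used = None, -1.0, 0
    for _ in range(budget):
        cand = attempt()
        used += 1
        s = verify(cand)
        if s > best_score:
            best, best_score = cand, s
        if best_score >= accept:
            break  # good enough; don't spend the rest of the budget
    return best, used

# Toy run: the second attempt already satisfies the verifier, so only
# 2 of the 8 budgeted attempts are spent.
edits = iter([("edit_a", 0.5), ("edit_b", 0.95), ("edit_c", 0.3)])
result, used = opportunistic_scaling(lambda: next(edits),
                                     verify=lambda e: e[1])
print(result[0], used)  # edit_b 2
```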