y0news

#inference-scaling News & Analysis

11 articles tagged with #inference-scaling. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

Comprehensive AI governance requires addressing non-model gains

A research paper argues that current AI governance frameworks focus too narrowly on model-level controls, missing capability gains from inference optimization, post-training systems, and external assets. The authors propose a broader governance taxonomy encompassing system, entity, agent, and cloud-level oversight, alongside societal resilience measures, to address risks that traditional pre-deployment evaluation cannot capture.

AI · Neutral · arXiv – CS AI · May 29 · 7/10

Reasoning about Reasoning: BAPO Bounds on Chain-of-Thought Token Complexity in LLMs

Researchers extend the bounded attention prefix oracle (BAPO) model to establish theoretical lower bounds on chain-of-thought reasoning tokens required by LLMs, proving that canonical tasks require Ω(n) tokens as input size n grows. Experiments with frontier models confirm linear scaling behavior, revealing fundamental computational bottlenecks in inference-time scaling.
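For readers unfamiliar with the notation, the standard definition of the lower bound mentioned above is given here. The symbol T(n), standing for the number of chain-of-thought tokens needed on inputs of size n, is an assumed name for illustration and is not taken from the paper.

```latex
T(n) = \Omega(n)
\iff
\exists\, c > 0,\ \exists\, n_0 \in \mathbb{N}
\ \text{such that}\ 
T(n) \ge c \cdot n \quad \text{for all } n \ge n_0 .
```

In words: beyond some input size, the required token count grows at least proportionally to n.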

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production

Researchers propose Catch Your Breath (CYB), a novel training method that enables AI models to dynamically control the number of computational steps used for processing inputs through <pause> tokens. The approach outperforms standard cross-entropy training by allowing models to signal when they need additional processing time, improving performance metrics like perplexity without increasing computational overhead.

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents

Researchers propose Budget-Aware Value Tree (BAVT), a training-free framework that improves LLM agent efficiency by intelligently managing computational resources during multi-hop reasoning tasks. The system outperforms traditional approaches while using 4x fewer resources, demonstrating that smart budget management beats brute-force compute scaling.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms

Researchers challenge the conventional autoregressive versus diffusion model dichotomy, arguing that distinguishing between inference procedures (sequence expansion versus state refinement) matters more than model families. The paper advocates designing inference algorithms before training objectives, highlighting that training methods cannot compensate for flawed inference architectures, with implications for improving generative AI efficiency.

AI · Bullish · arXiv – CS AI · May 28 · 6/10

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

Researchers demonstrate that extrapolative weight averaging—extending beyond trained model checkpoints—can navigate and extend correctness-efficiency frontiers in code reinforcement learning without additional training. Testing on competitive programming tasks reveals that ensembles using this technique improve performance by 3.3% on hard problems, suggesting a scalable method for optimizing AI systems across competing objectives.
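As a rough illustration of what extrapolating beyond trained checkpoints can mean, the sketch below moves along the straight line between two checkpoints and continues past the second one. The formula theta_a + alpha * (theta_b - theta_a) with alpha > 1, the function name `extrapolate_weights`, and the toy parameter dictionaries are all assumptions for illustration; the summary does not state the paper's exact procedure.

```python
def extrapolate_weights(theta_a, theta_b, alpha):
    """Return theta_a + alpha * (theta_b - theta_a) for every parameter.

    alpha in [0, 1] interpolates between the two checkpoints;
    alpha > 1 extrapolates past theta_b (hypothetical illustration only).
    """
    if theta_a.keys() != theta_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: theta_a[name] + alpha * (theta_b[name] - theta_a[name])
        for name in theta_a
    }


# Toy checkpoints with scalar "weights" (made-up values).
base = {"w": 1.0, "b": 0.0}
tuned = {"w": 2.0, "b": -1.0}

# alpha = 1.5 steps half the base-to-tuned distance beyond the tuned checkpoint.
beyond = extrapolate_weights(base, tuned, alpha=1.5)
```

With real models the same arithmetic would be applied tensor by tensor; no additional training step is involved, which matches the summary's "without additional training" claim.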

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Sketch-and-Verify: Structured Inference-Time Scaling via Program Sketching

Sketch-and-Verify is an inference-time scaling technique that improves small language model performance by having the LLM generate multiple algorithmic strategies as program sketches, then filling and verifying them. On HumanEval+, this approach delivers superior cost-performance within a model tier compared to flat sampling, though upgrading to a stronger model tier remains more effective than scaling test-time compute on smaller models.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport

Researchers propose using conditional optimal transport to improve calibration of Process Reward Models (PRMs) used in AI inference-time scaling, addressing the problem of overestimated success probabilities. The method enables better confidence bounds for mathematical reasoning tasks and improves downstream performance in Best-of-N selection frameworks.
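To make the Best-of-N step concrete, here is a minimal sketch: generate N candidate answers, score each with a reward model, and keep the highest-scoring one. The function name `best_of_n` and the toy score table standing in for a PRM are hypothetical; the sketch does not implement the paper's optimal-transport calibration.

```python
def best_of_n(candidates, score):
    """Return the candidate with the highest score under `score`."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


# Made-up success probabilities standing in for a reward model's output.
toy_scores = {"answer A": 0.41, "answer B": 0.87, "answer C": 0.63}

chosen = best_of_n(list(toy_scores), score=toy_scores.get)
```

Because selection depends only on the ranking of scores, overestimated probabilities matter most when the scores are also used as confidence estimates, which is the calibration problem the summary describes.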

AI · Bearish · arXiv – CS AI · Apr 20 · 6/10

Where does output diversity collapse in post-training?

Researchers discover that post-trained language models experience systematic output diversity collapse, where fine-tuning methods reduce the variety of generated responses compared to base models. This collapse is determined during training by data composition choices and cannot be fixed through inference-time adjustments, with implications for scaling methods and creative AI applications.

AI · Bullish · arXiv – CS AI · Apr 13 · 6/10

Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search

Researchers introduce Chain-in-Tree (CiT), a framework that optimizes large language model tree search by selectively branching only when necessary rather than at every step. The approach reduces computational overhead by 75-85% on math reasoning tasks with minimal accuracy loss, making inference-time scaling more practical for resource-constrained deployments.
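A toy count shows why branching only at selected steps cuts the work. The sketch below compares the number of nodes generated when every step branches with the number generated when only flagged steps branch. The function `count_nodes`, the unpruned tree, and the chosen depth and branching factor are assumptions for illustration, not the paper's algorithm or its reported figures.

```python
def count_nodes(depth, branching, should_branch):
    """Count nodes generated in an unpruned search of `depth` steps.

    At a step where should_branch(step) is true, every frontier node gets
    `branching` children; otherwise each node gets a single continuation.
    """
    frontier = 1  # start from a single root state
    total = 0
    for step in range(depth):
        if should_branch(step):
            frontier *= branching
        total += frontier  # one new node per frontier entry at this step
    return total


# Branch at every step versus only at step 1 (hypothetical choice).
always = count_nodes(depth=4, branching=3, should_branch=lambda step: True)
selective = count_nodes(depth=4, branching=3, should_branch=lambda step: step == 1)
```

In this toy setting the selective run generates 10 nodes against 120 for the always-branching run; real savings depend on how often branching is judged necessary.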