#mcts News & Analysis

14 articles tagged with #mcts. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation

Researchers introduce MCTS-Judge, a test-time scaling framework that applies Monte Carlo Tree Search to improve the reasoning accuracy of LLM-based code evaluation. The system reaches 80% accuracy on code correctness tasks, surpassing OpenAI's o1 models while using 3x fewer tokens, and addresses a critical limitation in using LLMs as reliable judges for complex technical problems.
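
The summary does not say which node-selection rule MCTS-Judge uses. As a generic reference point, most MCTS variants build on UCT, which scores each child by its mean reward plus an exploration bonus that shrinks as the child is visited more often. The minimal Python sketch below shows that textbook rule; the `Node` class and the exploration constant `c` are illustrative assumptions, not MCTS-Judge's implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    visits: int = 0
    value_sum: float = 0.0
    children: list[Node] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 applied to trees: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = math.sqrt(2)) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


# Usage: two children with equal visits, so only their means differ.
a = Node(visits=5, value_sum=4.0)  # mean 0.8
b = Node(visits=5, value_sum=2.0)  # mean 0.4
parent = Node(visits=10, children=[a, b])
print(select_child(parent) is a)  # True
```

With equal visit counts the exploration bonuses cancel, so the child with the higher mean wins; an unvisited child scores infinity and is always selected first.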

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10

SEM-CTRL: Semantically Controlled Decoding

Researchers introduce SEM-CTRL, a new approach that ensures Large Language Models produce syntactically and semantically correct outputs without requiring fine-tuning. The system uses token-level Monte Carlo Tree Search guided by Answer Set Grammars to enforce context-sensitive constraints, allowing smaller pre-trained LLMs to outperform larger models on tasks like reasoning and planning.
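
Token-level constrained decoding, in its simplest form, filters the model's candidate tokens through a validity check before choosing one. The sketch below is not SEM-CTRL: it swaps its Answer Set Grammars and MCTS for a hypothetical balanced-parentheses checker (`balanced_prefix`) and plain greedy selection, with a toy scoring function (`toy_scores`) standing in for an LLM.

```python
from __future__ import annotations

from typing import Callable


def balanced_prefix(prefix: str) -> bool:
    """Hypothetical constraint: a ')' may never close more '(' than were opened."""
    depth = 0
    for ch in prefix:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return True


def constrained_greedy(
    scores: Callable[[str], dict[str, float]],
    is_valid_prefix: Callable[[str], bool],
    max_len: int,
) -> str:
    """Greedy decoding that masks out any token whose extension breaks the constraint."""
    out = ""
    for _ in range(max_len):
        allowed = {tok: s for tok, s in scores(out).items() if is_valid_prefix(out + tok)}
        if not allowed:
            break  # no legal continuation
        out += max(allowed, key=allowed.get)
    return out


def toy_scores(prefix: str) -> dict[str, float]:
    """Toy 'model' that always prefers ')' and would emit ')))' unconstrained."""
    return {"(": 0.5, ")": 1.0}


print(constrained_greedy(toy_scores, balanced_prefix, max_len=4))  # ()()
```

The sketch has no lookahead, so with `max_len=3` it stops at the unfinished `()(`; search-based decoders add lookahead partly to avoid such dead ends.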

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs

Researchers propose Budget-Guided MCTS, a tree-search algorithm that optimizes large language model inference by dynamically adjusting exploration and refinement strategies based on remaining token budgets. The method addresses a practical deployment challenge where fixed computational budgets vary across use cases, outperforming budget-agnostic approaches on mathematical and physics reasoning tasks.
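
The summary gives no formula for how Budget-Guided MCTS trades exploration against refinement. Purely to make the idea concrete, the hypothetical schedule below (`budget_aware_c`, with assumed bounds `c_max` and `c_min`) shrinks a UCT-style exploration constant linearly as the token budget is consumed, so early iterations branch widely and later ones deepen the most promising paths. It illustrates budget-conditioned search in general, not the paper's policy.

```python
def budget_aware_c(
    tokens_used: int,
    token_budget: int,
    c_max: float = 2.0,
    c_min: float = 0.1,
) -> float:
    """Hypothetical schedule: explore broadly early, exploit as the budget runs out."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    # Fraction of the budget still available, clamped at 0 once it is exhausted.
    remaining = max(0.0, 1.0 - tokens_used / token_budget)
    return c_min + (c_max - c_min) * remaining


# The returned value would replace a fixed exploration constant c in a UCT score.
for used in (0, 500, 1000):
    print(used, round(budget_aware_c(used, token_budget=1000), 3))
```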

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

HomeFlow: A Data Flywheel for Smart Home Agent Training with Verifiable Simulation

HomeFlow introduces a data flywheel system for training large language model agents in smart home environments, using procedural generation and Monte Carlo Tree Search to create diverse, verifiable training trajectories. The approach achieves an 87.03% task success rate on a new SmartHome-Bench benchmark, outperforming GPT-5.5 by 1.23 percentage points.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Two-Fidelity Best-Action Identification for Stochastic Minimax Tree

Researchers propose 2FFS, a two-fidelity tree-search algorithm that optimizes the tradeoff between cheap but biased heuristic evaluations and expensive but accurate rollouts in stochastic minimax trees. The method combines minimax and Monte Carlo Tree Search techniques with proven fixed-confidence correctness, achieving substantial sample and computational efficiency gains over existing approaches.
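
2FFS itself is not described here in enough detail to reproduce. For orientation, the quantity such algorithms try to identify is the best root action under the expectiminimax value: max nodes take the maximum over their children, min nodes the minimum, and chance nodes the probability-weighted average. The sketch computes that value exactly on a small hand-built tree; the `TreeNode` encoding is a hypothetical choice, and a sampling-based method like the one summarized would estimate these values rather than enumerate them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Literal


@dataclass
class TreeNode:
    kind: Literal["max", "min", "chance", "leaf"]
    value: float = 0.0                                # used only by leaves
    children: list[TreeNode] = field(default_factory=list)
    probs: list[float] = field(default_factory=list)  # used only by chance nodes


def expectiminimax(node: TreeNode) -> float:
    """Exact value of a stochastic minimax tree."""
    if node.kind == "leaf":
        return node.value
    vals = [expectiminimax(ch) for ch in node.children]
    if node.kind == "max":
        return max(vals)
    if node.kind == "min":
        return min(vals)
    return sum(p * v for p, v in zip(node.probs, vals))  # chance node


def best_action(root: TreeNode) -> int:
    """Index of the root child with the highest value (root is a max node)."""
    vals = [expectiminimax(ch) for ch in root.children]
    return max(range(len(vals)), key=vals.__getitem__)


# Usage: action 0 is a coin flip between 1.0 and 0.0; action 1 faces an adversary.
def leaf(v: float) -> TreeNode:
    return TreeNode("leaf", value=v)


root = TreeNode("max", children=[
    TreeNode("chance", children=[leaf(1.0), leaf(0.0)], probs=[0.5, 0.5]),
    TreeNode("min", children=[leaf(0.8), leaf(0.3)]),
])
print(best_action(root), expectiminimax(root))  # 0 0.5
```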

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

Researchers introduce COMPASS, a safety alignment framework for LLM-powered search agents that prevents harmful outcomes from seemingly innocent multi-step queries. The method combines cognitive tree exploration and step-wise alignment to achieve robust safety while maintaining utility, requiring less training data than existing approaches.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

When Does Memory Help Multi-Trajectory Inference for Tool-Use LLM Agents?

Researchers demonstrate that memory mechanisms in multi-trajectory LLM agents produce inconsistent results depending on the inference strategy used, revealing that previous evaluations conflated memory abstraction properties with inference method effects. The study systematically evaluates four memory methods across three inference strategies on tool-use benchmarks, showing that reflection, fact extraction, and observation injection each perform optimally under different conditions.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models

Researchers introduce McDiffuSE, an MCTS-based framework that optimizes the slot-filling order in Masked Diffusion Models to improve performance on mathematical and code reasoning tasks. The approach achieves a 3.2% improvement over autoregressive baselines and gains of up to 19.5% on specific benchmarks by strategically exploring generation orderings rather than following a fixed sequential order.

AI · Neutral · arXiv – CS AI · May 27 · 5/10

Monte Carlo Permutation Search

Researchers propose Monte Carlo Permutation Search (MCPS), an improved Monte Carlo Tree Search algorithm that enhances the GRAVE algorithm for game-playing AI. MCPS leverages statistics from all playouts containing moves along the path from root to node, demonstrating superior performance across multiple games while eliminating GRAVE's bias hyperparameter.
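
For context on what MCPS removes: GRAVE, like RAVE, blends a move's own playout mean with its all-moves-as-first (AMAF) statistics, here taken from a reference node, using a weight `beta = n_amaf / (n_amaf + n + bias * n_amaf * n)` that depends on a bias hyperparameter. The sketch below shows that blending rule as it is commonly stated for RAVE/GRAVE; the function name and the default `bias` value are illustrative assumptions, and MCPS's own statistic is not shown.

```python
def grave_value(
    mean: float,
    n: int,
    amaf_mean: float,
    n_amaf: int,
    bias: float = 1e-5,  # illustrative value, not a recommended setting
) -> float:
    """Blend a move's own mean with AMAF statistics from a reference node."""
    if n_amaf == 0:
        return mean  # no AMAF evidence: fall back to the move's own mean
    beta = n_amaf / (n_amaf + n + bias * n_amaf * n)
    return (1.0 - beta) * mean + beta * amaf_mean


print(grave_value(0.2, 0, 0.7, 10))        # 0.7: no own playouts, pure AMAF
print(grave_value(0.2, 10**6, 0.7, 100))   # close to 0.2: own statistics dominate
```

With no playouts of its own (`n = 0`) a move is valued entirely by its AMAF estimate; as `n` grows, beta shrinks toward 0 and the move's own mean dominates. The `bias` term controls how quickly that handover happens, which is the tuning burden the summary says MCPS eliminates.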

AI · Neutral · arXiv – CS AI · May 12 · 6/10

LLM-Guided Monte Carlo Tree Search over Knowledge Graphs: Composing Mechanistic Explanations for Drug-Disease Pairs

Researchers introduce TESSERA, a neuro-symbolic framework that combines Large Language Models with Monte Carlo Tree Search to extract multi-step explanations from knowledge graphs, specifically for drug-disease mechanism discovery. The system uses LLMs for local judgments rather than autonomous generation, enforcing structural constraints through knowledge graphs while employing MCTS for principled credit assignment across extended reasoning chains.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Finite-Time Analysis of MCTS in Continuous POMDP Planning

Researchers present the first finite-time theoretical analysis of Monte Carlo Tree Search (MCTS) applied to Partially Observable Markov Decision Processes (POMDPs), bridging a critical gap in algorithmic guarantees. The paper introduces Voro-POMCPOW, which uses Voronoi cell partitioning for continuous observation spaces, proving high-probability bounds on value estimates while maintaining competitive empirical performance.
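
The summary does not spell out how Voro-POMCPOW builds its partition. The underlying geometric idea is simple: given a set of center points, every continuous observation belongs to the Voronoi cell of its nearest center, so infinitely many possible observations collapse into finitely many tree branches. The sketch below (hypothetical names `voronoi_cell` and `bin_observations`, Euclidean distance assumed) shows only that nearest-center assignment.

```python
import math
from typing import Sequence

Point = tuple[float, ...]


def voronoi_cell(obs: Point, centers: Sequence[Point]) -> int:
    """Index of the nearest center; all points mapping to index i form cell i."""
    if not centers:
        raise ValueError("need at least one center")
    return min(range(len(centers)), key=lambda i: math.dist(obs, centers[i]))


def bin_observations(observations: Sequence[Point], centers: Sequence[Point]) -> list[int]:
    """Count how many observations fall into each Voronoi cell (i.e. each branch)."""
    counts = [0] * len(centers)
    for o in observations:
        counts[voronoi_cell(o, centers)] += 1
    return counts


centers = [(0.0, 0.0), (10.0, 0.0)]
print(bin_observations([(3.0, 1.0), (7.0, -2.0), (1.0, 1.0)], centers))  # [2, 1]
```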

AI · Bullish · arXiv – CS AI · May 7 · 6/10

CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement

CodeEvolve is an AI-driven evolutionary framework that automates code optimization by using LLMs, runtime profiling, and Monte Carlo Tree Search to identify and improve performance bottlenecks. The system achieves significant speedups (15.22x average) on enterprise Java codebases while maintaining functional correctness through rigorous validation pipelines.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents

Researchers introduce PyFi, a framework enabling vision language models to understand financial images through progressive reasoning chains, backed by a 600K synthetic dataset organized as a reasoning pyramid. The approach uses adversarial agents to automatically generate training data without human annotation, achieving up to 19.52% accuracy improvements on fine-tuned models.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

LiTS: A Modular Framework for LLM Tree Search

LiTS is a new modular Python framework that enables LLM reasoning through tree search algorithms like MCTS and BFS. The framework demonstrates reusable components across different domains and reveals that LLM policy diversity, not reward quality, is the key bottleneck for effective tree search in infinite action spaces.
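
LiTS's actual module names and API are not given in the summary. As a framework-independent sketch of the four stages that tree-search libraries typically separate into reusable components (selection, expansion, simulation, backpropagation), here is a minimal UCT loop on a hypothetical toy task: choose three bits to maximize the fraction of 1s. In an LLM setting the actions would be model-proposed reasoning steps and the reward would come from an evaluator; both are replaced here by trivial stand-ins.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass, field

DEPTH = 3          # hypothetical toy task: choose 3 bits
ACTIONS = (0, 1)


def reward(bits: tuple[int, ...]) -> float:
    return sum(bits) / DEPTH  # terminal reward: fraction of 1s


@dataclass
class Node:
    state: tuple[int, ...]
    parent: Node | None = None
    children: dict[int, Node] = field(default_factory=dict)
    visits: int = 0
    value_sum: float = 0.0

    def is_terminal(self) -> bool:
        return len(self.state) == DEPTH

    def is_fully_expanded(self) -> bool:
        return len(self.children) == len(ACTIONS)


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(iterations: int, rng: random.Random) -> int:
    root = Node(state=())
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCT while the node is fully expanded.
        while not node.is_terminal() and node.is_fully_expanded():
            pv = node.visits
            node = max(node.children.values(), key=lambda ch: uct(ch, pv))
        # 2. Expansion: add one untried child.
        if not node.is_terminal():
            action = next(a for a in ACTIONS if a not in node.children)
            child = Node(state=node.state + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: random rollout to a terminal state.
        state = node.state
        while len(state) < DEPTH:
            state = state + (rng.choice(ACTIONS),)
        value = reward(state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)


print(mcts(500, random.Random(0)))
```

On this toy task the loop should recommend action 1. Recommending the most-visited root action rather than the highest-mean one is a common choice because visit counts are less sensitive to a few lucky rollouts.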