#dynamic-programming News & Analysis

5 articles tagged with #dynamic-programming. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Sparse Prefix Caching for Hybrid and Recurrent LLM Serving

Researchers propose sparse prefix caching, a novel optimization technique for hybrid and recurrent LLM serving that stores exact states at checkpoint positions rather than caching entire token histories. The method uses dynamic programming to determine optimal checkpoint placement and demonstrates superior performance on real-world datasets while using fewer checkpoints than existing dense caching approaches.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Unifying and Optimizing Data Values for Selection via Sequential Decision-Making

Researchers propose a new framework that reinterprets data selection as a sequential decision-making problem rooted in dynamic programming, unifying existing methods like Data Shapley while revealing their limitations as myopic approximations. The work introduces a scalable bipartite graph-based approach that preserves submodular structure and demonstrates improvements on machine learning and LLM fine-tuning tasks.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Quotient DAGs for Off-Policy Evaluation: Forward-Flow Importance Sampling and Exact Slate Propensities

Researchers introduce Quotient DAGs, a novel framework for off-policy evaluation that addresses variance issues in importance sampling by recognizing when generation process details are irrelevant to evaluation targets. The method computes exact unordered slate propensities efficiently through Forward-DP, a dynamic programming approach that avoids factorial enumeration, enabling practical evaluation for autoregressive slate recommendation systems.
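
To make the "avoids factorial enumeration" point concrete, here is a minimal sketch of a subset dynamic program for unordered slate propensities. It is not the paper's Forward-DP; it assumes a simple hypothetical generation process in which items are drawn one at a time without replacement, each with probability proportional to a fixed positive weight. The names `slate_propensity_dp`, `slate_propensity_bruteforce`, and `weights` are illustrative only.

```python
from itertools import permutations


def slate_propensity_dp(weights, slate):
    """Probability that the drawn items are exactly the set `slate`, in any order.

    Assumed model (illustrative, not from the paper): items are sampled
    sequentially without replacement, proportional to positive `weights`.
    Uses O(2^k * k) work for a slate of size k instead of k! orderings.
    """
    items = list(slate)
    k = len(items)
    total = sum(weights.values())

    # mass[mask] = summed weight of the slate items selected by `mask`.
    mass = [0.0] * (1 << k)
    for mask in range(1, 1 << k):
        low = mask & -mask
        mass[mask] = mass[mask ^ low] + weights[items[low.bit_length() - 1]]

    # f[mask] = probability that the first popcount(mask) draws are exactly
    # the items in `mask`, summed over every order of drawing them.
    f = [0.0] * (1 << k)
    f[0] = 1.0
    for mask in range(1 << k):
        remaining = total - mass[mask]  # weight still available to draw from
        for i in range(k):
            bit = 1 << i
            if not mask & bit:
                f[mask | bit] += f[mask] * weights[items[i]] / remaining
    return f[-1]


def slate_propensity_bruteforce(weights, slate):
    """Reference implementation: sum the probability of every ordering."""
    total = sum(weights.values())
    prob = 0.0
    for order in permutations(slate):
        remaining = total
        p = 1.0
        for item in order:
            p *= weights[item] / remaining
            remaining -= weights[item]
        prob += p
    return prob


weights = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
print(slate_propensity_dp(weights, ("a", "c")))
print(slate_propensity_bruteforce(weights, ("a", "c")))
```

The two functions should agree up to floating-point error. The dynamic program merges all orderings that reach the same partial set into a single state, which is the general idea behind replacing factorial enumeration with a pass over a smaller state graph.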

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Completion vs Optimality: Policy Gradient in Long-Horizon Cumulative-Damage Problems

Researchers identify critical failure modes in policy-gradient reinforcement learning methods when applied to long-horizon problems with cumulative damage, where short-term attractive actions lead to long-term negative outcomes. The study proposes a decomposition framework separating completion (reaching terminal horizon) from optimality (achieving dynamic-programming benchmarks) and validates predictions across two distinct domains: career planning and sports performance.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning

StructRL is a new reinforcement learning framework that recovers dynamic programming structure from distributional learning dynamics without requiring explicit models. The research demonstrates that temporal patterns in return distribution evolution reveal inherent structure in how information propagates through state spaces, enabling more efficient and stable learning.