AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠New research formally defines and analyzes pattern matching in large language models, revealing predictable limits in their ability to generalize on compositional tasks. The study provides mathematical boundaries for when pattern matching succeeds or fails, with implications for AI model development and understanding.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 9
🧠Researchers achieved breakthrough sample complexity improvements for offline reinforcement learning algorithms using f-divergence regularization, particularly for contextual bandits. The study demonstrates optimal O(ε⁻¹) sample complexity under single-policy concentrability conditions, significantly improving upon existing bounds.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers propose a new approach for training AI models to generate correct answers from demonstrations, using imitation learning in contextual bandits rather than traditional supervised fine-tuning. The method achieves better sample complexity and works with weaker assumptions about the underlying reward model compared to existing likelihood-maximization approaches.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers introduce auto-exploration, a new reinforcement learning method that automatically explores state and action spaces without requiring manual parameter tuning. The approach achieves optimal sample complexity of O(ε⁻²) while remaining parameter-free and implementable, advancing theoretical RL foundations.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers present the first theoretical framework establishing sample complexity bounds for discrete-state diffusion models, filling a fundamental gap in AI research. The work provides an Õ(ε⁻²) sample complexity bound and decomposes the score estimation error into four components, advancing understanding of how these models can be trained efficiently for text and combinatorial applications.
AI · Neutral · arXiv – CS AI · Jun 9 · 6/10
🧠Researchers present a theoretical framework for offline reinforcement learning that answers a fundamental open question negatively: Q*-realizability and Bellman completeness alone are insufficient for sample-efficient learning under partial coverage. The work introduces a decision-estimation framework that improves sample complexity bounds for practical algorithms like Conservative Q-Learning and extends theoretical understanding to previously unexplored settings.
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers prove that fixed-budget best-arm identification in bandit problems is no harder than fixed-confidence approaches up to logarithmic factors, introducing FC2FB—a meta-algorithm that converts fixed-confidence algorithms to fixed-budget ones while maintaining optimal sample complexity. This fundamental result establishes a previously unclear relationship between two core machine learning paradigms and enables improved algorithms across multiple problem classes.
AI · Neutral · arXiv – CS AI · May 29 · 6/10
🧠Researchers present optimal algorithms for sparse contextual bandits that achieve sample complexity of Õ((s/ε² + |A|/ε)log|Π|/δ), closing a gap from prior work that had exponential dependence on action set size. The results apply to multiclass classification and combinatorial semi-bandits through information-theoretic and algorithmic approaches.
AI · Neutral · arXiv – CS AI · May 28 · 5/10
🧠Researchers have proven optimal sample complexity for learning linear contracts in offline settings, showing that Empirical Utility Maximization requires only O(ln(1/δ)/ε²) samples to approximate optimal contracts. This result matches theoretical lower bounds and establishes uniform convergence guarantees across all linear contracts.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers achieve the first fast statistical rates (Õ(ε⁻¹)) for offline contextual bandits using forward-KL regularization under single-policy concentrability, matching the performance previously only shown for reverse-KL approaches and establishing rate-optimal lower bounds.
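Several of the summaries above contrast "fast" Õ(ε⁻¹) rates with the more common Õ(ε⁻²) rates. As a rough illustration of what that difference means in practice, the sketch below evaluates a bound of the form C · log(1/δ) / εᵖ. The function name `samples_needed`, the constant `C = 1`, and the omission of any further logarithmic or problem-dependent factors are assumptions made for illustration; none of them come from the papers themselves.

```python
import math
from typing import Optional


def samples_needed(eps: float, power: int,
                   delta: Optional[float] = None,
                   constant: float = 1.0) -> int:
    """Illustrative sample count for a bound of the form C * log(1/delta) / eps**power.

    `constant` (C) is a placeholder: real bounds hide problem-dependent
    constants and extra log factors that this sketch ignores.
    """
    # Without a confidence parameter, drop the log(1/delta) factor entirely.
    log_term = math.log(1.0 / delta) if delta is not None else 1.0
    return math.ceil(constant * log_term / eps ** power)


if __name__ == "__main__":
    # Compare the eps^-1 ("fast") and eps^-2 ("slow") scalings side by side.
    for eps in (0.1, 0.01):
        fast = samples_needed(eps, power=1)
        slow = samples_needed(eps, power=2)
        print(f"eps={eps}: eps^-1 scaling ~{fast}, eps^-2 scaling ~{slow}")
```

Under these assumptions, halving the target error ε doubles the sample count under ε⁻¹ scaling but quadruples it under ε⁻² scaling, which is why the fast-rate results are singled out.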