#online-learning News & Analysis

24 articles tagged with #online-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

Anytime Safe PAC Efficient Reasoning

Researchers introduce B-PAC (Betting Probably Approximately Correct) reasoning, a method that optimizes Large Reasoning Models by dynamically routing queries between computationally expensive thinking models and faster alternatives while maintaining performance guarantees. The approach reduces thinking model usage by up to 81% while controlling performance loss in real-time, online settings.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees

Researchers introduced SLARouter, an online algorithm that optimizes LLM request routing by learning cost-efficient policies from sparse user feedback while guaranteeing Service Level Agreement compliance. The approach reduces operating costs by up to 2.2x compared to existing solutions without requiring per-benchmark tuning.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

Scaling Self-Evolving Agents via Parametric Memory

Researchers introduce TMEM, a parametric memory framework that enables AI agents to learn and evolve within a single episode by updating LoRA weights online, rather than merely retrieving frozen memories. The approach combines explicit memory storage with fast adaptive weights, allowing agents to improve their policy during rollouts, and it demonstrates consistent performance gains across multiple benchmarks.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Lodestar: An Online-Learning LLM Inference Router

Researchers introduce Lodestar, a machine-learning-based request routing system that dynamically assigns large language model inference tasks to GPU instances in distributed clusters. By continuously learning routing strategies in real time, the system improves latency metrics by up to 4.38x over existing heuristics.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Continuous Latent Contexts Enable Efficient Online Learning in Transformers

Researchers demonstrate that transformer models equipped with continuous latent context tokens can efficiently implement online learning algorithms without parameter updates. A small GPT-2-style model trained with this approach outperforms much larger language models on synthetic online prediction tasks, suggesting a promising architectural direction for adaptive AI systems.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

Researchers propose ADAPT, an online data reweighting framework that dynamically adjusts training sample importance during LLM training rather than using static offline selection methods. This approach maintains data diversity while improving generalization, outperforming existing offline curation techniques on instruction tuning and large-scale pretraining tasks.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

OpenClaw-RL: Train Any Agent Simply by Talking

OpenClaw-RL is a new reinforcement learning framework that enables AI agents to learn continuously from any type of interaction, including conversations, terminal commands, and GUI interactions. The system extracts learning signals from user responses and feedback, allowing agents to improve simply by being used in real-world scenarios.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

EARCP: Self-Regulating Coherence-Aware Ensemble Architecture for Sequential Decision Making -- Ensemble Auto-Regule par Coherence et Performance

Researchers introduce EARCP, a new ensemble architecture for AI that dynamically weights different expert models based on performance and coherence. The system provides theoretical guarantees with sublinear regret bounds and has been tested on time series forecasting, activity recognition, and financial prediction tasks.
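As a rough illustration of performance-based expert weighting with sublinear regret, here is a minimal sketch of the classic Hedge (multiplicative weights) update. This is a generic textbook technique, not EARCP's algorithm: it ignores the coherence signal entirely, and the function names, the learning rate `eta`, and the loss values are assumptions made up for the example.

```python
import math

def hedge_weights(losses_per_round, eta=0.5):
    """Run the Hedge update over rounds of per-expert losses in [0, 1].

    Returns the final normalized weight vector (one weight per expert).
    """
    n_experts = len(losses_per_round[0])
    weights = [1.0 / n_experts] * n_experts
    for losses in losses_per_round:
        # Shrink each expert's weight exponentially in its loss this round.
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights

def ensemble_predict(weights, expert_predictions):
    """Combine expert predictions as a weighted average."""
    return sum(w * p for w, p in zip(weights, expert_predictions))

# Hypothetical losses for three experts over three rounds.
losses = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.5], [0.0, 1.0, 0.5]]
final_weights = hedge_weights(losses)
combined = ensemble_predict(final_weights, [1.0, 3.0, 2.0])
```

The expert with the lowest cumulative loss ends up with the largest weight, so the combined prediction leans toward it. A coherence-aware scheme would need an additional term in the update, which this sketch does not attempt.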

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

When Drafts Evolve: Speculative Decoding Meets Online Learning

Researchers introduce OnlineSpec, a framework that uses online learning to continuously improve draft models in speculative decoding, a technique for accelerating large language model inference. The approach uses verification feedback to evolve draft models dynamically, achieving speedups of up to 24% across seven benchmarks and three foundation models.

AI · Neutral · arXiv – CS AI · Jun 25 · 5/10

Adapt Only When It Pays: Budgeted Decision-Loss Priority for Delayed Online Time-Series Adaptation

Researchers introduce ADOWIP, a machine learning framework that intelligently decides when to update forecasting models rather than updating continuously, optimizing compute usage for time-series prediction tasks with delayed feedback. The method demonstrates improved performance on capacity-planning benchmarks while maintaining strict computational budgets, though results remain limited to specific domains.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

OnDeFog: Online Decision Transformer under Frame Dropping

Researchers propose OnDeFog, a reinforcement learning method that combines offline and online learning approaches to handle frame dropping in real-world applications. By integrating Decision Transformer mechanisms with online learning, OnDeFog demonstrates improved performance compared to existing offline methods when dealing with missing sensor data and communication delays.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

LargeMonitor: Monitoring Online Task-Free Continual Learning via Large Pretrained Models

LargeMonitor is a new framework that uses large pretrained foundation models to detect and diagnose distribution shifts in online task-free continual learning systems without requiring explicit task labels or training-coupled optimization. The approach decouples drift detection from adaptation strategy selection, enabling more precise responses to different types of data stream variations.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

Researchers introduce Dri-MED, a machine learning algorithm designed to handle multi-armed bandit problems with personalized user preferences, drifting context distributions, and baseline performance constraints. The algorithm achieves improved regret bounds while minimizing constraint violations, demonstrating practical advantages over conservative baseline approaches in experimental settings.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Online Pandora's Box for Contextual LLM Cascading

Researchers propose an online contextual Pandora's Box model for optimizing LLM API cascading, where decision-makers sequentially query multiple APIs and select outputs based on indirect reward feedback. The approach achieves theoretically optimal regret bounds without requiring full distribution estimation, advancing practical optimization strategies for multi-API LLM systems.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Regret Minimization with Adaptive Opponents in Repeated Games

Researchers introduce Repeated Policy Regret (RP-Regret), a new game-theoretic metric for analyzing regret minimization in repeated games with adaptive opponents who can respond to historical play. The paper proposes three algorithms to minimize RP-Regret despite its non-convex nature and demonstrates that when all players use these algorithms, certain subgame perfect equilibria can be learned, with experiments showing improved cooperation in games like Stag-Hunt.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

COPF: An Online Framework for Deployment-Stable Counterfactual Fairness in Evolving Graphs

Researchers introduce COPF, a framework for monitoring and controlling fairness in online link recommendation systems on evolving graphs. The system addresses the challenge that recommendation algorithms are performative—they change user behavior and create feedback loops that make traditional fairness estimates unreliable after deployment.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

The Sample Complexity of Multiclass and Sparse Contextual Bandits

Researchers present optimal algorithms for sparse contextual bandits that achieve sample complexity of Õ((s/ε² + |A|/ε)log|Π|/δ), closing a gap from prior work that had exponential dependence on action set size. The results apply to multiclass classification and combinatorial semi-bandits through information-theoretic and algorithmic approaches.

AI · Bullish · arXiv – CS AI · May 28 · 6/10

EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation

EvoSpec introduces a dynamic framework for accelerating Large Language Model inference through real-time adaptation of vocabulary and parameters in speculative decoding. By addressing the vocabulary bottleneck that degrades performance in specialized domains, EvoSpec achieves a 1.13x speedup over static baselines while reducing memory overhead by 27%.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

On the Learnability of Test-Time Adaptation: A Recovery Complexity Perspective

Researchers introduce the first theoretical framework for analyzing test-time adaptation (TTA) in machine learning, establishing recovery complexity bounds that reveal fundamental limits on how quickly models can adapt to non-stationary data streams without labeled data. The work provides mathematical guarantees for TTA learnability and identifies an intrinsic trade-off between adaptivity and information constraints.

AI · Neutral · arXiv – CS AI · May 28 · 5/10

Online Irregular Multivariate Time Series Forecasting via Uncertainty-Driven Dual-Expert Calibration

Researchers propose Under-Cali, a machine learning framework for forecasting irregular multivariate time series data in real-time online settings. The system uses uncertainty estimation and dual-expert calibration to maintain accuracy despite dynamic data distribution shifts, achieving improvements over existing methods with minimal computational overhead.

AI · Neutral · arXiv – CS AI · May 12 · 5/10

Multi-Armed Bandits With Best-Action Queries

Researchers resolve an open problem in multi-armed bandit theory by characterizing how best-action oracle queries improve learning algorithms in the realistic bandit-feedback model. They prove that benefits depend critically on reward structure: correlated stochastic rewards cannot achieve the theoretical gains seen in full-feedback settings, while i.i.d. stochastic rewards maintain near-optimal improvements with logarithmic precision.

AI · Bullish · arXiv – CS AI · Mar 5 · 5/10

Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback

Researchers developed a new variance-reduced EXP4-based algorithm for optimizing routing policies in multi-layer hierarchical inference systems. The solution addresses the challenge of sparse, policy-dependent feedback in AI systems where prediction errors are only revealed at terminal layers, improving stability and performance over standard importance-weighted approaches.
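For context on the "standard importance-weighted approaches" mentioned above, here is a minimal sketch of EXP3, the simpler bandit relative of EXP4 (EXP4 adds expert advice on top of the same estimator). It is not the paper's variance-reduced algorithm; the function names, the exploration rate `gamma`, and the toy reward function are all assumptions for illustration.

```python
import math
import random

def exp3(n_arms, reward_fn, n_rounds, gamma=0.1, seed=0):
    """Run EXP3 and return the final normalized arm weights.

    reward_fn(arm, rng) must return a reward in [0, 1]; only the pulled
    arm's reward is observed (bandit feedback).
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    for _ in range(n_rounds):
        total = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, rng)
        # Importance-weighted estimate: unbiased, but its variance grows
        # as probs[arm] shrinks, which is what variance reduction targets.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the largest weight is 1, avoiding overflow.
        largest = max(weights)
        weights = [w / largest for w in weights]
    total = sum(weights)
    return [w / total for w in weights]

# Toy environment (assumption): arm 1 always pays 1.0, the others pay 0.0.
def toy_reward(arm, rng):
    return 1.0 if arm == 1 else 0.0

arm_weights = exp3(n_arms=3, reward_fn=toy_reward, n_rounds=2000)
```

In this toy setting the weight mass concentrates on arm 1. The division by `probs[arm]` is the step whose variance becomes a problem when feedback is sparse or policy-dependent.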

AI · Bullish · OpenAI News · Mar 25 · 6/10

Scaling the OpenAI Academy

OpenAI is expanding its Academy initiative into a comprehensive online resource hub designed to improve AI literacy across diverse backgrounds. The platform will provide tools, best practices, and peer insights to help users effectively access and utilize AI technologies.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10

Reservoir Subspace Injection for Online ICA under Top-n Whitening

Researchers developed Reservoir Subspace Injection (RSI) to improve online Independent Component Analysis under nonlinear mixing conditions. The study identifies performance bottlenecks in top-n whitening and proposes a guarded RSI controller that preserves system performance while achieving a 1.7 dB improvement over vanilla online ICA methods.