AI · Bullish · arXiv – CS AI · 6d ago · 7/10
🧠Researchers introduce TMEM, a parametric memory framework that lets AI agents learn and evolve within a single episode by updating LoRA weights online rather than merely retrieving frozen memories. The approach combines explicit memory storage with fast adaptive weights, so agents can improve their policy during rollouts, and it shows consistent performance gains across multiple benchmarks.
AI · Bullish · arXiv – CS AI · Jun 2 · 7/10
🧠Researchers introduce Lodestar, a machine learning-based request routing system that dynamically assigns large language model inference tasks to GPU instances in distributed clusters. The system improves latency metrics by up to 4.38x over existing heuristics by continuously learning routing strategies in real time.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers demonstrate that transformer models equipped with continuous latent context tokens can efficiently implement online learning algorithms without parameter updates. A small GPT-2-style model trained with this approach outperforms much larger language models on synthetic online prediction tasks, suggesting a promising architectural direction for adaptive AI systems.
AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠Researchers propose ADAPT, an online data reweighting framework that dynamically adjusts training sample importance during LLM training rather than using static offline selection methods. This approach maintains data diversity while improving generalization, outperforming existing offline curation techniques on instruction tuning and large-scale pretraining tasks.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠OpenClaw-RL is a new reinforcement learning framework that enables AI agents to learn continuously from any type of interaction, including conversations, terminal commands, and GUI interactions. The system extracts learning signals from user responses and feedback, allowing agents to improve simply by being used in real-world scenarios.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers introduce EARCP, a new ensemble architecture for AI that dynamically weights different expert models based on performance and coherence. The system provides theoretical guarantees with sublinear regret bounds and has been tested on time series forecasting, activity recognition, and financial prediction tasks.
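For context on performance-based expert weighting with sublinear regret, here is a minimal sketch of the classic exponential-weights (Hedge) update. It is not EARCP's method: the summary does not describe how EARCP combines performance and coherence, so the function name `hedge_weights`, the learning rate `eta`, and the assumption that losses lie in [0, 1] are all illustrative choices.

```python
import math


def hedge_weights(losses_per_round, eta=0.5):
    """Return final expert weights after exponential-weights updates.

    losses_per_round: list of rounds, each a list of per-expert losses in [0, 1].
    eta: learning rate (hypothetical default, not taken from the paper).
    """
    n_experts = len(losses_per_round[0])
    weights = [1.0 / n_experts] * n_experts  # start from a uniform ensemble
    for losses in losses_per_round:
        # Down-weight each expert by exp(-eta * loss), then renormalize.
        scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
        total = sum(scaled)
        weights = [s / total for s in scaled]
    return weights


# Expert 0 is always right, expert 1 is always wrong, for 10 rounds.
final = hedge_weights([[0.0, 1.0]] * 10)
```

After these ten rounds nearly all of the weight sits on the consistently better expert, which is the behavior a performance-weighted ensemble relies on.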
AI · Bullish · arXiv – CS AI · Mar 16 · 7/10
🧠Researchers introduce OnlineSpec, a framework that uses online learning to continuously improve draft models in speculative decoding for large language model inference acceleration. The approach uses verification feedback to evolve draft models dynamically, achieving speedup improvements of up to 24% across seven benchmarks and three foundation models.
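To make "verification feedback" concrete, here is a minimal sketch of the greedy verification step in speculative decoding: the target model checks the draft's proposed tokens and keeps the longest matching prefix. This is a generic illustration, not OnlineSpec's algorithm. The names `verify_draft` and `target_next_token` are hypothetical, the target is assumed to decode greedily, and the bonus token that real implementations emit after a fully accepted draft is omitted.

```python
def verify_draft(prefix, draft_tokens, target_next_token):
    """Check draft tokens against the target model's greedy choices.

    prefix: tokens generated so far.
    draft_tokens: tokens proposed by the draft model.
    target_next_token: callable mapping a context (list of tokens) to the
        target model's next token (hypothetical stand-in for a real model).
    Returns (emitted_tokens, n_accepted); n_accepted is the feedback signal
    an online learner could use to update the draft model.
    """
    context = list(prefix)
    emitted = []
    for tok in draft_tokens:
        expected = target_next_token(context)
        if tok != expected:
            # First mismatch: emit the target's token and stop.
            emitted.append(expected)
            return emitted, len(emitted) - 1
        emitted.append(tok)
        context.append(tok)
    return emitted, len(emitted)


# Toy target model: the next token is the context length modulo 3.
toy_target = lambda ctx: len(ctx) % 3
tokens, n_accepted = verify_draft([0], [1, 2, 5], toy_target)
```

In this toy run the first two draft tokens are accepted and the third is replaced by the target's choice, so `n_accepted` is 2 out of 3 proposed tokens.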
AI · Neutral · arXiv – CS AI · 1d ago · 6/10
🧠LargeMonitor is a new framework that uses large pretrained foundation models to detect and diagnose distribution shifts in online task-free continual learning systems without requiring explicit task labels or training-coupled optimization. The approach decouples drift detection from adaptation strategy selection, enabling more precise responses to different types of data stream variations.
AI · Neutral · arXiv – CS AI · 1d ago · 5/10
🧠Researchers introduce Dri-MED, a machine learning algorithm designed to handle multi-armed bandit problems with personalized user preferences, drifting context distributions, and baseline performance constraints. The algorithm achieves improved regret bounds while minimizing constraint violations, demonstrating practical advantages over conservative baseline approaches in experimental settings.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers propose an online contextual Pandora's Box model for optimizing LLM API cascading, where decision-makers sequentially query multiple APIs and select outputs based on indirect reward feedback. The approach achieves theoretically optimal regret bounds without requiring full distribution estimation, advancing practical optimization strategies for multi-API LLM systems.
AI · Neutral · arXiv – CS AI · 5d ago · 6/10
🧠Researchers introduce Repeated Policy Regret (RP-Regret), a new game-theoretic metric for analyzing regret minimization in repeated games with adaptive opponents who can respond to historical play. The paper proposes three algorithms to minimize RP-Regret despite its non-convex nature and demonstrates that when all players use these algorithms, certain subgame perfect equilibria can be learned, with experiments showing improved cooperation in games like Stag-Hunt.
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers introduce COPF, a framework for monitoring and controlling fairness in online link recommendation systems on evolving graphs. The system addresses the challenge that recommendation algorithms are performative—they change user behavior and create feedback loops that make traditional fairness estimates unreliable after deployment.
AI · Neutral · arXiv – CS AI · May 29 · 6/10
🧠Researchers present optimal algorithms for sparse contextual bandits that achieve sample complexity of Õ((s/ε² + |A|/ε)log|Π|/δ), closing a gap from prior work that had exponential dependence on action set size. The results apply to multiclass classification and combinatorial semi-bandits through information-theoretic and algorithmic approaches.
AI · Bullish · arXiv – CS AI · May 28 · 6/10
🧠EvoSpec introduces a dynamic framework for accelerating Large Language Model inference through real-time adaptation of vocabulary and parameters in speculative decoding. By addressing the vocabulary bottleneck that degrades performance in specialized domains, EvoSpec achieves a 1.13x speedup over static baselines while reducing memory overhead by 27%.
AI · Neutral · arXiv – CS AI · May 28 · 6/10
🧠Researchers introduce the first theoretical framework for analyzing test-time adaptation (TTA) in machine learning, establishing recovery complexity bounds that reveal fundamental limits on how quickly models can adapt to non-stationary data streams without labeled data. The work provides mathematical guarantees for TTA learnability and identifies an intrinsic trade-off between adaptivity and information constraints.
AI · Neutral · arXiv – CS AI · May 28 · 5/10
🧠Researchers propose Under-Cali, a machine learning framework for forecasting irregular multivariate time series data in real-time online settings. The system uses uncertainty estimation and dual-expert calibration to maintain accuracy despite dynamic data distribution shifts, achieving improvements over existing methods with minimal computational overhead.
AI · Neutral · arXiv – CS AI · May 12 · 5/10
🧠Researchers resolve an open problem in multi-armed bandit theory by characterizing how best-action oracle queries improve learning algorithms in the realistic bandit-feedback model. They prove that benefits depend critically on reward structure: correlated stochastic rewards cannot achieve the theoretical gains seen in full-feedback settings, while i.i.d. stochastic rewards maintain near-optimal improvements with logarithmic precision.
AI · Bullish · arXiv – CS AI · Mar 5 · 5/10
🧠Researchers developed a new variance-reduced EXP4-based algorithm for optimizing routing policies in multi-layer hierarchical inference systems. The solution addresses the challenge of sparse, policy-dependent feedback in AI systems where prediction errors are only revealed at terminal layers, improving stability and performance over standard importance-weighted approaches.
AI · Bullish · OpenAI News · Mar 25 · 6/10 · 7
🧠OpenAI is expanding its Academy initiative into a comprehensive online resource hub designed to improve AI literacy across diverse backgrounds. The platform will provide tools, best practices, and peer insights to help users effectively access and utilize AI technologies.
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 3
🧠Researchers developed Reservoir Subspace Injection (RSI) to improve online Independent Component Analysis under nonlinear mixing conditions. The study identifies performance bottlenecks in top-n whitening and proposes a guarded RSI controller that preserves system performance while achieving 1.7 dB improvement over vanilla online ICA methods.