#actor-critic News & Analysis

17 articles tagged with #actor-critic. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix

Researchers identify two critical failure modes in deep multi-agent reinforcement learning applied to continuous pricing markets: tacit collusion between DDPG agents and actor-critic instability at high event rates. While asynchronous pricing and latency reduce collusion by up to 48%, the fix remains partial and breaks down under high-frequency conditions, revealing fundamental limitations in current MARL approaches for market simulation.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Representation Learning Enables Scalable Multitask Deep Reinforcement Learning

Researchers demonstrate that representation learning, rather than model-based planning, is the key driver of scalable multitask reinforcement learning. Their proposed MR.Q algorithm combines predictive representations with value function approximation to outperform existing world-model methods while reducing computational overhead.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning

Researchers propose Generative Actor-Critic (GenAC), a new approach to value modeling in large language model reinforcement learning that uses chain-of-thought reasoning instead of one-shot scalar predictions. The method addresses a longstanding challenge in credit assignment by improving value approximation and downstream RL performance compared to existing value-based and value-free baselines.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic Methods

Researchers present PAVE, a theoretical and practical framework addressing policy instability in actor-critic reinforcement learning by stabilizing the critic's Q-function gradient field rather than directly regularizing policy outputs. The work demonstrates that policy smoothness is fundamentally determined by the critic's differential geometry, offering a more principled approach to deploying learned policies in physical systems.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution

Researchers introduce TT-DAC-PS, a reinforcement learning algorithm for executing large stock sell orders that combines deterministic actor-critic methods with policy smoothing and conservative regularization. On real U.S. stock limit order book data, it achieves lower implementation shortfall than classical execution algorithms such as TWAP and VWAP, and than standard RL baselines.
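For readers unfamiliar with the baseline and the metric named in this summary, here is a minimal sketch of a TWAP schedule and of implementation shortfall for a sell order. The function names `twap_schedule` and `implementation_shortfall_bps` are hypothetical, the fills are made-up illustration data, and none of this is taken from the TT-DAC-PS paper.

```python
from typing import List, Tuple


def twap_schedule(total_shares: int, num_slices: int) -> List[int]:
    """Split an order into equal child orders, one per time slice (TWAP).

    Any remainder is spread over the earliest slices so the total is exact.
    """
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    base, remainder = divmod(total_shares, num_slices)
    return [base + (1 if i < remainder else 0) for i in range(num_slices)]


def implementation_shortfall_bps(
    arrival_price: float, fills: List[Tuple[int, float]]
) -> float:
    """Implementation shortfall of a SELL order, in basis points.

    `fills` is a list of (shares, price) pairs. The benchmark is what the
    seller would have received had every share sold at the arrival price;
    a positive result means the execution did worse than that benchmark.
    """
    total_shares = sum(shares for shares, _ in fills)
    if total_shares == 0:
        raise ValueError("no shares were filled")
    benchmark = arrival_price * total_shares
    proceeds = sum(shares * price for shares, price in fills)
    return (benchmark - proceeds) / benchmark * 1e4
```

With these definitions, `twap_schedule(1000, 3)` yields `[334, 333, 333]`, and selling 500 shares at 100.0 and 500 at 99.0 against an arrival price of 100.0 costs about 50 basis points.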

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning

Researchers introduce SV-QD-RL, a reinforcement learning framework that generates diverse policy repertoires by conditioning actor networks on learned structural masks and pairing them with branch-specific critics. The approach demonstrates improved performance on continuous control tasks while maintaining behavioral diversity through structure-aware archive management.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Retry Policy Gradients in Continuous Action Spaces

Researchers introduce ReMax Actor-Critic (ReMAC), extending retry-based policy gradient methods from discrete to continuous action spaces. The approach uses pathwise derivative estimators to optimize pass@K and max@K objectives, promoting exploration through policy-gradient landscape reshaping rather than explicit entropy bonuses, achieving performance comparable to SAC.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

Researchers present a theoretical framework for deep reinforcement learning in continuous environments using continuous-time stochastic processes and stochastic control theory. The work establishes a two time-scale model for actor-critic algorithms with neural networks, deriving equations that describe how state distributions evolve during training in the infinite width limit.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents

Researchers propose simplicial embeddings, a lightweight geometric technique that constrains neural network representations to discrete, sparse structures, improving sample efficiency in reinforcement learning agents. When integrated into popular actor-critic algorithms like PPO and FastTD3, the method enhances performance and learning speed across diverse control tasks without sacrificing computational speed.
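As a rough illustration of the idea, the sketch below assumes the common formulation of a simplicial embedding: the feature vector is split into equal-sized groups and a temperature-scaled softmax is applied within each group, so every group lies on a probability simplex. The function name `simplicial_embedding` and its parameters are hypothetical, and the variant used in the paper may differ.

```python
import math
from typing import List


def simplicial_embedding(
    features: List[float], group_size: int, temperature: float = 1.0
) -> List[float]:
    """Apply a softmax independently to each contiguous group of features.

    Assumed formulation: each group of `group_size` values is mapped onto a
    probability simplex. Lower temperatures push each group toward a
    one-hot (sparser) vector.
    """
    if group_size <= 0 or len(features) % group_size != 0:
        raise ValueError("len(features) must be a positive multiple of group_size")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    embedded: List[float] = []
    for start in range(0, len(features), group_size):
        scaled = [x / temperature for x in features[start:start + group_size]]
        peak = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in scaled]
        total = sum(exps)
        embedded.extend(e / total for e in exps)
    return embedded
```

In an actor-critic agent, a layer like this would typically sit on the encoder output that feeds the actor and critic heads; where exactly the paper places it is not stated in the summary.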

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Revisiting Mixture Policies in Entropy-Regularized Actor-Critic

Researchers propose a marginalized reparameterization (MRP) estimator to enable practical use of mixture policies in reinforcement learning, addressing a long-standing gap between theoretical potential and practical implementation. By reducing variance compared to likelihood-ratio methods, MRP mixture policies achieve performance parity with standard Gaussian policies while offering greater flexibility in continuous action spaces.

🏢 Google

AI · Neutral · arXiv – CS AI · May 9 · 6/10

AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning

AdaGamma introduces a state-dependent discount factor method for deep reinforcement learning that learns to adjust discounting dynamically across different states, addressing instability issues in prior approaches through a return-consistency regularization objective. The method demonstrates empirical improvements when integrated into popular algorithms like SAC and PPO, with validated gains from real-world logistics deployment.
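To make the notion of state-dependent discounting concrete, the sketch below computes returns with the recursion G_t = r_t + γ(s_{t+1}) · G_{t+1}, where the discount is looked up per state instead of being a single constant. The indexing convention, the function name `state_dependent_returns`, and the `gamma_fn` callable are assumptions for illustration; AdaGamma learns its discount function and adds a return-consistency regularizer, neither of which is shown here.

```python
from typing import Callable, Hashable, List, Sequence


def state_dependent_returns(
    rewards: Sequence[float],
    next_states: Sequence[Hashable],
    gamma_fn: Callable[[Hashable], float],
) -> List[float]:
    """Compute G_t = r_t + gamma(s_{t+1}) * G_{t+1} over one episode.

    `gamma_fn` is a hypothetical stand-in for a learned discount function;
    here it is any callable mapping a state to a value in [0, 1].
    """
    if len(rewards) != len(next_states):
        raise ValueError("rewards and next_states must have the same length")

    returns: List[float] = []
    running = 0.0  # return after the final step of the episode
    for reward, s_next in zip(reversed(rewards), reversed(next_states)):
        running = reward + gamma_fn(s_next) * running
        returns.append(running)
    returns.reverse()
    return returns
```

A constant `gamma_fn` recovers the usual discounted return, which makes the fixed-discount case a special case of this recursion.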

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning

Researchers introduce XQC, a deep reinforcement learning algorithm that achieves state-of-the-art sample efficiency by optimizing the critic network's condition number through batch normalization, weight normalization, and distributional cross-entropy loss. The method outperforms existing approaches across 70 continuous control tasks while using fewer parameters.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

FAuNO: Semi-Asynchronous Federated Reinforcement Learning Framework for Task Offloading in Edge Systems

Researchers have developed FAuNO, a new federated reinforcement learning framework that uses asynchronous processing to optimize task distribution in edge computing networks. The system employs an actor-critic architecture where local nodes learn specific dynamics while a central critic coordinates overall system performance, demonstrating superior results in reducing latency and task loss compared to existing methods.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 4

Distributions as Actions: A Unified Framework for Diverse Action Spaces

Researchers introduce a new reinforcement learning framework called Distributions-as-Actions (DA) that treats parameterized action distributions as actions, making all action spaces continuous regardless of original type. The approach includes a new policy gradient estimator (DA-PG) with lower variance and a practical actor-critic algorithm (DA-AC) that shows competitive performance across discrete, continuous, and hybrid control tasks.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 16

SMAC: Score-Matched Actor-Critics for Robust Offline-to-Online Transfer

Researchers developed Score Matched Actor-Critic (SMAC), a new offline reinforcement learning method that enables a smooth transition to online RL algorithms without performance drops. SMAC transferred successfully in all 6 D4RL tasks tested and reduced regret by 34-58% in 4 of the 6 environments compared to the best baselines.

AI · Neutral · OpenAI News · Oct 18 · 4/10 · 5

Asymmetric actor critic for image-based robot learning

The title points to asymmetric actor-critic methods for image-based robot learning, a reinforcement learning approach for robotic systems. The article body is empty, so the specific methodology and findings cannot be summarized.

AI · Neutral · Hugging Face Blog · Jul 22 · 2/10 · 7

Advantage Actor Critic (A2C)

Only the title, 'Advantage Actor Critic (A2C)', is available; the article body is missing. A2C is a reinforcement learning algorithm that combines value-based and policy-based methods: a critic estimates state values, and the actor is updated using the resulting advantage estimates. It is used in AI applications including trading and optimization.
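Since the article body is unavailable, the following is only a generic sketch of the advantage computation at the core of A2C, not the article's code. It assumes Monte Carlo returns over one finished episode; the helper names `discounted_returns`, `advantages`, and `a2c_losses` are hypothetical.

```python
from typing import List, Sequence, Tuple


def discounted_returns(
    rewards: Sequence[float], gamma: float, bootstrap_value: float = 0.0
) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, seeded with a bootstrap value."""
    returns: List[float] = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float,
    bootstrap_value: float = 0.0,
) -> List[float]:
    """Advantage A_t = G_t - V(s_t): how much better the outcome was than
    the critic's estimate for that state."""
    returns = discounted_returns(rewards, gamma, bootstrap_value)
    return [g - v for g, v in zip(returns, values)]


def a2c_losses(
    log_probs: Sequence[float], advs: Sequence[float]
) -> Tuple[float, float]:
    """Return (actor_loss, critic_loss) as plain floats.

    Actor loss is -mean(log pi(a_t|s_t) * A_t); in a real implementation the
    advantage is treated as a constant (no gradient) in this term.
    Critic loss is mean(A_t ** 2), i.e. squared error against the return.
    """
    n = len(advs)
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advs)) / n
    critic_loss = sum(a * a for a in advs) / n
    return actor_loss, critic_loss
```

For example, with rewards `[1.0, 1.0, 1.0]`, critic values `[1.0, 1.0, 1.0]`, and `gamma=0.5`, `advantages` returns `[0.75, 0.5, 0.0]`: the critic underestimated the early states and was exactly right about the last one.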