y0news

#policy-learning News & Analysis

19 articles tagged with #policy-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

Offline Reinforcement Learning with Generative Trajectory Policies

Researchers propose Generative Trajectory Policies (GTPs), a unified framework for offline reinforcement learning that bridges the performance gap between slow diffusion models and fast consistency policies by learning continuous-time generative trajectories. The approach achieves state-of-the-art results on D4RL benchmarks, including perfect scores on difficult AntMaze tasks.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

HumanEgo: Zero-Shot Robot Learning from Minutes of Human Egocentric Videos

HumanEgo is a new AI framework that enables robots to learn manipulation tasks directly from human egocentric videos without requiring robot-specific training data. The system achieves 92.5% success on real-world tasks using just 30 minutes of human video per task and transfers zero-shot across different robot hardware, cameras, and environments.

AI · Bullish · arXiv – CS AI · 4d ago · 7/10

FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies

Researchers introduce FineVLA, a framework that enhances Vision-Language-Action models for robotics by incorporating fine-grained instruction supervision beyond simple goal-level commands. The system combines 972,247 trajectories into a curated dataset of 47,159 fine-grained trajectories and demonstrates that mixing fine-grained and coarse instructions raises real-world robot manipulation success from 49.9% with goal-level instructions alone to 62.7%.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

NanoResearch introduces a multi-agent LLM framework that personalizes research automation through three co-evolving components: a skill bank for reusable procedural knowledge, a memory module for user-specific experience, and label-free policy learning for preference internalization. The system addresses the gap between uniform AI outputs and diverse researcher needs, demonstrating substantial improvements over existing AI research systems while reducing costs across successive cycles.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Learning and Reusing Policy Decompositions for Hierarchical Generalized Planning with LLM Agents

Researchers introduce HCL-GP, a machine learning approach that enables large language model agents to learn and reuse hierarchical task decompositions for improved performance on complex applications. The method achieves 98.2% accuracy on standard tasks and demonstrates significant improvements over static synthesis approaches, particularly benefiting open-source models through dynamic component reuse.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Milestone-Guided Policy Learning for Long-Horizon Language Agents

Researchers introduce BEACON, a milestone-guided policy learning framework that improves training efficiency for long-horizon language agents by addressing credit misattribution and sample inefficiency. The approach achieves a 92.9% success rate on complex tasks, nearly double previous benchmarks, while raising sample utilization from 23.7% to 82.0%.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning

Researchers introduce Q2RL, a novel algorithm that combines behavior cloning with reinforcement learning to enable robots to improve their policies through online interaction. The method uses Q-value estimation and gating mechanisms to prevent policy degradation from distribution mismatch, achieving 100% success rates on complex manipulation tasks in 1-2 hours of real robot learning.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

VITA: Vision-to-Action Flow Matching Policy

Researchers developed VITA, a new AI framework that streamlines robot policy learning by flowing directly from visual inputs to actions without requiring conditioning modules. The system achieves 1.5-2x faster inference while matching or improving on the performance of existing methods across 14 simulation and real-world robotic tasks.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10

AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems

AgensFlow is an open-source framework that treats multi-agent LLM coordination as a learnable policy problem rather than a fixed pipeline, enabling dynamic routing decisions across skill protocols, agent roles, and model bindings. Evaluated on distributed systems and security tasks, the framework demonstrates that learned coordination outperforms static designs while reducing exploration costs through warm-started policy graphs.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Escaping the Diversity Trap in Robotic Manipulation via Anchor-Centric Adaptation

Researchers identify a critical flaw in robotic manipulation training: collecting diverse single-shot demonstrations paradoxically degrades performance due to estimation noise. Their proposed Anchor-Centric Adaptation (ACA) framework prioritizes repeated demonstrations at core tasks before expanding coverage, significantly improving robot reliability under strict data budgets.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

TAVIS: A Benchmark for Egocentric Active Vision and Anticipatory Gaze in Imitation Learning

Researchers introduced TAVIS, a comprehensive benchmark for evaluating active vision in imitation learning systems where robotic policies control their own gaze during manipulation tasks. The benchmark includes evaluation protocols, a novel metric (GALT) measuring anticipatory gaze, and baseline experiments showing that the benefits of active vision are task-dependent rather than universal.

🏢 Hugging Face
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Beyond Static Sandboxing: Learned Capability Governance for Autonomous AI Agents

Researchers introduce Aethelgard, an adaptive governance framework that addresses the capability overprovisioning problem in autonomous AI agents by dynamically restricting tool access based on task requirements. The system uses reinforcement learning to enforce least-privilege principles, reducing security exposure while maintaining operational efficiency.

AI · Neutral · arXiv – CS AI · Apr 15 · 5/10

Hybrid-AIRL: Enhancing Inverse Reinforcement Learning with Supervised Expert Guidance

Researchers introduce Hybrid-AIRL, an enhanced inverse reinforcement learning framework that combines adversarial learning with supervised expert guidance to improve reward function inference in complex, imperfect-information environments like poker. The method demonstrates superior sample efficiency and learning stability compared to traditional AIRL, particularly in settings with sparse and delayed rewards.

AI · Bullish · arXiv – CS AI · Mar 16 · 5/10

Accelerating Residual Reinforcement Learning with Uncertainty Estimation

Researchers developed an improved Residual Reinforcement Learning method that uses uncertainty estimation to enhance sample efficiency and work with stochastic base policies. The approach outperformed existing methods in simulation benchmarks and demonstrated successful zero-shot sim-to-real transfer in real-world deployments.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 4

Embedding Morphology into Transformers for Cross-Robot Policy Learning

Researchers developed an embodiment-aware transformer policy that improves cross-robot policy learning by injecting morphological information through kinematic tokens, topology-aware attention, and joint-attribute conditioning. This approach consistently outperforms baseline vision-language-action models across multiple robot embodiments.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 5

Bridging Dynamics Gaps via Diffusion Schrödinger Bridge for Cross-Domain Reinforcement Learning

Researchers propose BDGxRL, a novel framework using Diffusion Schrödinger Bridge to enable reinforcement learning agents to transfer policies across different domains without direct target environment access. The method aligns source domain transitions with target dynamics through offline demonstrations and introduces reward modulation for consistent learning.

AI · Neutral · OpenAI News · Jun 17 · 1/10 · 7

Learning policy representations in multiagent systems

The article title references learning policy representations in multiagent systems, which relates to AI research in multi-agent reinforcement learning. However, no article body content was provided for analysis.