511 articles tagged with #reinforcement-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 6 · 6/10
Researchers propose VISA (Value Injection via Shielded Adaptation), a framework for aligning large language models with human values while avoiding the 'alignment tax', the knowledge drift and hallucinations that alignment training can cause. The system uses a closed-loop architecture with value-detection, translation, and rewriting components, and it outperforms standard fine-tuning methods and GPT-4o at maintaining factual consistency.
GPT-4
AI · Neutral · arXiv – CS AI · Mar 6 · 7/10
Researchers introduce BioLLMAgent, a hybrid framework that combines reinforcement learning models with large language models to simulate human decision-making in computational psychiatry. The framework demonstrates strong interpretability while accurately reproducing human behavioral patterns and successfully simulating cognitive behavioral therapy principles.
AI · Bullish · arXiv – CS AI · Mar 6 · 7/10
Researchers present KARL, a reinforcement learning system for training enterprise search agents that outperforms GPT-5.2 and Claude 4.6 on diverse search tasks. The work also introduces the KARLBench evaluation suite and shows better cost-quality trade-offs through multi-task training and synthetic data generation.
GPT-5 · Claude
AI · Bullish · arXiv – CS AI · Mar 6 · 7/10
WebFactory is a fully automated reinforcement learning pipeline that turns large language models into GUI agents without unsafe live web interactions or costly human-annotated data. It is highly data-efficient: trained on synthetic data from only 10 websites, its agents perform comparably to agents trained on human-annotated data.
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
Researchers developed R1-Code-Interpreter, a large language model trained with multi-stage reinforcement learning to autonomously generate code for step-by-step reasoning across diverse tasks. The 14B-parameter model reaches 72.4% accuracy on the test tasks, outperforms GPT-4o variants, and shows emergent self-checking through code generation.
Hugging Face · GPT-4
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
Researchers developed a bio-inspired whole-body control system (IO-WBC) for humanoid robots that enables stable object transport in unstructured environments. The system separates upper-body interaction control from lower-body balance control and uses reinforcement learning to handle heavy loads and disturbances.
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
Researchers introduce SHE (Stepwise Hybrid Examination), a reinforcement learning framework that improves relevance prediction for AI-powered e-commerce search. It addresses limitations of existing training methods by using step-level rewards and hybrid verification, improving both the accuracy and the interpretability of its relevance predictions.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
Researchers developed COREA, a system that combines small and large language models to reduce AI reasoning costs by 21.5% while maintaining nearly identical accuracy. The system uses confidence scoring to decide when to escalate questions from cheaper small models to more expensive large models.
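A confidence-gated cascade of this kind can be sketched in a few lines of Python. This is a minimal illustration, not COREA's implementation. The `Model` callables, the `cascade_answer` and `total_cost` helpers, the 0.8 threshold, and the per-call cost model are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface: a model maps a question to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class CascadeResult:
    answer: str
    escalated: bool


def cascade_answer(question: str, small: Model, large: Model,
                   threshold: float = 0.8) -> CascadeResult:
    """Try the small model first; escalate only when its confidence is below threshold."""
    answer, confidence = small(question)
    if confidence >= threshold:
        return CascadeResult(answer, escalated=False)
    large_answer, _ = large(question)
    return CascadeResult(large_answer, escalated=True)


def total_cost(results: List[CascadeResult], small_cost: float, large_cost: float) -> float:
    """Every question pays for the small model; escalated ones also pay for the large model."""
    return sum(small_cost + (large_cost if r.escalated else 0.0) for r in results)
```

Raising the threshold sends more questions to the large model, which trades cost for accuracy. The 21.5% savings reported for COREA depend on the authors' own confidence scores and escalation rule, and this sketch does not reproduce them.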
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
Researchers propose MIND, a reinforcement learning framework that improves AI-powered psychiatric consultation by addressing key challenges in diagnostic accuracy and clinical reasoning. The system uses a Criteria-Grounded Psychiatric Reasoning Bank to provide better clinical support and reduce inquiry drift during multi-turn patient interactions.
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
Researchers developed PhyPrompt, a reinforcement learning framework that automatically refines text prompts so that AI video models generate physically realistic videos. The system uses a two-stage approach with curriculum learning to improve both physical accuracy and semantic fidelity. With only 7B parameters, it outperforms larger models such as GPT-4o.
GPT-4
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
Researchers introduce Vision-Zero, a self-improving AI framework that trains vision-language models through competitive games without requiring human-labeled data. The system uses strategic self-play and can work with arbitrary images, achieving state-of-the-art performance on reasoning and visual understanding tasks while reducing training costs.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
Researchers developed a new training method that combines Chain-of-Thought supervision with reinforcement learning to teach large language models when to abstain from answering temporal questions they are uncertain about. Their approach enabled a smaller Qwen2.5-1.5B model to outperform GPT-4o on temporal question answering while improving reliability by 20% on unanswerable questions.
GPT-4
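One common way to reward abstention in RL uses three values: correct answers earn +1, wrong answers earn -1, and abstaining earns +1 on unanswerable questions but 0 on answerable ones. The sketch below is an assumption for illustration only. The paper's actual reward, the `ABSTAIN` string, and the exact-match comparison are hypothetical.

```python
from typing import Optional

ABSTAIN = "I don't know"  # hypothetical abstention string


def abstention_reward(prediction: str, gold: Optional[str]) -> float:
    """Reward for one temporal QA rollout; gold is None when the question is unanswerable."""
    abstained = prediction.strip() == ABSTAIN
    if gold is None:
        # Unanswerable: abstaining is the only correct behavior.
        return 1.0 if abstained else -1.0
    if abstained:
        # Answerable but abstained: neutral, better than a wrong answer.
        return 0.0
    # Answered: exact match (case- and whitespace-insensitive) earns +1, otherwise -1.
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else -1.0
```

With these values, answering a question the model gets right with probability p has expected reward 2p - 1. A reward-maximizing policy therefore answers only when p > 0.5 and abstains otherwise.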
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
Researchers developed CES, a multi-agent framework that uses reinforcement learning to improve GUI automation for long-horizon tasks. The system uses a Coordinator for planning and a State Tracker for context management, and it can be paired with any low-level Executor model to significantly improve performance on complex automated tasks.
AI · Neutral · arXiv – CS AI · Mar 5 · 7/10
Researchers propose ALTERNATING-MARL, a new framework for cooperative multi-agent reinforcement learning that enables a global agent to learn with massive populations under communication constraints. The method achieves approximate Nash equilibrium convergence while only observing a subset of local agent states, with applications in multi-robot control and federated optimization.
$MKR
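If a global agent can observe only a subset of local agents, one simple summary of the population is the empirical state distribution of k agents sampled uniformly at random, a mean-field-style estimate. The sketch below assumes discrete local states and uniform sampling without replacement. The function name and the sampling scheme are illustrative assumptions, not the ALTERNATING-MARL algorithm.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Sequence


def subsampled_state_distribution(local_states: Sequence[Hashable], k: int,
                                  rng: random.Random) -> Dict[Hashable, float]:
    """Estimate the population's state distribution from k local agents sampled without replacement."""
    if not 0 < k <= len(local_states):
        raise ValueError("k must be between 1 and the number of local agents")
    sample = rng.sample(list(local_states), k)
    counts = Counter(sample)
    # Relative frequency of each observed state among the k sampled agents.
    return {state: count / k for state, count in counts.items()}
```

A smaller k lowers communication cost but increases the variance of the estimate. When k equals the population size, the estimate is exact.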
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
Researchers demonstrate that flow matching improves reinforcement learning through enhanced TD learning mechanisms rather than distributional modeling. Compared with standard critics, the approach achieves 2× higher final performance and 5× better sample efficiency by enabling test-time error recovery and more plastic feature learning.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
MemSifter is a new AI framework that uses smaller proxy models to handle memory retrieval for large language models, addressing computational costs in long-term memory tasks. The system uses reinforcement learning to optimize retrieval accuracy and has been open-sourced with demonstrated performance improvements on benchmark tests.
AI · Bearish · arXiv – CS AI · Mar 5 · 7/10
Researchers demonstrate a novel backdoor attack method called 'SFT-then-GRPO' that can inject hidden malicious behavior into AI agents while maintaining their performance on standard benchmarks. The attack creates 'sleeper agents' that appear benign but can execute harmful actions under specific trigger conditions, highlighting critical security vulnerabilities in the adoption of third-party AI models.
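GRPO (Group Relative Policy Optimization) avoids a learned value critic by scoring each sampled completion relative to the other completions for the same prompt. Below is a minimal sketch of that group-normalized advantage, assuming scalar rewards and the population standard deviation. It covers only the advantage step, not the policy-gradient update or the attack pipeline described in the paper.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: each rollout's reward normalized against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std of the group's rewards
    # eps avoids division by zero when every rollout in the group has the same reward.
    return [(r - mean) / (std + eps) for r in rewards]
```

For four rollouts with rewards [1, 0, 1, 0], the advantages are roughly [1, -1, 1, -1]. A group with identical rewards yields all-zero advantages and so gives no learning signal for that prompt.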
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
Researchers have developed Phys4D, a new pipeline that enhances video diffusion models with physics-consistent 4D world representations through a three-stage training process. The system addresses current limitations where AI-generated videos often exhibit physically implausible dynamics, using pseudo-supervised pretraining, physics-grounded fine-tuning, and reinforcement learning to improve spatiotemporal consistency.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
Researchers have developed Sim2Sea, a comprehensive framework that successfully bridges the simulation-to-reality gap for autonomous maritime vessel navigation in congested waters. The system uses GPU-accelerated parallel simulation, a dual-stream spatiotemporal policy, and targeted domain randomization to achieve zero-shot transfer from simulation to real-world deployment on a 17-ton unmanned vessel.
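Domain randomization resamples simulator parameters every episode so that the policy cannot overfit to a single configuration. The sketch below uses independent uniform sampling. The parameter names and ranges are hypothetical placeholders, not Sim2Sea's settings. In a targeted scheme, the ranges would be chosen around the parameters that matter most for the real vessel.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeParams:
    current_speed_mps: float   # water current speed, m/s
    wind_speed_mps: float      # wind speed, m/s
    radar_noise_std_m: float   # std of simulated radar range noise, m
    mass_scale: float          # multiplier on the nominal hull mass


# Hypothetical ranges, for illustration only.
RANGES = {
    "current_speed_mps": (0.0, 1.5),
    "wind_speed_mps": (0.0, 12.0),
    "radar_noise_std_m": (0.5, 5.0),
    "mass_scale": (0.9, 1.1),
}


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draw one independent uniform sample per parameter at the start of an episode."""
    return EpisodeParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})
```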
AI · Neutral · arXiv – CS AI · Mar 5 · 7/10
Researchers studied reinforcement learning with verifiable rewards (RLVR) for training large language models on causal reasoning tasks. They found that it outperforms supervised fine-tuning, but only when models have sufficient initial competence. The study used causal graphical models as a testbed and showed that RLVR improves specific reasoning subskills such as marginalization strategy and probability calculation.
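RLVR relies on rewards that a program can check exactly. In a causal-graph testbed, a verifiable reward might compute the true marginal probability from the graph's conditional probability tables and compare it with the LLM's numeric answer. The sketch below assumes a two-node graph Z → X with binary variables and a fixed tolerance. The helper names and the 0.01 tolerance are illustrative.

```python
from typing import Dict


def marginal_prob_x(p_z: Dict[int, float], p_x1_given_z: Dict[int, float]) -> float:
    """P(X=1) = sum over z of P(Z=z) * P(X=1 | Z=z) for the graph Z -> X."""
    return sum(p_z[z] * p_x1_given_z[z] for z in p_z)


def verifiable_reward(model_answer: str, p_z: Dict[int, float],
                      p_x1_given_z: Dict[int, float], tol: float = 0.01) -> float:
    """Binary reward: 1.0 if the parsed answer is within tol of the exact marginal, else 0.0."""
    try:
        value = float(model_answer.strip())
    except ValueError:
        return 0.0  # unparseable answers earn nothing
    target = marginal_prob_x(p_z, p_x1_given_z)
    return 1.0 if abs(value - target) <= tol else 0.0
```

With P(Z=1) = 0.7 and P(X=1 | Z=0) = 0.2, P(X=1 | Z=1) = 0.9, the exact marginal is 0.3 × 0.2 + 0.7 × 0.9 = 0.69. An answer of "0.69" therefore earns 1.0 and "0.5" earns 0.0.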
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
Researchers demonstrate that multi-agent competitive training enables AI agents to develop agile flight capabilities and strategic behaviors that outperform traditional single-agent training methods. The approach shows superior sim-to-real transfer and generalization when applied to drone racing scenarios with complex environments and obstacles.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
Researchers developed a new AI training method that uses knowledge graphs as reward models to improve compositional reasoning in specialized domains. The approach enables smaller 14B-parameter models to outperform much larger frontier systems such as GPT-5.2 and Gemini 3 Pro on complex multi-hop reasoning tasks in medicine.
Gemini
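A knowledge graph can act as a reward model by checking whether each hop in a model's reasoning chain is an edge in the graph. The sketch below is a hypothetical scoring rule, not the paper's. It returns 0 for a disconnected chain and otherwise the fraction of hops the graph supports, with triples represented as (head, relation, tail) strings.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def kg_path_reward(hops: List[Triple], kg: Set[Triple]) -> float:
    """Score a reasoning chain against a knowledge graph (hypothetical rule)."""
    if not hops:
        return 0.0
    # A compositional chain must be connected: each hop starts where the previous one ended.
    for prev, nxt in zip(hops, hops[1:]):
        if prev[2] != nxt[0]:
            return 0.0
    supported = sum(1 for hop in hops if hop in kg)
    return supported / len(hops)
```

Giving partial credit per supported hop rewards chains that are mostly grounded. Zeroing out disconnected chains keeps the model from collecting reward for true but unrelated facts.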
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
Researchers developed DMAST, a new training framework that protects multimodal web agents from cross-modal attacks, in which adversaries inject malicious content into webpages to deceive both the visual and the text processing channels. The method uses adversarial training through a three-stage pipeline and significantly outperforms existing defenses while doubling task completion efficiency.
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
Researchers developed a new three-layer hierarchy called cognition-to-control (C2C) for human-robot collaboration that combines vision-language models with multi-agent reinforcement learning. The system enables sustained deliberation and planning while maintaining real-time control for collaborative manipulation tasks between humans and humanoid robots.
AI · Neutral · arXiv – CS AI · Mar 5 · 7/10
Researchers propose SaFeR, a new AI system for generating safety-critical scenarios to test autonomous driving systems. The approach uses transformer-based models with a novel resampling strategy to balance adversarial testing, physical feasibility, and realistic behavior in autonomous vehicle simulations.