y0news
← Feed
Back to feed
🧠 AI NeutralImportance 6/10

AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning

arXiv – CS AI|Peilin Wu, Xinlu Zhang, Kun Wan, Wentian Zhao, Gang Wu, Xinya Du, Zhiyu Chen|
🤖AI Summary

AMARIS is a new system that improves how large language models are trained using reinforcement learning by maintaining a persistent memory of past training data and failures. Unlike existing methods that only look at immediate, local information, AMARIS tracks recurring problems and previous rubric adjustments over time, achieving measurable performance improvements across multiple domains.

Analysis

AMARIS addresses a fundamental limitation in current rubric-based reinforcement learning approaches for LLM fine-tuning. Traditional adaptive rubric systems operate with limited context, analyzing only the current batch or individual comparisons, which prevents them from identifying patterns across training sessions or evaluating the effectiveness of previous modifications. This myopic approach leads to oscillatory updates where rubrics are adjusted without understanding whether earlier changes actually solved problems or merely shifted them.

The system's innovation lies in its longitudinal memory architecture, which stores rollout analyses, step-level summaries, and historical rubric edits in a persistent evaluation memory. By retrieving semantically relevant history alongside recent training evidence, AMARIS enables more informed rubric revisions that account for the full trajectory of model development. This approach mirrors how human instructors learn—tracking student progress over time rather than reacting to isolated mistakes.

The empirical results demonstrate meaningful improvements across diverse domains including science benchmarks (GPQA-Diamond +2.8 points) and instruction-following tasks (IFBench +2.2 points), validating that memory-augmented approaches outperform both static and locally-adaptive baselines. The asynchronous implementation reduces computational overhead, a practical advantage for production environments.

For the broader AI training ecosystem, AMARIS represents progress toward more interpretable and efficient RL workflows. The ability to track failure patterns and progression from basic correction to curriculum advancement suggests potential applications beyond LLM fine-tuning, particularly in complex reinforcement learning scenarios where interpretability and systematic improvement matter.

Key Takeaways
  • AMARIS uses persistent memory to track training history, enabling better rubric improvements compared to methods that only analyze current batches
  • The system achieved +2.8 point improvements on GPQA-Diamond and +2.2 points on IFBench over strongest baseline methods
  • Memory-augmented rubric updates reduce oscillatory edits and support progression from early failure correction to advanced curriculum strategies
  • AMARIS runs asynchronously alongside standard RL loops, minimizing computational blocking relative to synchronous update methods
  • The approach addresses interpretability and editability challenges in reinforcement learning-based LLM fine-tuning across science, medicine, and creative writing domains
Read Original →via arXiv – CS AI
Act on this with AI
Stay ahead of the market.
Connect your wallet to an AI agent. It reads balances, proposes swaps and bridges across 15 chains — you keep full control of your keys.
Connect Wallet to AI →How it works
Related Articles