cs.CL

Teaching AI to remember its mistakes and improve faster

Peilin Wu, Xinlu Zhang, Kun Wan, Wentian Zhao, Gang Wu, Xinya Du, Zhiyu Chen

May 18, 2026

Rubric-based reinforcement learning fine-tunes language models with structured reward signals, but existing methods discard the insights from each evaluation once the step ends. AMARIS adds a persistent memory that retrieves relevant past feedback, both recent and semantically similar, and uses it to refine rubrics over time. The result is consistent improvement across coding, math, and creative writing tasks at roughly 5% computational overhead, which suggests that accumulated training history outperforms stateless per-step heuristics.
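The retrieval idea in the summary, combining the most recent feedback with semantically similar older feedback, can be illustrated with a minimal sketch. This is not the AMARIS implementation: the `FeedbackMemory` class, the `retrieve` parameters, and the bag-of-words `embed` function (a toy stand-in for a real semantic embedding model) are all hypothetical names and choices made for illustration only.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for a semantic embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class FeedbackMemory:
    """Hypothetical persistent store of (training step, feedback text) pairs."""
    entries: list = field(default_factory=list)  # oldest first

    def add(self, step: int, feedback: str) -> None:
        self.entries.append((step, feedback))

    def retrieve(self, query: str, n_recent: int = 2, n_similar: int = 2) -> list:
        """Return the newest entries plus the most similar older entries."""
        if n_recent > 0:
            recent = self.entries[-n_recent:]
            older = self.entries[:-n_recent]
        else:
            recent, older = [], list(self.entries)
        q = embed(query)
        # Rank the remaining (older) entries by similarity to the query.
        ranked = sorted(older, key=lambda e: cosine(q, embed(e[1])), reverse=True)
        return recent + ranked[:n_similar]


memory = FeedbackMemory()
memory.add(1, "rubric ignores off-by-one errors in loops")
memory.add(2, "rubric rewards verbose prose")
memory.add(3, "rubric misses unit tests for edge cases")
memory.add(4, "rubric overweights formatting")

# One recent entry (step 4) plus the older entry closest to the query (step 1).
retrieved = memory.retrieve("loops with off-by-one errors", n_recent=1, n_similar=1)
```

In a sketch like this, the retrieved feedback would then be passed to whatever component revises the rubric, so that each refinement step sees accumulated history rather than only the current evaluation. How AMARIS weights recency against similarity is not stated in this summary.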
Published as "AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning" (arXiv:2605.18592)