Teaching AI to remember its mistakes and improve faster
Peilin Wu, Xinlu Zhang, Kun Wan, Wentian Zhao, Gang Wu, Xinya Du, Zhiyu Chen
May 18, 2026
Rubric-based reinforcement learning fine-tunes language models using structured reward signals, but existing methods discard evaluation insights after each step. AMARIS adds a persistent memory system that retrieves relevant past feedback, both recent and semantically similar, and uses it to refine rubrics over time. The result is consistent improvements across coding, math, and creative writing tasks with only about 5% computational overhead, showing that accumulated training history beats stateless per-step heuristics.
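The retrieval idea in the summary, which combines recent feedback with semantically similar feedback, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and is not taken from the AMARIS paper: the hypothetical `FeedbackMemory` and `FeedbackEntry` names, the use of precomputed embedding vectors, cosine similarity as the similarity measure, and the `k_recent`/`k_similar` parameters.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class FeedbackEntry:
    # Hypothetical record of one evaluation insight from a past training step.
    step: int                # training step at which the feedback was recorded
    text: str                # the feedback itself (format is an assumption)
    embedding: list[float]   # assumed to be precomputed by some embedding model


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class FeedbackMemory:
    # Hypothetical persistent store that survives across training steps.
    entries: list[FeedbackEntry] = field(default_factory=list)

    def add(self, entry: FeedbackEntry) -> None:
        self.entries.append(entry)

    def retrieve(self, query: list[float], k_recent: int = 2,
                 k_similar: int = 2) -> list[FeedbackEntry]:
        # 1) Recency: the k_recent entries with the highest step numbers.
        recent = sorted(self.entries, key=lambda e: e.step, reverse=True)[:k_recent]
        # 2) Similarity: the k_similar most similar entries not already chosen.
        chosen = {id(e) for e in recent}
        remaining = [e for e in self.entries if id(e) not in chosen]
        similar = sorted(remaining, key=lambda e: cosine(e.embedding, query),
                         reverse=True)[:k_similar]
        return recent + similar


# Usage with toy 2-D embeddings (illustrative values only).
mem = FeedbackMemory()
mem.add(FeedbackEntry(1, "math: check units", [1.0, 0.0]))
mem.add(FeedbackEntry(2, "code: handle empty input", [0.0, 1.0]))
mem.add(FeedbackEntry(3, "writing: vary sentence length", [0.7, 0.7]))
mem.add(FeedbackEntry(4, "math: show intermediate steps", [0.9, 0.1]))
print([e.step for e in mem.retrieve([1.0, 0.0], k_recent=1, k_similar=1)])  # → [4, 1]
```

Taking the recent entries first and then excluding them from the similarity ranking keeps the two retrieval paths from returning duplicates. In a rubric-refinement loop, the retrieved texts would presumably be passed to whatever component rewrites the rubric, but how AMARIS does that is not specified here.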