Do AI agents actually remember what you told them?

Existing agent benchmarks test single sessions, but real users interact with AI assistants over weeks or months as their needs evolve. Momento evaluates whether agents can remember past actions, stated preferences, and context while using tools across multiple sessions. Current agents struggle: they treat history from earlier sessions as current fact rather than as possibly outdated information that needs re-validation. This reveals a gap between lab performance and realistic long-horizon interaction.