Can AI reason through anomalies without any training?

Most anomaly detection systems rely on visual similarity scores and require training on large datasets. AnomalyAgent instead uses multimodal language models (such as GPT-4V) as reasoning agents: given an image, the system deploys specialized tools to investigate potential anomalies and draws on a memory of past examples to explain what is wrong. It works zero-shot, with no task-specific training, and handles both simple defects and contextual anomalies in real manufacturing and logistics settings, where it outperforms similarity-based approaches.
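
To make the tools-plus-memory idea concrete, here is a minimal toy sketch of such an investigation loop. It is not AnomalyAgent's actual implementation. Every name in it (`Finding`, `Memory`, `inspect_surface`, `check_context`, `stub_reasoner`, `investigate`) is hypothetical, the "image" is a plain dictionary of pre-extracted attributes, and the multimodal model is replaced by a rule-based stub so the example runs without any external service.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    tool: str
    note: str
    suspicious: bool


@dataclass
class Memory:
    # Past cases stored as (description, explanation) pairs (hypothetical format).
    cases: list = field(default_factory=list)

    def recall(self, query: str, k: int = 1) -> list:
        # Toy retrieval: rank stored cases by word overlap with the query.
        # It returns the top k cases even when the overlap is zero.
        words = set(query.lower().split())
        ranked = sorted(
            self.cases,
            key=lambda case: len(words & set(case[0].lower().split())),
            reverse=True,
        )
        return ranked[:k]


def inspect_surface(image: dict) -> Finding:
    # Hypothetical tool: looks for a simple surface defect.
    scratched = bool(image.get("scratch"))
    note = "scratch on surface" if scratched else "surface looks clean"
    return Finding("inspect_surface", note, scratched)


def check_context(image: dict) -> Finding:
    # Hypothetical tool: looks for a contextual anomaly (item in the wrong place).
    misplaced = image.get("location") != image.get("expected_location")
    note = "item is misplaced" if misplaced else "item is where expected"
    return Finding("check_context", note, misplaced)


def stub_reasoner(findings: list, recalled: list) -> dict:
    # Stand-in for the multimodal model; a real system would prompt the model here.
    problems = [f.note for f in findings if f.suspicious]
    if not problems:
        return {"verdict": "normal", "explanation": "no tool reported a problem"}
    explanation = "; ".join(problems)
    if recalled:
        explanation += f" (similar past case: {recalled[0][1]})"
    return {"verdict": "anomalous", "explanation": explanation}


def investigate(image: dict, tools: list, memory: Memory, reasoner=stub_reasoner) -> dict:
    # Run every tool, look up related past cases, then let the reasoner decide.
    findings = [tool(image) for tool in tools]
    query = " ".join(f.note for f in findings)
    return reasoner(findings, memory.recall(query))


# Usage: one remembered case, two tools, one damaged item.
memory = Memory([("scratch on surface", "tooling wear left a scratch")])
tools = [inspect_surface, check_context]
damaged = {"scratch": True, "location": "bin A", "expected_location": "bin A"}
result = investigate(damaged, tools, memory)
print(result["verdict"], "-", result["explanation"])
```

The point of the sketch is the division of labor: tools gather evidence, memory supplies precedent, and the reasoner turns both into a verdict with an explanation rather than a bare similarity score. Nothing in it is trained, which is the sense in which the approach is zero-shot.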