Can AI reason through anomalies without any training?
Yi Zhang, Jiawen Zhu, Lele Fu, Guansong Pang
May 28, 2026
Most anomaly detection systems rely on visual similarity scores and require training on large datasets. AnomalyAgent instead uses multimodal language models (like GPT-4V) as reasoning agents: given an image, the system deploys specialized tools to investigate anomalies and draws on memory of past examples to explain what's wrong. It works zero-shot (no training) and handles both simple defects and contextual anomalies in real manufacturing and logistics settings, outperforming similarity-based approaches.
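To make the tool-plus-memory idea concrete, here is a minimal sketch of how such an agent loop might be wired together. It is not the authors' implementation. Every name in it (`Memory`, `surface_tool`, `context_tool`, `stub_reasoner`, `inspect`) is hypothetical, the "image" is a plain dictionary of pre-extracted attributes, and a rule-based stub stands in for the multimodal language model so the example runs without any external service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Memory:
    """Hypothetical store of short text descriptions of past cases."""
    cases: List[str] = field(default_factory=list)

    def recall(self, query: str) -> List[str]:
        # Naive word overlap; a real system would likely use a better retriever.
        words = set(query.lower().split())
        return [c for c in self.cases if words & set(c.lower().split())]


def surface_tool(image: Dict[str, object]) -> str:
    # Hypothetical tool: reports a local defect such as a scratch.
    return "surface defect found" if image.get("scratch") else "surface looks intact"


def context_tool(image: Dict[str, object]) -> str:
    # Hypothetical tool: reports a contextual anomaly (object in the wrong place).
    if image.get("expected_location") != image.get("location"):
        return "object is out of place"
    return "object is where expected"


TOOLS: Dict[str, Callable[[Dict[str, object]], str]] = {
    "surface": surface_tool,
    "context": context_tool,
}


def stub_reasoner(findings: List[str], recalled: List[str]) -> Dict[str, object]:
    # Assumption: stands in for the multimodal language model, which would
    # read the tool outputs and recalled cases and write an explanation.
    anomalous = any("defect" in f or "out of place" in f for f in findings)
    explanation = "; ".join(findings)
    if recalled:
        explanation += f" (similar past case: {recalled[0]})"
    return {"anomalous": anomalous, "explanation": explanation}


def inspect(image: Dict[str, object], memory: Memory) -> Dict[str, object]:
    # No training step: run every tool, consult memory, then reason.
    findings = [tool(image) for tool in TOOLS.values()]
    recalled = memory.recall(str(image.get("object", "")))
    verdict = stub_reasoner(findings, recalled)
    # Store the outcome so later inspections can draw on it.
    memory.cases.append(f"{image.get('object')}: {verdict['explanation']}")
    return verdict
```

Calling `inspect` on a dictionary describing a scratched bottle, with a `Memory` that already holds a related bottle case, would return a verdict whose explanation combines both tool findings with the recalled case. In a real system the stub reasoner would be replaced by a call to a multimodal model, and the tools would operate on pixels rather than on hand-written attributes.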