cs.CV

Can AI reason through anomalies without any training?

Yi Zhang, Jiawen Zhu, Lele Fu, Guansong Pang

May 28, 2026

Most anomaly detection systems score images by visual similarity and need training on large datasets. AnomalyAgent instead uses multimodal language models (such as GPT-4V) as reasoning agents: given an image, the agent calls specialized tools to investigate possible anomalies and draws on a memory of past examples to explain what is wrong. It requires no training, works in zero-shot and few-shot settings, and handles both simple defects and contextual anomalies in real manufacturing and logistics settings, where it outperforms similarity-based approaches.
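To make the agent-plus-tools-plus-memory idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the tool names (`check_surface`, `check_context`), the `Memory` class, the dictionary used in place of a real image, and the fixed tool order are all assumptions made for illustration. In a real system a multimodal language model would decide which tools to call and would write the explanation itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sketch: none of these names come from the AnomalyAgent paper.
# A plain dict stands in for an image, and simple rules stand in for the
# multimodal language model's reasoning.


@dataclass
class Finding:
    tool: str
    note: str
    suspicious: bool


@dataclass
class Memory:
    """Short notes about past examples (stands in for few-shot references)."""
    notes: List[str] = field(default_factory=list)

    def recall(self, category: str) -> List[str]:
        # Return notes that mention the category; nothing for an empty category.
        return [n for n in self.notes if category and category in n]


def check_surface(image: dict) -> Finding:
    # Toy tool for simple defects such as a scratch.
    found = bool(image.get("scratch", False))
    note = "scratch found" if found else "surface clean"
    return Finding("check_surface", note, found)


def check_context(image: dict) -> Finding:
    # Toy tool for contextual anomalies: items that do not belong in the scene.
    expected = image.get("expected_items", [])
    extra = [i for i in image.get("items", []) if i not in expected]
    note = f"unexpected items: {extra}" if extra else "items as expected"
    return Finding("check_context", note, bool(extra))


TOOLS: Dict[str, Callable[[dict], Finding]] = {
    "check_surface": check_surface,
    "check_context": check_context,
}


def run_agent(image: dict, memory: Memory) -> dict:
    # Assumption: every tool runs once, in a fixed order. A real agent would
    # let the language model pick tools step by step.
    findings = [tool(image) for tool in TOOLS.values()]
    recalled = memory.recall(image.get("category", ""))
    suspicious = [f.note for f in findings if f.suspicious]
    return {
        "anomalous": bool(suspicious),
        "explanation": "; ".join(suspicious) or "no anomaly found",
        "recalled": recalled,
    }


if __name__ == "__main__":
    memory = Memory(notes=["bottle: normal caps are red"])
    image = {"category": "bottle", "scratch": True,
             "items": ["cap"], "expected_items": ["cap"]}
    print(run_agent(image, memory))
```

The sketch separates three concerns that the summary describes: tools that each inspect one aspect of the image, a memory that supplies reference notes when examples are available, and an agent step that combines the findings into a verdict with an explanation. With an empty `Memory`, the same code path corresponds to the zero-shot case.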
Published as "AnomalyAgent: Training-Free Agentic Models for Zero-/Few-Shot Anomaly Detection" (arXiv:2605.30140).