cs.CV

How can AI see high-resolution images without drowning in pixels?

Liupeng Li, Haoqian Kang, Zhenyu Lu, Jinpeng Wang, Bin Chen, Ke Chen, Yaowei Wang

May 22, 2026

Multimodal AI models struggle with high-resolution images: standard approaches either miss fine details or waste computation on irrelevant patches. CVSearch switches between two strategies. It first tries expert-guided visual search, then falls back to semantic-aware scanning, which groups similar image regions together instead of dividing the image into a rigid grid. A complexity-driven bottom-up search then explores the remaining details efficiently. On standard high-resolution benchmarks, it matches the best prior accuracy while substantially cutting computational overhead.
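The try-then-fall-back control flow described above can be sketched in a few lines. This is only an illustrative sketch, not the paper's implementation: the names `group_patches`, `cognitive_search`, `expert`, `answers_query`, and `complexity` are hypothetical, the semantic grouping is approximated by flood-filling grid patches that share a precomputed label, and the complexity score is assumed to do nothing more than order the regions to visit.

```python
from typing import Callable, List, Optional, Tuple

Patch = Tuple[int, int]   # (row, col) position in a patch grid
Region = List[Patch]      # a group of patches treated as one unit


def group_patches(labels: List[List[int]]) -> List[Region]:
    """Merge 4-connected patches that share a label into regions.

    Stand-in for semantic-aware grouping: the labels are assumed to come
    from some upstream clustering of patch features (not shown here).
    """
    rows, cols = len(labels), len(labels[0])
    seen = [[False] * cols for _ in range(rows)]
    regions: List[Region] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            seen[r][c] = True
            stack, region = [(r, c)], []
            while stack:  # iterative flood fill
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and labels[ny][nx] == labels[r][c]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append(sorted(region))
    return regions


def cognitive_search(
    labels: List[List[int]],
    expert: Callable[[], Optional[Region]],
    answers_query: Callable[[Region], bool],
    complexity: Callable[[Region], float],
) -> Tuple[str, Region]:
    """Try the expert proposal first; otherwise scan grouped regions."""
    proposal = expert()
    if proposal is not None and answers_query(proposal):
        return "expert", proposal
    # Fallback: visit semantic regions, highest assumed complexity first.
    for region in sorted(group_patches(labels), key=complexity, reverse=True):
        if answers_query(region):
            return "scan", region
    return "none", []
```

In this sketch a 3x3 grid with three labels collapses into three regions, so the fallback checks at most three candidates instead of nine fixed cells, which is the intuition behind grouping rather than rigid tiling.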
Published as "CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception", arXiv:2605.23655