cs.CL

Benchmarking medical AI vision models in Bangla

Rafid Ahmed, Intesar Tahmid, Mir Sazzat Hossain, Tasnimul Hossain Tomal, Md Fahim, Md Farhad Alam Bhuiyan

May 18, 2026

BanglaMedVQA is the first medical visual question-answering benchmark for Bangla, a language spoken by hundreds of millions of people worldwide. The dataset contains clinically validated image-question-answer pairs, which are used to evaluate leading foundation models including GPT-4.1 mini, Gemini, and open-source alternatives such as Gemma-3. The results show that all tested models struggle with fine-grained diagnostic reasoning in Bangla; even the top performers fail on specialized medical questions. The work documents a critical performance gap between English and Bangla medical AI capabilities and establishes a baseline for future multilingual medical AI research.
Published as "How Good LLMs Are at Answering Bangla Medical Visual Questions? Dataset and Benchmarking", arXiv:2605.18111