cs.LG

Clinical AI systems fail on tiny perturbations and non-English languages

Anthonio Oladimeji Gabriel, Ahmad Rufai Yusuf

May 16, 2026

This study documents two critical safety failures in clinical AI systems deployed in low-resource settings: adversarial fragility and cross-lingual diagnostic drift. The researchers tested DenseNet121 (the backbone of CheXNet) on chest X-rays and large language models on COVID-19 cases. They found that small, imperceptible image perturbations drastically degrade accuracy, while standard defenses such as Gaussian smoothing and ensemble voting provide no protection. Language vulnerability is equally severe: both Llama3.1 and NatLAS, a model presented as tailored to African contexts, show accuracy drops of 20–30 percentage points when processing clinical cases written in Nigerian Pidgin or Yoruba-inflected English. The study is intended for healthcare practitioners and policymakers working in Primary Health Centers in Nigeria and similar settings.
Published as "Adversarial Fragility and Language Vulnerability in Clinical AI: A Systematic Audit of Diagnostic Collapse Under Imperceptible Perturbations and Cross-Lingual Drift in Low-Resource Healthcare Settings", arXiv:2605.16993
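
To make the idea of an imperceptible perturbation concrete, here is a minimal sketch of a single-step sign-gradient attack (FGSM) against a toy linear classifier. The summary above does not say which attack the authors used, so FGSM is an assumption, and the linear model, the synthetic 64×64 "image", the 2/255 budget, and the helper name `fgsm_linear` are all hypothetical stand-ins rather than the paper's DenseNet121 setup.

```python
import numpy as np


def fgsm_linear(x, y, w, b, eps):
    """One FGSM step against a linear classifier with logistic loss.

    x: flattened image with pixels in [0, 1]; y: true label in {-1, +1}.
    Hypothetical helper for illustration, not code from the paper.
    """
    margin = y * (w @ x + b)
    # Gradient of log(1 + exp(-margin)) with respect to x.
    grad = -y * w / (1.0 + np.exp(margin))
    # Move every pixel by at most eps in the direction that raises the loss.
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)


rng = np.random.default_rng(0)
d = 64 * 64                        # toy image size (assumption)
w = rng.normal(size=d)             # random weights standing in for a trained model
x = rng.uniform(0.2, 0.8, size=d)  # synthetic "image", kept away from the clip bounds
b = 5.0 - w @ x                    # bias chosen so the clean score is about +5
y = 1                              # true class

eps = 2 / 255                      # per-pixel budget, under 1% of the intensity range
x_adv = fgsm_linear(x, y, w, b, eps)

clean_score = w @ x + b
adv_score = w @ x_adv + b
max_change = np.max(np.abs(x_adv - x))

print("clean prediction:", 1 if clean_score > 0 else -1)
print("adversarial prediction:", 1 if adv_score > 0 else -1)
print("largest per-pixel change:", max_change)
```

The score shifts by `eps` times the L1 norm of `w`, which grows with the number of pixels. This is why a change that is negligible per pixel can still flip the decision of a high-dimensional model, and why the effect is not specific to any one architecture.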