Why twenty years of Arabic NLP taught lessons about people, not language
Wajdi Zaghouani
May 20, 2026
Wajdi Zaghouani reflects on two decades of building Arabic NLP resources and infrastructure, from foundational linguistic datasets to social media analysis tools. He identifies three counterintuitive lessons: dataset creation is fundamentally a social process, communities matter more than individual tasks, and traditional NLP training leaves practitioners unprepared for real-world deployment challenges. He also examines three high-profile failures: a depression detection corpus that never reached clinical use, overextension across shared tasks, and the false assumption that Modern Standard Arabic resources would transfer to dialects. Together, these reveal that the hardest problems in serving underserved languages are not technical but social and institutional.