Do language models understand legal reasoning across countries and languages?
Volodymyr Ovcharov
May 28, 2026
Legal AI benchmarks typically test a single language or mix incomparable tasks, which makes it impossible to compare models fairly across borders. This work introduces Multi-Legal-Bench, which evaluates 7 frontier LLMs on five legal tasks, including case outcome prediction, norm extraction, and judgment classification. The tasks cover Ukraine, France, the Netherlands, Poland, the Czech Republic, and Lithuania. Surprisingly, cross-lingual transfer quality depends more on whether task labels align than on how closely the languages are related. For example, Ukrainian→French (Slavic to Romance) transfers better than Ukrainian→Polish (both Slavic). The authors release all data and model predictions, giving practitioners a testbed for building multilingual legal AI.