Do language models understand legal reasoning across countries and languages?

Legal AI benchmarks typically test a single language or mix incomparable tasks, which makes it impossible to compare models fairly across borders. This work introduces Multi-Legal-Bench, which evaluates seven frontier LLMs on five legal tasks, including case outcome prediction, norm extraction, and judgment classification, across Ukraine, France, the Netherlands, Poland, the Czech Republic, and Lithuania. Surprisingly, cross-lingual transfer quality depends more on whether task labels align than on whether the languages are related: Ukrainian→French (Slavic to Romance) transfers better than Ukrainian→Polish (both Slavic). The authors release all data and predictions, giving practitioners a testbed for building multilingual legal AI.