Using speech audio to improve real-time MRI of vocal tracts

Md Hasan, Nyvenn Castro, Daiqi Liu, Lukas Mulzer, Jana Hutter, Jonghye Woo, Moritz Zaiss, Andreas Maier, Paula A. Perez-Toro

Real-time MRI of speech production faces a trade-off among spatial resolution, temporal resolution, and acquisition speed because the k-space data are undersampled. SIREM combines speech audio and MRI measurements through learned spatial weighting: an audio branch predicts articulator structure, while an MRI branch reconstructs the remaining details. The method also learns an optimal weighting of the k-space spiral arms. Evaluated on USC speech rtMRI data, SIREM matches or exceeds gridding, wavelet compressed sensing, and total variation methods, while running faster than iterative approaches and preserving anatomically plausible structures. Code is released.
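
To make the two weighting ideas concrete, the following is a minimal sketch and not the released SIREM code. It assumes that the learned spatial weighting acts as a per-pixel convex combination of the two branch outputs and that each spiral arm is scaled by a single scalar weight; both are illustrative assumptions. The names `fuse_branches`, `weight_spiral_arms`, `weight_logits`, and `arm_weights` are hypothetical, and random arrays stand in for network outputs.

```python
import numpy as np


def sigmoid(x):
    """Map unconstrained logits to weights in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def fuse_branches(audio_img, mri_img, weight_logits):
    """Blend two image estimates with a per-pixel weight map.

    ASSUMPTION: a convex combination per pixel. The abstract only says
    "learned spatial weighting"; the exact form is not specified there.
    """
    w = sigmoid(weight_logits)  # weight given to the audio branch
    return w * audio_img + (1.0 - w) * mri_img


def weight_spiral_arms(kspace_arms, arm_weights):
    """Scale each spiral arm by one scalar weight.

    kspace_arms: complex array of shape (n_arms, n_samples)
    arm_weights: real array of shape (n_arms,)
    ASSUMPTION: one weight per arm, applied before reconstruction.
    """
    return kspace_arms * arm_weights[:, None]


# Toy data standing in for the outputs of the two branches.
rng = np.random.default_rng(0)
audio_img = rng.random((4, 4))
mri_img = rng.random((4, 4))

# Zero logits give a weight of 0.5 everywhere, i.e. a plain average.
weight_logits = np.zeros((4, 4))
fused = fuse_branches(audio_img, mri_img, weight_logits)

# Three toy spiral arms with five samples each; the first arm is suppressed.
arms = np.ones((3, 5), dtype=complex)
weighted_arms = weight_spiral_arms(arms, np.array([0.0, 1.0, 2.0]))
```

In a trained model the weight map and the arm weights would be learned parameters or network outputs; here they are fixed by hand only to show how the data flow through the two operations.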