Using speech audio to improve real-time MRI of vocal tracts
Md Hasan, Nyvenn Castro, Daiqi Liu, Lukas Mulzer, Jana Hutter, Jonghye Woo, Moritz Zaiss, Andreas Maier, Paula A. Perez-Toro
May 18, 2026
Real-time MRI (rtMRI) of speech production must trade off spatial resolution, temporal resolution, and acquisition speed, because the k-space data are undersampled. SIREM fuses speech audio with the MRI measurements through a learned spatial weighting: an audio branch predicts articulator structure, while an MRI branch reconstructs the remaining details. The method also learns how to weight the k-space spiral arms. Evaluated on USC speech rtMRI data, SIREM matches or exceeds gridding, wavelet compressed sensing, and total variation reconstructions, runs faster than the iterative approaches, and preserves anatomically plausible structures. Code is released.
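
To make the idea of a learned spatial weighting concrete, the sketch below blends two images pixel by pixel. The abstract does not say how SIREM computes or applies its weights, so the sigmoid-gated convex combination is an assumption, and the names `fuse`, `audio_img`, `mri_img`, and `weight_logits` are hypothetical and not taken from the released code.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(audio_img, mri_img, weight_logits):
    """Blend two same-sized 2D images pixel by pixel.

    Assumption: w = sigmoid(logit) is the per-pixel weight given to the
    audio-branch prediction, and the MRI-branch reconstruction gets 1 - w.
    In a trained model the logits would be learned; here they are inputs.
    """
    fused = []
    for a_row, m_row, l_row in zip(audio_img, mri_img, weight_logits):
        row = []
        for a, m, logit in zip(a_row, m_row, l_row):
            w = sigmoid(logit)
            row.append(w * a + (1.0 - w) * m)
        fused.append(row)
    return fused


# Toy 2x2 example: the audio branch outputs ones, the MRI branch zeros,
# so each fused pixel equals the weight given to the audio branch.
audio_img = [[1.0, 1.0], [1.0, 1.0]]
mri_img = [[0.0, 0.0], [0.0, 0.0]]
weight_logits = [[0.0, 20.0], [-20.0, 0.0]]
fused = fuse(audio_img, mri_img, weight_logits)
```

In this toy setting a large positive logit hands a pixel almost entirely to the audio branch, a large negative logit hands it to the MRI branch, and a logit of zero averages the two.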