cs.CL

Using speech audio to improve real-time MRI of vocal tracts

Md Hasan, Nyvenn Castro, Daiqi Liu, Lukas Mulzer, Jana Hutter, Jonghye Woo, Moritz Zaiss, Andreas Maier, Paula A. Perez-Toro

May 18, 2026

Real-time MRI of speech production faces a trade-off among spatial resolution, temporal resolution, and acquisition speed, because frames are reconstructed from undersampled k-space data. SIREM combines speech audio and MRI measurements through learned spatial weighting: an audio branch predicts articulator structure, while an MRI branch reconstructs the remaining details. The method also learns how to weight the k-space spiral arms. Evaluated on USC speech rtMRI data, SIREM matches or exceeds gridding, wavelet compressed sensing, and total variation methods, runs faster than the iterative approaches, and preserves anatomically plausible structures. The code has been released.
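The learned spatial weighting described above can be pictured as a per-pixel blend of two candidate images. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `fuse`, the sigmoid-gated convex combination, and the toy 2x2 inputs are all hypothetical, chosen only to illustrate how a weight map could decide where the audio branch or the MRI branch dominates.

```python
import math


def sigmoid(x):
    """Map an unbounded logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(audio_img, mri_img, weight_logits):
    """Blend two images pixel by pixel (hypothetical fusion rule).

    Each output pixel is w * audio + (1 - w) * mri, where
    w = sigmoid(logit). In a trained model the logits would be
    learned; here they are supplied by hand.
    """
    fused = []
    for a_row, m_row, l_row in zip(audio_img, mri_img, weight_logits):
        row = []
        for a, m, logit in zip(a_row, m_row, l_row):
            w = sigmoid(logit)
            row.append(w * a + (1.0 - w) * m)
        fused.append(row)
    return fused


# Toy 2x2 example: the audio branch predicts 1.0 everywhere,
# the MRI branch predicts 0.0 everywhere.
audio_img = [[1.0, 1.0], [1.0, 1.0]]
mri_img = [[0.0, 0.0], [0.0, 0.0]]

# Large positive logit: trust audio. Large negative: trust MRI.
# Zero: equal mix of both branches.
weight_logits = [[0.0, 20.0], [-20.0, 0.0]]

fused = fuse(audio_img, mri_img, weight_logits)
```

With these toy inputs, pixels with a strongly positive logit follow the audio branch, pixels with a strongly negative logit follow the MRI branch, and a zero logit gives an equal mix. The actual weighting scheme used in SIREM may differ from this convex blend.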
Published as "SIREM: Speech-Informed MRI Reconstruction with Learned Sampling", arXiv:2605.18221.