stat.ML

How to fix missing survey answers without inventing fake data

Yuyu Chen, Taehyo Kim, Hai Shu, Yang Feng

June 3, 2026

Large surveys contain two kinds of blanks: cells left empty by design (skip patterns in branching questions) and genuinely missing answers. TabSODA separates the two using a diffusion model that treats ordinal responses (for example, a scale running from "very satisfied" to "not at all") as ordered categories rather than arbitrary levels. On the PATH and NSDUH surveys, it reduces error on ordinal variables by up to 24% and improves categorical accuracy by 9%, even when skip patterns must be inferred from the questionnaire structure.
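To make the two ideas concrete, here is a minimal Python sketch. It is not TabSODA's implementation and involves no diffusion model. It only illustrates (1) telling a blank-by-design cell apart from a genuinely missing one when the skip rule is known, and (2) why an ordinal-aware error differs from a plain right/wrong comparison. The column names (`smokes`, `cigs_per_day`), the skip rule, the four-level scale, and both helper functions are hypothetical and chosen purely for illustration.

```python
# Hypothetical toy survey: answering "no" to `smokes` skips `cigs_per_day`.
# None marks an empty cell in the raw data.
rows = [
    {"smokes": "yes", "cigs_per_day": 10},
    {"smokes": "no", "cigs_per_day": None},   # blank by design (skipped)
    {"smokes": "yes", "cigs_per_day": None},  # genuinely missing
    {"smokes": None, "cigs_per_day": None},   # gate itself is missing
]


def classify_blank(row, gate, skip_value, target):
    """Label the target cell as observed, structural_skip, or missing."""
    if row[target] is not None:
        return "observed"
    if row[gate] == skip_value:
        return "structural_skip"  # empty on purpose: nothing to impute
    # Assumption of this sketch: if the gate is unknown, treat the cell
    # as missing rather than guessing that it was skipped.
    return "missing"


labels = [classify_blank(r, "smokes", "no", "cigs_per_day") for r in rows]

# Hypothetical ordered scale, lowest to highest.
LEVELS = ["not at all", "slightly", "moderately", "very satisfied"]


def ordinal_error(predicted, actual, levels=LEVELS):
    """Distance in rank between two levels (0 means an exact match)."""
    return abs(levels.index(predicted) - levels.index(actual))


near_miss = ordinal_error("moderately", "very satisfied")
far_miss = ordinal_error("not at all", "very satisfied")

print(labels)
print(near_miss, far_miss)
```

A purely categorical comparison would score both wrong guesses identically, while the rank distance penalizes "not at all" more heavily than "moderately" when the true answer is "very satisfied". In this toy case the skip rule is given explicitly; the summary above notes that in practice skip patterns may have to be inferred from the questionnaire structure.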
Published as "TabSODA: Tabular Diffusion based Imputation with Skip Pattern Detection and Ordinal Awareness" (arXiv:2606.05361)