How to fix missing survey answers without inventing fake data
Yuyu Chen, Taehyo Kim, Hai Shu, Yang Feng
June 3, 2026
Large surveys contain two kinds of blanks: cells left empty by design (skip patterns in branching questions) and answers that are genuinely missing. TabSODA separates the two using a diffusion model that treats ordinal responses (such as a scale running from "very satisfied" to "not at all") as ordered categories rather than arbitrary levels. On the PATH and NSDUH surveys, it reduces error on ordinal variables by up to 24% and improves categorical accuracy by 9%, even when skip patterns must be inferred from questionnaire structure.
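To make the distinction between the two kinds of blanks concrete, here is a minimal sketch of how a known skip rule separates blanks by design from genuinely missing answers. It is not TabSODA's method, and it does not cover the harder case where the skip pattern must be inferred. The column names (`ever_smoked`, `cigs_per_day`), the skip rule, and the label strings are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy survey; None marks a blank cell.
# Assumed skip rule: "cigs_per_day" is asked only when "ever_smoked" == "yes".
rows = [
    {"ever_smoked": "yes", "cigs_per_day": 10},
    {"ever_smoked": "no", "cigs_per_day": None},   # blank by design
    {"ever_smoked": "yes", "cigs_per_day": None},  # genuinely missing
    {"ever_smoked": None, "cigs_per_day": None},   # gating answer is missing too
]


def classify_blank(row: dict) -> str:
    """Label the 'cigs_per_day' cell of one respondent."""
    if row["cigs_per_day"] is not None:
        return "observed"
    gate = row["ever_smoked"]
    if gate == "no":
        # The question was never asked, so there is nothing to impute.
        return "structural_skip"
    if gate == "yes":
        # The question was asked but not answered: a candidate for imputation.
        return "missing"
    # The gating answer is blank, so the rule cannot decide on its own.
    return "undetermined"


labels = [classify_blank(r) for r in rows]
print(labels)
```

Only the cells labelled `missing` are candidates for imputation in this sketch. Filling a `structural_skip` cell would invent an answer to a question the respondent was never asked.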