stat.ML

Can synthetic data from AI improve statistical inference without full models?

Jiguang Li, Sid Kankanala, Veronika Rockova

May 29, 2026

When you know the relationships your data should satisfy (moment conditions) but not the full probability model, standard likelihood-based inference, Bayesian inference included, has no likelihood to work with. This work builds a Bayesian framework around empirical likelihood, which assigns weights to the observed data points so that the weighted sample moments satisfy the moment conditions exactly. It then extends the framework to incorporate synthetic data from generative AI as a form of regularization. The method projects posterior draws onto the moment constraints, remains computationally tractable, and comes with theoretical convergence guarantees. In stock prediction from news headlines, AI-generated auxiliary data improved performance when domain-specific priors on the parameters were unavailable.
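To make the empirical-likelihood step concrete, here is a minimal sketch for the simplest moment condition, E[X - theta] = 0 (a population mean). It assumes that scalar moment and finds the weights through the standard Lagrange-dual form of empirical likelihood, w_i = 1 / (n (1 + lam g_i)). It then evaluates a posterior proportional to a prior times the empirical likelihood on a grid. The function names, the normal N(0, 10^2) prior, and the toy data are illustrative assumptions. The sketch does not implement the paper's synthetic-data regularization or its projection of posterior draws.

```python
import math

def el_weights(x, theta, iters=200):
    """Empirical likelihood weights for the moment condition E[X - theta] = 0.

    Maximizes sum(log w_i) subject to sum(w_i) = 1 and sum(w_i * (x_i - theta)) = 0
    via the dual: w_i = 1 / (n * (1 + lam * g_i)), where lam solves
    sum(g_i / (1 + lam * g_i)) = 0. Returns None when theta is not strictly
    inside the range of the data, because the constraints are then infeasible.
    """
    n = len(x)
    g = [xi - theta for xi in x]
    g_max, g_min = max(g), min(g)
    if g_max <= 0 or g_min >= 0:
        return None  # theta lies outside the convex hull of the data
    # Every weight must be positive: 1 + lam * g_i > 0, so lam is in (-1/g_max, -1/g_min).
    lo, hi = -1.0 / g_max, -1.0 / g_min

    def score(lam):
        return sum(gi / (1.0 + lam * gi) for gi in g)

    # score() is strictly decreasing on that interval, so bisection finds its root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [1.0 / (n * (1.0 + lam * gi)) for gi in g]

def log_el_ratio(x, theta):
    """log prod(n * w_i): equals 0 at the sample mean, -inf outside the data range."""
    w = el_weights(x, theta)
    if w is None:
        return -math.inf
    n = len(x)
    return sum(math.log(n * wi) for wi in w)

def bel_posterior_grid(x, grid, prior_mean=0.0, prior_sd=10.0):
    """Normalized posterior mass on a grid: log normal prior + log empirical likelihood.

    The normal prior and its default parameters are hypothetical choices.
    """
    log_post = [-0.5 * ((t - prior_mean) / prior_sd) ** 2 + log_el_ratio(x, t)
                for t in grid]
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]  # log-sum-exp style normalization
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy data (made up for illustration only).
x = [1.2, 0.7, 2.3, 1.9, 0.4, 1.5, 1.1, 2.8]
grid = [0.5 + 0.01 * k for k in range(201)]  # 0.5 .. 2.5, inside the data range
post = bel_posterior_grid(x, grid)
mode = grid[post.index(max(post))]
print(f"sample mean {sum(x) / len(x):.4f}, posterior mode {mode:.2f}")
```

At the sample mean the dual multiplier is zero and every weight is 1/n. Moving theta away from it tilts the weights and lowers the empirical likelihood, which is what lets it play the role of a likelihood in the posterior. Bisection is used here only because the scalar dual equation is monotone; a vector-valued moment condition would need a multivariate solver such as Newton's method on the dual.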
Published as "Empirical Likelihood with Generative AI" (arXiv:2606.00425).