stat.ML

How do we find similar data tables when they don't share column names?

M. Ross Kunz, John Merickel, Keith Wilson

May 28, 2026

Scientific tables often use completely different column names even when they measure similar things, so searching across datasets by column name rarely works. This work embeds tables into a shared space using statistical descriptors of their columns (mean, variance, distribution shape) rather than the raw numbers, then uses sparse correlation analysis to identify which statistical patterns link tables together. The approach is applied across 15 diverse datasets, ranging from materials science to nuclear graphite studies, and supports differential privacy for sensitive data.
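To make the descriptor idea concrete, here is a minimal sketch, not the authors' implementation. It assumes four per-column descriptors (mean, variance, skewness, excess kurtosis), pools them across columns with a mean and a standard deviation so that column names and column order play no role, and compares tables by cosine similarity. The function names `column_descriptors`, `embed_table`, and `cosine_similarity` are hypothetical, and the sparse correlation step and differential privacy are not shown.

```python
import numpy as np

def column_descriptors(col):
    """Return [mean, variance, skewness, excess kurtosis] for one numeric column."""
    col = np.asarray(col, dtype=float)
    mean = col.mean()
    var = col.var()
    std = np.sqrt(var)
    if std == 0:
        # A constant column has no shape information.
        return np.array([mean, 0.0, 0.0, 0.0])
    z = (col - mean) / std
    return np.array([mean, var, (z ** 3).mean(), (z ** 4).mean() - 3.0])

def embed_table(table):
    """Assumed pooling: mean and std of each descriptor across columns.

    The result has a fixed length of 8 regardless of how many columns
    the table has, and it does not depend on column names or order.
    """
    table = np.asarray(table, dtype=float)
    desc = np.array([column_descriptors(table[:, j]) for j in range(table.shape[1])])
    return np.concatenate([desc.mean(axis=0), desc.std(axis=0)])

def cosine_similarity(a, b):
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Illustrative data only: the same table with its columns shuffled,
# standing in for two tables whose column names do not match.
rng = np.random.default_rng(0)
table_a = rng.normal(size=(200, 3))
table_b = table_a[:, [2, 0, 1]]

emb_a = embed_table(table_a)
emb_b = embed_table(table_b)
similarity = cosine_similarity(emb_a, emb_b)
```

Because the pooling ignores column identity, `emb_a` and `emb_b` agree up to floating-point error. In practice the raw mean and variance would likely dominate the cosine similarity, so some rescaling of the descriptors would probably be needed; that choice is left open here.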
Published as "Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets" (arXiv:2605.30289)