How do we find similar data tables when they don't share column names?
M. Ross Kunz, John Merickel, Keith Wilson
May 28, 2026
Scientific tables often have completely different column names even when measuring similar things, making cross-dataset search nearly impossible. This work embeds tables into a shared space by analyzing their statistical descriptors (mean, variance, distribution shape) rather than raw numbers, then uses sparse correlation analysis to identify which statistical patterns link tables together. The approach works across 15 diverse datasets, ranging from materials science to nuclear graphite studies, and supports differential privacy for sensitive data.
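To make the descriptor idea concrete, here is a minimal sketch of how a table could be reduced to a fixed-length vector of statistical descriptors that ignores column names entirely. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`column_descriptors`, `table_embedding`, `cosine_similarity`), the choice of four descriptors, and the pooling over columns are all hypothetical, and plain cosine similarity stands in for the paper's sparse correlation analysis, which is not reproduced here. Differential privacy is also out of scope for this sketch.

```python
import numpy as np


def column_descriptors(values):
    """Summarize one numeric column as [mean, variance, skewness, excess kurtosis]."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    var = x.var()
    if var == 0:
        # A constant column has no shape information.
        return np.array([mean, 0.0, 0.0, 0.0])
    z = (x - mean) / np.sqrt(var)
    skewness = (z ** 3).mean()
    excess_kurtosis = (z ** 4).mean() - 3.0
    return np.array([mean, var, skewness, excess_kurtosis])


def table_embedding(table):
    """Embed a table (dict of column name -> values) without using the names.

    Assumption: descriptors are pooled over columns (mean and standard
    deviation), so tables with different column counts map to the same
    8-dimensional space.
    """
    desc = np.array([column_descriptors(col) for col in table.values()])
    return np.concatenate([desc.mean(axis=0), desc.std(axis=0)])


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Two tables holding the same measurements under unrelated column names.
rng = np.random.default_rng(0)
table_a = {
    "temp_K": rng.normal(300.0, 5.0, 200),
    "pressure": rng.exponential(2.0, 200),
}
table_b = {"T": table_a["temp_K"], "P": table_a["pressure"]}

similarity = cosine_similarity(table_embedding(table_a), table_embedding(table_b))
```

Because the embedding depends only on the values, `table_a` and `table_b` map to the same vector even though no column name matches. In a real setting the raw descriptors would likely need standardizing across the table collection first, since large-magnitude means and variances otherwise dominate the cosine similarity.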