arXiv – CS AI
Data Language Models: A New Foundation Model Class for Tabular Data
Researchers introduce Schema-1, the first Data Language Model (DLM): a model designed to understand tabular data natively, without preprocessing, much as language models understand text. The 140M-parameter model, trained on 2.3M datasets, outperforms gradient-boosted trees, AutoML systems, and existing tabular foundation models on prediction benchmarks. It also shows superior performance on missing-value imputation and dataset classification tasks.