y0news

#tabular-data News & Analysis

12 articles tagged with #tabular-data. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

Schema-Adaptive Tabular Representation Learning with LLMs for Generalizable Multimodal Clinical Reasoning

Researchers propose Schema-Adaptive Tabular Representation Learning, which uses LLMs to convert structured clinical data into semantic embeddings that transfer across different electronic health record schemas without retraining. When combined with imaging data for dementia diagnosis, the method achieves state-of-the-art results and outperforms board-certified neurologists on retrospective diagnostic tasks.
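The transfer-without-retraining idea rests on serializing a structured row into natural-language text that a frozen LLM encoder can embed. A minimal sketch of that step, assuming a hypothetical `serialize_row` helper and template (the paper's actual serialization scheme is not described in the summary):

```python
def serialize_row(row: dict) -> str:
    """Render a tabular record as natural-language text so that an
    LLM embedding model can map it into a schema-independent space.
    Column names are kept verbatim; missing values are skipped."""
    parts = [f"{col} is {val}" for col, val in row.items() if val is not None]
    return "; ".join(parts) + "."

# Two EHR systems with different schemas describing the same patient:
row_a = {"age": 74, "mmse_score": 21, "apoe4_carrier": True}
row_b = {"patient_age_years": 74, "MMSE": 21, "APOE-e4": True}

text_a = serialize_row(row_a)
text_b = serialize_row(row_b)
# Both texts carry the same semantics, so a frozen LLM encoder
# (e.g. embeddings = llm.encode(text_a), encoder hypothetical here)
# can place them nearby without any schema-specific retraining.
```

Because the embedding model reads column names as words rather than positions, a new hospital's schema needs no retraining, only re-serialization.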

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

OmniTabBench: Mapping the Empirical Frontiers of GBDTs, Neural Networks, and Foundation Models for Tabular Data at Scale

OmniTabBench introduces the largest tabular data benchmark with 3,030 datasets to evaluate gradient boosted decision trees, neural networks, and foundation models. The comprehensive analysis reveals no universally superior approach, but identifies specific conditions favoring different model categories through decoupled metafeature analysis.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

Reinforcing Numerical Reasoning in LLMs for Tabular Prediction via Structural Priors

Researchers propose PRPO (Permutation Relative Policy Optimization), a reinforcement learning framework that enhances large language models' numerical reasoning capabilities for tabular data prediction. The method achieves performance comparable to supervised baselines while excelling in zero-shot scenarios, with an 8B parameter model outperforming much larger models by up to 53.17%.
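The summary does not spell out PRPO's update rule; as an illustration only, a group-relative advantage computed across column-order permutations of the same table (in the style of group-relative policy optimization) could look like this. The function name and scheme are assumptions, not the paper's implementation:

```python
import statistics

def permutation_relative_advantages(rewards):
    """Given rewards for the same model answering the same tabular
    question under several column-order permutations, compute a
    group-relative advantage for each rollout: reward minus the
    group mean, scaled by the group standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0   # avoid divide-by-zero
    return [(r - mean) / std for r in rewards]

# Rewards from four permutations of one table (1.0 = correct prediction):
advantages = permutation_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Permutations that produced a correct answer get positive advantage,
# incorrect ones negative, so the policy gradient pushes the model to
# reason consistently regardless of column order.
```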

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

SPRINT: Semi-supervised Prototypical Representation for Few-Shot Class-Incremental Tabular Learning

Researchers introduce SPRINT, the first Few-Shot Class-Incremental Learning (FSCIL) framework designed specifically for tabular data domains like cybersecurity and healthcare. The system achieves 77.37% accuracy in 5-shot learning scenarios, outperforming existing methods by 4.45% through novel semi-supervised techniques that leverage unlabeled data and confidence-based pseudo-labeling.
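Confidence-based pseudo-labeling, one of the semi-supervised ingredients named above, is a standard pattern: keep an unlabeled sample only when the classifier's top predicted probability clears a threshold. A minimal sketch (function name and threshold are illustrative, not SPRINT's exact values):

```python
def pseudo_label(probs, threshold=0.9):
    """Confidence-based pseudo-labeling: keep an unlabeled sample only
    when the classifier's top class probability clears a threshold.
    `probs` is a list of per-class probability vectors."""
    selected = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            selected.append((i, p.index(conf)))   # (sample index, label)
    return selected

# Three unlabeled rows scored by the current classifier:
labels = pseudo_label([[0.95, 0.05], [0.55, 0.45], [0.08, 0.92]])
# → [(0, 0), (2, 1)]: the uncertain middle row is left unlabeled.
```

The selected pairs are then added to the few-shot training pool, letting the model exploit unlabeled data without propagating low-confidence mistakes.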

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10

MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

Researchers introduce MedFeat, a framework that uses Large Language Models for feature engineering in clinical tabular prediction. The system incorporates model awareness and domain knowledge to discover clinically meaningful features that outperform traditional approaches and remain robust across different hospital settings.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Beyond Statistical Co-occurrence: Unlocking Intrinsic Semantics for Tabular Data Clustering

Researchers introduce TagCC, a novel deep clustering framework that combines Large Language Models with contrastive learning to enhance tabular data analysis by incorporating semantic knowledge from feature names and values. The approach bridges the gap between statistical co-occurrence patterns and intrinsic semantic understanding, demonstrating significant performance improvements over existing methods in finance and healthcare applications.
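The summary does not detail TagCC's augmentation strategy; random feature masking is a common way to build positive pairs for contrastive learning on tables, and the sketch below (names hypothetical) shows only that generic pattern, plus where feature-name semantics can enter:

```python
import random

def corrupt_view(row: dict, p: float = 0.3, rng=None) -> dict:
    """Build an augmented 'view' of a row for contrastive learning by
    randomly masking a fraction of its features. Two views of the same
    row form a positive pair; views of different rows are negatives."""
    rng = rng or random.Random(0)
    return {k: (None if rng.random() < p else v) for k, v in row.items()}

row = {"income": 52_000, "credit_score": 710, "region": "EU"}
view_1 = corrupt_view(row, rng=random.Random(1))
view_2 = corrupt_view(row, rng=random.Random(2))
# A text encoder can additionally embed strings like "credit_score is
# 710", injecting the semantic knowledge carried by feature names that
# purely statistical co-occurrence models discard.
```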

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

A Closer Look into LLMs for Table Understanding

Researchers conducted an empirical study of 16 Large Language Models to understand how they process tabular data, revealing a three-phase attention pattern and finding that tabular tasks engage deeper network layers than math reasoning does. The study analyzed attention dynamics, layer-depth requirements, expert activation in MoE models, and the impact of different input designs on table understanding performance.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics

Researchers introduced GateLens, an LLM-based system that uses Relational Algebra as an intermediate layer to analyze complex tabular data more reliably than traditional approaches. The system demonstrated over 80% reduction in analysis time in automotive software analytics while maintaining high accuracy, outperforming existing Chain-of-Thought methods.
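The reliability gain comes from having the LLM emit a relational-algebra plan that is then executed deterministically, rather than answering from a serialized table in free text. A toy sketch of that intermediate layer, with illustrative data and operator names (GateLens's actual operator set is not described in the summary):

```python
def select(rows, predicate):
    """Relational-algebra selection (σ): keep rows matching a predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, columns):
    """Relational-algebra projection (π): keep only the named columns."""
    return [{c: r[c] for c in columns} for r in rows]

# For "which builds failed gate X?", the LLM emits the plan below and a
# deterministic executor runs it, avoiding arithmetic and lookup slips
# that plague chain-of-thought answers over serialized tables.
builds = [
    {"build": "b101", "gate": "X", "status": "fail"},
    {"build": "b102", "gate": "X", "status": "pass"},
    {"build": "b103", "gate": "Y", "status": "fail"},
]
answer = project(
    select(builds, lambda r: r["gate"] == "X" and r["status"] == "fail"),
    ["build"],
)
# → [{"build": "b101"}]
```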

AI · Neutral · arXiv – CS AI · Mar 9 · 5/10

TML-Bench: Benchmark for Data Science Agents on Tabular ML Tasks

Researchers introduced TML-Bench, a new benchmark for evaluating AI coding agents on tabular machine learning tasks similar to Kaggle competitions. The study tested 10 open-source language models across four competitions with different time budgets, finding that MiniMax-M2.1 achieved the best overall performance.

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10

Revisiting Chebyshev Polynomial and Anisotropic RBF Models for Tabular Regression

Researchers developed smooth-basis regression models, including anisotropic RBF networks and Chebyshev polynomial regressors, that compete with tree ensembles on tabular regression tasks. Testing across 55 datasets showed these models achieve accuracy similar to tree ensembles while offering better generalization properties and smooth prediction surfaces suitable for optimization applications.
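A minimal sketch of the Chebyshev side using NumPy's built-in least-squares fit (the paper's models also include anisotropic RBF networks, not shown, and its actual training setup is surely more involved):

```python
import numpy as np

# Fit a degree-3 Chebyshev polynomial regressor to noiseless samples of
# x**2; chebfit solves the least-squares problem in the Chebyshev basis.
x = np.linspace(-1.0, 1.0, 50)
y = x ** 2

coeffs = np.polynomial.chebyshev.chebfit(x, y, deg=3)
y_hat = np.polynomial.chebyshev.chebval(x, coeffs)

max_err = float(np.max(np.abs(y_hat - y)))
# Unlike a tree ensemble's piecewise-constant output, this prediction
# surface is smooth and differentiable, which is what makes such models
# attractive inside downstream optimization loops.
```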

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10

TabDLM: Free-Form Tabular Data Generation via Joint Numerical-Language Diffusion

Researchers introduce TabDLM, a new AI framework that generates synthetic tabular data containing both numerical values and free-form text using joint numerical-language diffusion models. The approach addresses limitations of existing diffusion and LLM-based methods by combining masked diffusion for text with continuous diffusion for numbers, enabling better synthetic data generation for privacy and data augmentation applications.
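The summary names the two diffusion types but not their mechanics; the toy forward-noising step below (function and variable names hypothetical) only illustrates how masked diffusion over tokens and continuous diffusion over numbers can coexist on a single mixed-type row:

```python
import random

MASK = "[MASK]"

def noise_step(tokens, numbers, t, rng=None):
    """One forward-noising step of a joint diffusion sketch: text tokens
    are masked with probability t (masked diffusion) while numeric
    columns get Gaussian noise scaled by t (continuous diffusion),
    for a noise level t in [0, 1]."""
    rng = rng or random.Random(0)
    noisy_tokens = [MASK if rng.random() < t else tok for tok in tokens]
    noisy_numbers = [v + t * rng.gauss(0.0, 1.0) for v in numbers]
    return noisy_tokens, noisy_numbers

row_text = ["loan", "approved", "after", "manual", "review"]
row_nums = [52_000.0, 0.31]
tokens_t, nums_t = noise_step(row_text, row_nums, t=0.5, rng=random.Random(7))
# At t=1 every token is masked and the numbers are fully noised; the
# reverse (denoising) model the framework trains would invert this
# process to sample synthetic mixed-type rows.
```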