y0news

#tabular-data News & Analysis

28 articles tagged with #tabular-data. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Toward Privileged Foundation Models: LUPI for Accelerated and Improved Learning

Researchers introduce PIQL, a framework that leverages privileged information to accelerate training and improve generalization in tabular foundation models. By incorporating dataset-level statistics and encodings of data-generating processes during training, the approach reduces computational requirements and convergence time while maintaining inference efficiency through reconstruction mechanisms.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Uncertainty Quantification for Prior-Data Fitted Networks using Martingale Posteriors

Researchers propose a novel uncertainty quantification method for Prior-Data Fitted Networks (PFNs), emerging foundation models for tabular data prediction, using martingale posteriors to provide calibrated confidence estimates. The technique is tuning-free, computationally efficient, and mathematically proven to converge, addressing a significant limitation in PFNs' practical applicability.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Data Language Models: A New Foundation Model Class for Tabular Data

Researchers introduce Schema-1, the first Data Language Model (DLM) designed to natively understand tabular data without preprocessing, similar to how language models understand text. The 140M-parameter model trained on 2.3M datasets outperforms gradient-boosted trees, AutoML systems, and existing tabular foundation models on prediction benchmarks and demonstrates superior performance on missing value imputation and dataset classification tasks.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

Schema-Adaptive Tabular Representation Learning with LLMs for Generalizable Multimodal Clinical Reasoning

Researchers propose Schema-Adaptive Tabular Representation Learning, which uses LLMs to convert structured clinical data into semantic embeddings that transfer across different electronic health record schemas without retraining. When combined with imaging data for dementia diagnosis, the method achieves state-of-the-art results and outperforms board-certified neurologists on retrospective diagnostic tasks.

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

OmniTabBench: Mapping the Empirical Frontiers of GBDTs, Neural Networks, and Foundation Models for Tabular Data at Scale

OmniTabBench introduces the largest tabular data benchmark with 3,030 datasets to evaluate gradient boosted decision trees, neural networks, and foundation models. The comprehensive analysis reveals no universally superior approach, but identifies specific conditions favoring different model categories through decoupled metafeature analysis.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

Reinforcing Numerical Reasoning in LLMs for Tabular Prediction via Structural Priors

Researchers propose PRPO (Permutation Relative Policy Optimization), a reinforcement learning framework that enhances large language models' numerical reasoning capabilities for tabular data prediction. The method achieves performance comparable to supervised baselines while excelling in zero-shot scenarios, with an 8B parameter model outperforming much larger models by up to 53.17%.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

SPRINT: Semi-supervised Prototypical Representation for Few-Shot Class-Incremental Tabular Learning

Researchers introduce SPRINT, the first Few-Shot Class-Incremental Learning (FSCIL) framework designed specifically for tabular data domains like cybersecurity and healthcare. The system achieves 77.37% accuracy in 5-shot learning scenarios, outperforming existing methods by 4.45% through novel semi-supervised techniques that leverage unlabeled data and confidence-based pseudo-labeling.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10

MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

Researchers introduce MedFeat, a new AI framework that uses Large Language Models for healthcare feature engineering in clinical tabular predictions. The system incorporates model awareness and domain knowledge to discover clinically meaningful features that outperform traditional approaches and demonstrate robustness across different hospital settings.

AI · Neutral · arXiv – CS AI · 18h ago · 5/10

TabChange: Precise Attribute Changes in Tabular Data

TabChange is a new machine learning approach for modifying individual attributes in tabular datasets while maintaining data naturalness and minimizing unintended changes. The method analyzes attribute relationships and uses adversarial techniques to remove latent information about target attributes, producing more valid counterfactuals than existing generative models.

AI · Neutral · arXiv – CS AI · 18h ago · 6/10

ODTQA-FoRe: An Open-Domain Tabular Question Answering Dataset for Future Data Forecasting and Reasoning

Researchers introduce ODTQA-FoRe, a new dataset and TimeFore framework enabling large language models to perform future-oriented numerical predictions on tabular data using time-series forecasting. The innovation addresses a critical gap where existing LLM systems excel at historical analysis but struggle with predictive reasoning, demonstrated through real estate data scenarios.

AI · Neutral · arXiv – CS AI · 18h ago · 5/10

Explainable AI Through a Democratic Lens: DhondtXAI for D'Hondt-Projected Feature Attribution

Researchers introduce DhondtXAI, a novel explainable AI framework for tabular data that uses proportional representation principles (the D'Hondt rule) to attribute feature importance instead of relying on SHAP values. The method demonstrates high correlation with SHAP while offering complementary capabilities for handling feature interactions and alliances, validated across synthetic tests and healthcare datasets.
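The summary does not say how DhondtXAI turns model outputs into votes, so the following is only a minimal sketch of the D'Hondt highest-averages rule itself. It assumes hypothetical per-feature importance scores are treated as "votes" and a hypothetical seat budget is split among features; the function name, feature names, and scores are illustrative and not taken from the paper.

```python
def dhondt_allocate(votes: dict[str, float], seats: int) -> dict[str, int]:
    """Split `seats` among the keys of `votes` using the D'Hondt rule.

    Each seat goes to the candidate with the highest quotient
    votes / (seats_already_won + 1). Ties go to the earlier key.
    """
    if seats < 0:
        raise ValueError("seats must be non-negative")
    allocation = {name: 0 for name in votes}
    for _ in range(seats):
        winner = max(votes, key=lambda name: votes[name] / (allocation[name] + 1))
        allocation[winner] += 1
    return allocation


# Hypothetical importance scores for four features (assumed, not from the paper).
scores = {"age": 100, "income": 80, "bmi": 30, "smoker": 20}
print(dhondt_allocate(scores, seats=8))
# → {'age': 4, 'income': 3, 'bmi': 1, 'smoker': 0}
```

With a small seat budget the rule favors the largest scores, which is why the weakest feature receives no seat here; a larger budget gives a more proportional split.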

AI · Neutral · arXiv – CS AI · 18h ago · 6/10

Learning-To-Measure: In-Context Active Feature Acquisition

Researchers introduce Learning-to-Measure (L2M), a meta-learning framework that enables AI systems to learn optimal feature acquisition strategies across multiple tasks without task-specific retraining. The approach combines uncertainty quantification with a greedy acquisition agent, demonstrating superior performance on tabular datasets with missing features and limited labels.

AI · Neutral · arXiv – CS AI · 18h ago · 6/10

From Noise to Order: Learning to Rank via Denoising Diffusion

Researchers propose DiffusionRank, a generative deep learning approach to learning-to-rank in information retrieval that uses denoising diffusion models instead of traditional discriminative methods. By modeling the full joint distribution of features and relevance labels, the method demonstrates improvements over classical ranking approaches on standard benchmarks.

AI · Neutral · arXiv – CS AI · 18h ago · 6/10

Avoiding Structural Failure Modes in Tabular Fair SSL: Online Primal-Dual Allocation under Confidence Gating

Researchers identify critical failure modes in semi-supervised learning (SSL) applied to tabular data with fairness constraints, where fairness regularizers can paradoxically erode model performance. They propose Online Primal-Dual Allocation (OPDA), an adaptive controller that dynamically balances fairness and stability penalties without manual tuning, demonstrating improved robustness across benchmark datasets like Adult, COMPAS, and ACSIncome.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Masked Diffusion Modeling for Anomaly Detection

Researchers propose MaskDiff-AD, a novel anomaly detection method using masked diffusion models that operates on categorical and discrete data without requiring reverse-time sampling. The approach demonstrates competitive or superior performance compared to existing anomaly detection baselines across tabular and text datasets.

AI · Neutral · arXiv – CS AI · 5d ago · 6/10

BIRDNet: Mining and Encoding Boolean Implication Knowledge Graphs as Interpretable Deep Neural Networks

Researchers introduce BIRDNet, a neurosymbolic deep learning architecture that mines Boolean implication relationships from tabular data and encodes them as sparse, interpretable neural networks. The model achieves near-baseline performance on biomedical datasets while using 96× fewer active parameters and maintaining human-readable symbolic rules without external rule bases.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10

Auditing and Fixing Economic Validity in Tabular Foundation Models for Discrete Choice

Researchers propose a two-stage adapter that constrains tabular foundation model predictions within economic theory frameworks, ensuring price-demand relationships remain logically consistent while recovering accuracy gains over standard choice models. The approach achieves up to 13 percentage points of accuracy improvement on transportation datasets while guaranteeing economic validity—a problem raw foundation models fail to solve.

AI · Neutral · arXiv – CS AI · 6d ago · 5/10

Conceptual Schema Inference for Tabular Datasets using Large Language Models

Researchers propose LLM-based approaches (GeSI and EmSI) to automatically infer conceptual schemas from heterogeneous tabular datasets by analyzing column headers and cell values. The methods address the challenge of organizing large, inconsistent data collections from diverse sources by deriving entity types, attributes, and relationships without manual intervention.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection

Researchers introduce K-DSM, a kurtosis-based noise scaling method for denoising score matching that improves tabular anomaly detection without additional model complexity. The approach achieves state-of-the-art performance by adaptively setting noise levels per feature based on marginal distribution shape, reducing hyperparameter tuning burden in scenarios where anomalies are unknown.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

CoAX: Cognitive-Oriented Attribution eXplanation User Model of Human Understanding of AI Explanations

Researchers developed CoAX, a cognitive modeling framework that analyzes how users understand and interpret AI explanations (XAI) when making decisions about tabular data. By studying human reasoning strategies across different explanation methods, the team found that cognitive models better predict human decision-making than traditional machine learning proxies, offering insights to improve the design of more usable AI explanations.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering

Researchers introduce TopBench, a benchmark dataset of 779 samples designed to evaluate how well Large Language Models handle implicit prediction tasks over tabular data—queries requiring inference from historical patterns rather than simple data retrieval. Testing reveals current LLMs struggle with intent recognition and default to lookup-based approaches, indicating that accurate intent disambiguation is critical before predictive reasoning can succeed.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Beyond Statistical Co-occurrence: Unlocking Intrinsic Semantics for Tabular Data Clustering

Researchers introduce TagCC, a novel deep clustering framework that combines Large Language Models with contrastive learning to enhance tabular data analysis by incorporating semantic knowledge from feature names and values. The approach bridges the gap between statistical co-occurrence patterns and intrinsic semantic understanding, demonstrating significant performance improvements over existing methods in finance and healthcare applications.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

A Closer Look into LLMs for Table Understanding

Researchers conducted an empirical study on 16 Large Language Models to understand how they process tabular data, revealing a three-phase attention pattern and finding that tabular tasks require deeper neural network layers than math reasoning. The study analyzed attention dynamics, layer depth requirements, expert activation in MoE models, and the impact of different input designs on table understanding performance.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics

Researchers introduced GateLens, an LLM-based system that uses Relational Algebra as an intermediate layer to analyze complex tabular data more reliably than traditional approaches. The system demonstrated over 80% reduction in analysis time in automotive software analytics while maintaining high accuracy, outperforming existing Chain-of-Thought methods.
