#tabular-data News & Analysis

44 articles tagged with #tabular-data. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

ODTQA-FoRe: An Open-Domain Tabular Question Answering Dataset for Future Data Forecasting and Reasoning

Researchers introduce ODTQA-FoRe, a new dataset and TimeFore framework enabling large language models to perform future-oriented numerical predictions on tabular data using time-series forecasting. The innovation addresses a critical gap where existing LLM systems excel at historical analysis but struggle with predictive reasoning, demonstrated through real estate data scenarios.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

Explainable AI Through a Democratic Lens: DhondtXAI for D'Hondt-Projected Feature Attribution

Researchers introduce DhondtXAI, a novel explainable AI framework for tabular data that uses proportional representation principles (the D'Hondt rule) to attribute feature importance instead of relying on SHAP values. The method demonstrates high correlation with SHAP while offering complementary capabilities for handling feature interactions and alliances, validated across synthetic tests and healthcare datasets.
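
For readers unfamiliar with the D'Hondt rule mentioned above, the following is a minimal sketch of the standard highest-averages seat allocation. Treating feature-importance scores as "votes" is an assumption made here for illustration; the function name `dhondt_allocate` and the example features are hypothetical, and this is not the DhondtXAI implementation.

```python
def dhondt_allocate(votes, seats):
    """Allocate `seats` among the keys of `votes` with the D'Hondt rule.

    `votes` maps a name to a non-negative score. In this illustration the
    scores stand in for feature importances (an assumption, not the paper's
    exact pipeline).
    """
    if seats < 0:
        raise ValueError("seats must be non-negative")
    allocation = {name: 0 for name in votes}
    for _ in range(seats):
        # Each round, the next seat goes to the name with the largest
        # quotient: score / (seats already won + 1).
        winner = max(votes, key=lambda name: votes[name] / (allocation[name] + 1))
        allocation[winner] += 1
    return allocation


# Hypothetical importance scores for three features.
importances = {"age": 50.0, "income": 30.0, "tenure": 20.0}
shares = dhondt_allocate(importances, seats=10)
print(shares)  # {'age': 5, 'income': 3, 'tenure': 2}
```

With ten seats, the allocation here matches the raw proportions exactly; with fewer seats the rule tends to favor larger scores, which is a known property of D'Hondt.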

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Learning-To-Measure: In-Context Active Feature Acquisition

Researchers introduce Learning-to-Measure (L2M), a meta-learning framework that enables AI systems to learn optimal feature acquisition strategies across multiple tasks without task-specific retraining. The approach combines uncertainty quantification with a greedy acquisition agent, demonstrating superior performance on tabular datasets with missing features and limited labels.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

From Noise to Order: Learning to Rank via Denoising Diffusion

Researchers propose DiffusionRank, a generative deep learning approach to learning-to-rank in information retrieval that uses denoising diffusion models instead of traditional discriminative methods. By modeling the full joint distribution of features and relevance labels, the method demonstrates improvements over classical ranking approaches on standard benchmarks.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Avoiding Structural Failure Modes in Tabular Fair SSL: Online Primal-Dual Allocation under Confidence Gating

Researchers identify critical failure modes in semi-supervised learning (SSL) applied to tabular data with fairness constraints, where fairness regularizers can paradoxically erode model performance. They propose Online Primal-Dual Allocation (OPDA), an adaptive controller that dynamically balances fairness and stability penalties without manual tuning, demonstrating improved robustness across benchmark datasets like Adult, COMPAS, and ACSIncome.

🏢 Meta

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Masked Diffusion Modeling for Anomaly Detection

Researchers propose MaskDiff-AD, a novel anomaly detection method using masked diffusion models that operates on categorical and discrete data without requiring reverse-time sampling. The approach demonstrates competitive or superior performance compared to existing anomaly detection baselines across tabular and text datasets.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

BIRDNet: Mining and Encoding Boolean Implication Knowledge Graphs as Interpretable Deep Neural Networks

Researchers introduce BIRDNet, a neurosymbolic deep learning architecture that mines Boolean implication relationships from tabular data and encodes them as sparse, interpretable neural networks. The model achieves near-baseline performance on biomedical datasets while using 96× fewer active parameters and maintaining human-readable symbolic rules without external rule bases.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Auditing and Fixing Economic Validity in Tabular Foundation Models for Discrete Choice

Researchers propose a two-stage adapter that constrains tabular foundation model predictions to be consistent with economic theory, so that price-demand relationships stay logically coherent while the accuracy gains over standard choice models are retained. The approach improves accuracy by up to 13 percentage points on transportation datasets while guaranteeing economic validity, which raw foundation models do not provide.

AI · Neutral · arXiv – CS AI · May 27 · 5/10

Conceptual Schema Inference for Tabular Datasets using Large Language Models

Researchers propose LLM-based approaches (GeSI and EmSI) to automatically infer conceptual schemas from heterogeneous tabular datasets by analyzing column headers and cell values. The methods address the challenge of organizing large, inconsistent data collections from diverse sources by deriving entity types, attributes, and relationships without manual intervention.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection

Researchers introduce K-DSM, a kurtosis-based noise scaling method for denoising score matching that improves tabular anomaly detection without additional model complexity. The approach achieves state-of-the-art performance by adaptively setting noise levels per feature based on marginal distribution shape, reducing hyperparameter tuning burden in scenarios where anomalies are unknown.
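
The summary above does not state how K-DSM maps kurtosis to a noise level, so the sketch below covers only the underlying statistic: the per-feature excess kurtosis of each column's marginal distribution. The function names and the toy table are hypothetical, and the population (biased) estimator is an assumption chosen for simplicity.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant column")
    return m4 / (m2 ** 2) - 3.0


def kurtosis_per_feature(table):
    """Compute excess kurtosis for each column of a {name: values} table."""
    return {name: excess_kurtosis(column) for name, column in table.items()}


# Toy table: one light-tailed column and one with a single extreme value.
table = {
    "flat": [-1.0, 1.0, -1.0, 1.0],
    "spiky": [0.0] * 9 + [10.0],
}
per_feature = kurtosis_per_feature(table)
print(per_feature)
```

Here the heavy-tailed `spiky` column scores well above zero and the two-valued `flat` column scores below zero, which is the kind of shape difference a per-feature noise schedule could respond to.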

AI · Neutral · arXiv – CS AI · May 1 · 6/10

CoAX: Cognitive-Oriented Attribution eXplanation User Model of Human Understanding of AI Explanations

Researchers developed CoAX, a cognitive modeling framework that analyzes how users understand and interpret AI explanations (XAI) when making decisions about tabular data. By studying human reasoning strategies across different explanation methods, the team found that cognitive models predict human decision-making better than traditional machine learning proxies, offering insights for designing more usable AI explanations.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering

Researchers introduce TopBench, a benchmark dataset of 779 samples designed to evaluate how well Large Language Models handle implicit prediction tasks over tabular data—queries requiring inference from historical patterns rather than simple data retrieval. Testing reveals current LLMs struggle with intent recognition and default to lookup-based approaches, indicating that accurate intent disambiguation is critical before predictive reasoning can succeed.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Beyond Statistical Co-occurrence: Unlocking Intrinsic Semantics for Tabular Data Clustering

Researchers introduce TagCC, a novel deep clustering framework that combines Large Language Models with contrastive learning to enhance tabular data analysis by incorporating semantic knowledge from feature names and values. The approach bridges the gap between statistical co-occurrence patterns and intrinsic semantic understanding, demonstrating significant performance improvements over existing methods in finance and healthcare applications.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

A Closer Look into LLMs for Table Understanding

Researchers conducted an empirical study on 16 Large Language Models to understand how they process tabular data, revealing a three-phase attention pattern and finding that tabular tasks require deeper neural network layers than math reasoning. The study analyzed attention dynamics, layer depth requirements, expert activation in MoE models, and the impact of different input designs on table understanding performance.

AI · Neutral · arXiv – CS AI · Mar 4 · 5/10 · 3

Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis

Researchers propose a new framework for handling ambiguity in natural language queries for tabular data analysis, reframing ambiguity as a cooperative feature rather than a deficiency. The study analyzes 15 datasets and finds that current evaluation methods inadequately assess both system accuracy and interpretation capabilities.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 5

GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics

Researchers introduced GateLens, an LLM-based system that uses Relational Algebra as an intermediate layer to analyze complex tabular data more reliably than traditional approaches. The system demonstrated over 80% reduction in analysis time in automotive software analytics while maintaining high accuracy, outperforming existing Chain-of-Thought methods.

AI · Neutral · arXiv – CS AI · Mar 9 · 5/10

TML-Bench: Benchmark for Data Science Agents on Tabular ML Tasks

Researchers introduced TML-Bench, a new benchmark for evaluating AI coding agents on tabular machine learning tasks similar to Kaggle competitions. The study tested 10 open-source language models across four competitions with different time budgets, finding that MiniMax-M2.1 achieved the best overall performance.

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10 · 7

Revisiting Chebyshev Polynomial and Anisotropic RBF Models for Tabular Regression

Researchers developed smooth-basis regression models including anisotropic RBF networks and Chebyshev polynomial regressors that compete with tree ensembles in tabular regression tasks. Testing across 55 datasets showed these models achieve similar accuracy to tree ensembles while offering better generalization properties and gradual prediction surfaces suitable for optimization applications.

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10 · 3

TabDLM: Free-Form Tabular Data Generation via Joint Numerical-Language Diffusion

Researchers introduce TabDLM, a new AI framework that generates synthetic tabular data containing both numerical values and free-form text using joint numerical-language diffusion models. The approach addresses limitations of existing diffusion and LLM-based methods by combining masked diffusion for text with continuous diffusion for numbers, enabling better synthetic data generation for privacy and data augmentation applications.
