AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers introduce PIQL, a framework that leverages privileged information to accelerate training and improve generalization in tabular foundation models. By incorporating dataset-level statistics and encodings of data-generating processes during training, the approach reduces computational requirements and convergence time while maintaining inference efficiency through reconstruction mechanisms.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠Researchers propose a novel uncertainty quantification method for Prior-Data Fitted Networks (PFNs), emerging foundation models for tabular data prediction, using martingale posteriors to provide calibrated confidence estimates. The technique is tuning-free, computationally efficient, and mathematically proven to converge, addressing a significant limitation in PFNs' practical applicability.
AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠Researchers introduce Schema-1, the first Data Language Model (DLM) designed to natively understand tabular data without preprocessing, similar to how language models understand text. The 140M-parameter model trained on 2.3M datasets outperforms gradient-boosted trees, AutoML systems, and existing tabular foundation models on prediction benchmarks and demonstrates superior performance on missing value imputation and dataset classification tasks.
AI · Bullish · arXiv – CS AI · Apr 15 · 7/10
🧠Researchers propose Schema-Adaptive Tabular Representation Learning, which uses LLMs to convert structured clinical data into semantic embeddings that transfer across different electronic health record schemas without retraining. When combined with imaging data for dementia diagnosis, the method achieves state-of-the-art results and outperforms board-certified neurologists on retrospective diagnostic tasks.
AI · Neutral · arXiv – CS AI · Apr 10 · 7/10
🧠OmniTabBench is introduced as the largest tabular data benchmark, with 3,030 datasets for evaluating gradient-boosted decision trees, neural networks, and foundation models. The analysis finds no universally superior approach, but decoupled metafeature analysis identifies the specific conditions that favor each model category.
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠Researchers propose PRPO (Permutation Relative Policy Optimization), a reinforcement learning framework that enhances large language models' numerical reasoning capabilities for tabular data prediction. The method achieves performance comparable to supervised baselines while excelling in zero-shot scenarios, with an 8B parameter model outperforming much larger models by up to 53.17%.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduce SPRINT, the first Few-Shot Class-Incremental Learning (FSCIL) framework designed specifically for tabular data domains like cybersecurity and healthcare. The system achieves 77.37% accuracy in 5-shot learning scenarios, outperforming existing methods by 4.45% through novel semi-supervised techniques that leverage unlabeled data and confidence-based pseudo-labeling.
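Confidence-based pseudo-labeling, one of the ingredients the summary attributes to SPRINT, is a standard semi-supervised technique: an unlabeled row receives the model's predicted class only when that prediction is confident enough. The sketch below shows the generic technique, not SPRINT's procedure; the `pseudo_label` helper and the 0.9 threshold are hypothetical choices.

```python
import numpy as np

def pseudo_label(probs: np.ndarray, threshold: float = 0.9):
    """Generic confidence-based pseudo-labeling (hypothetical helper, not SPRINT's code).

    probs: (n_unlabeled, n_classes) predicted class probabilities.
    Returns the indices of rows whose top-class probability is at least
    `threshold`, and the argmax class assigned to each of those rows.
    """
    confidence = probs.max(axis=1)                  # top-class probability per row
    keep = np.flatnonzero(confidence >= threshold)  # rows confident enough to label
    return keep, probs[keep].argmax(axis=1)

# Three unlabeled rows; only rows 0 and 2 clear the 0.9 threshold.
probs = np.array([
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.05, 0.90],
])
idx, labels = pseudo_label(probs, threshold=0.9)
print(idx.tolist(), labels.tolist())  # [0, 2] [0, 2]
```

The selected rows would then join the training set with their pseudo-labels; raising the threshold trades coverage for label accuracy.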
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers introduce MedFeat, a new AI framework that uses Large Language Models to engineer features for clinical tabular prediction. The system incorporates model awareness and domain knowledge to discover clinically meaningful features, which outperform those found by traditional approaches and remain robust across different hospital settings.
AI · Neutral · arXiv – CS AI · 18h ago · 5/10
🧠TabChange is a new machine learning approach for modifying individual attributes in tabular datasets while maintaining data naturalness and minimizing unintended changes. The method analyzes attribute relationships and uses adversarial techniques to remove latent information about target attributes, producing more valid counterfactuals than existing generative models.
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers introduce ODTQA-FoRe, a new dataset, and TimeFore, a framework that enables large language models to make future-oriented numerical predictions on tabular data using time-series forecasting. The work addresses a critical gap: existing LLM systems excel at historical analysis but struggle with predictive reasoning. The approach is demonstrated on real estate data scenarios.
AI · Neutral · arXiv – CS AI · 18h ago · 5/10
🧠Researchers introduce DhondtXAI, a novel explainable AI framework for tabular data that uses proportional representation principles (the D'Hondt rule) to attribute feature importance instead of relying on SHAP values. The method demonstrates high correlation with SHAP while offering complementary capabilities for handling feature interactions and alliances, validated across synthetic tests and healthcare datasets.
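The D'Hondt rule itself is a simple highest-averages procedure: each seat goes to whichever party has the largest quotient of votes divided by (seats already won + 1). The sketch below applies it to hypothetical feature-importance scores treated as vote counts; how DhondtXAI actually derives those votes, and how it handles feature alliances, is not described in the summary, so this shows only the allocation rule.

```python
def dhondt(votes: dict[str, float], seats: int) -> dict[str, int]:
    """Allocate `seats` by the D'Hondt highest-averages rule.

    Each seat goes to the party (here: feature) with the largest quotient
    votes / (seats_already_won + 1).
    """
    alloc = {name: 0 for name in votes}
    for _ in range(seats):
        winner = max(votes, key=lambda name: votes[name] / (alloc[name] + 1))
        alloc[winner] += 1
    return alloc

# Hypothetical nonnegative importance scores treated as "votes".
importance = {"age": 0.47, "income": 0.32, "tenure": 0.21}
print(dhondt(importance, seats=10))  # {'age': 5, 'income': 3, 'tenure': 2}
```

Here the allocation matches the proportional shares of 4.7, 3.2 and 2.1 seats after rounding; with fewer seats the rule tends to favor the highest-scoring feature.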
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers introduce Learning-to-Measure (L2M), a meta-learning framework that enables AI systems to learn optimal feature acquisition strategies across multiple tasks without task-specific retraining. The approach combines uncertainty quantification with a greedy acquisition agent, demonstrating superior performance on tabular datasets with missing features and limited labels.
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers propose DiffusionRank, a generative deep learning approach to learning-to-rank in information retrieval that uses denoising diffusion models instead of traditional discriminative methods. By modeling the full joint distribution of features and relevance labels, the method demonstrates improvements over classical ranking approaches on standard benchmarks.
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers identify critical failure modes in semi-supervised learning (SSL) applied to tabular data with fairness constraints, where fairness regularizers can paradoxically erode model performance. They propose Online Primal-Dual Allocation (OPDA), an adaptive controller that dynamically balances fairness and stability penalties without manual tuning, demonstrating improved robustness across benchmark datasets like Adult, COMPAS, and ACSIncome.
🏢 Meta
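A primal-dual controller of this kind is usually built on projected dual ascent: each penalty weight acts as a Lagrange multiplier that rises while its constraint is violated and falls (never below zero) once it is satisfied, so the weight does not need manual tuning. The sketch below shows that generic mechanism only; the `DualController` class, the budgets, the learning rates and the "instability" measure are hypothetical and not taken from OPDA.

```python
from dataclasses import dataclass

@dataclass
class DualController:
    """Generic projected dual ascent on one penalty multiplier (hypothetical, not OPDA's code).

    Targets the constraint g <= budget: the multiplier grows while the measured
    value exceeds the budget and shrinks, but never below zero, once it is met.
    """
    budget: float
    lr: float
    multiplier: float = 0.0

    def update(self, g: float) -> float:
        self.multiplier = max(0.0, self.multiplier + self.lr * (g - self.budget))
        return self.multiplier

def total_loss(task_loss, fairness_gap, instability, fair_ctrl, stab_ctrl):
    # Primal objective: task loss plus each penalty weighted by its current multiplier.
    return (task_loss
            + fair_ctrl.multiplier * fairness_gap
            + stab_ctrl.multiplier * instability)

fair = DualController(budget=0.05, lr=1.0)
for gap in [0.12, 0.10, 0.06, 0.03]:  # hypothetical per-step fairness gaps
    fair.update(gap)
stab = DualController(budget=0.02, lr=0.5)
stab.update(0.04)                      # hypothetical instability measurement

loss = total_loss(0.40, 0.03, 0.04, fair, stab)
print(round(fair.multiplier, 2), round(loss, 4))  # 0.11 0.4037
```

The fairness multiplier climbs while the gap exceeds its 0.05 budget and starts to decay once the gap drops to 0.03, which is the self-adjusting behavior that replaces a fixed, hand-tuned penalty weight.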
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers propose MaskDiff-AD, a novel anomaly detection method using masked diffusion models that operates on categorical and discrete data without requiring reverse-time sampling. The approach demonstrates competitive or superior performance compared to existing anomaly detection baselines across tabular and text datasets.
AI · Neutral · arXiv – CS AI · 5d ago · 6/10
🧠Researchers introduce BIRDNet, a neurosymbolic deep learning architecture that mines Boolean implication relationships from tabular data and encodes them as sparse, interpretable neural networks. The model achieves near-baseline performance on biomedical datasets while using 96× fewer active parameters and maintaining human-readable symbolic rules without external rule bases.
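A Boolean implication of the form "if A is on, then B is on" can be checked directly on binarized columns: it holds when the quadrant A = 1, B = 0 is (nearly) empty. The sketch below mines such rules between all ordered column pairs using a simple violation-rate threshold; the threshold test, the helper names and the toy table are hypothetical simplifications, not BIRDNet's mining procedure, and the sketch does not build the sparse network.

```python
from itertools import permutations
import numpy as np

def implies(a: np.ndarray, b: np.ndarray, max_violation: float = 0.05) -> bool:
    """True if 'a -> b' holds: among rows with a == 1, at most a
    `max_violation` fraction have b == 0 (the forbidden quadrant)."""
    a, b = a.astype(bool), b.astype(bool)
    support = a.sum()
    if support == 0:
        return False  # no rows with a == 1: skip vacuous rules
    return bool((a & ~b).sum() / support <= max_violation)

def mine_implications(X: np.ndarray, names: list[str],
                      max_violation: float = 0.05) -> list[str]:
    """Test every ordered pair of binary columns and return the rules that hold."""
    return [
        f"{names[i]} -> {names[j]}"
        for i, j in permutations(range(X.shape[1]), 2)
        if implies(X[:, i], X[:, j], max_violation)
    ]

# Hypothetical binarized table: rows are samples, columns are features a, b, c.
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
])
print(mine_implications(X, ["a", "b", "c"]))  # ['a -> b']
```

Each mined rule maps naturally onto a single sparse connection, which is one way a symbolic rule set can stay human-readable inside a network.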
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠Researchers propose a two-stage adapter that constrains tabular foundation model predictions within economic theory frameworks, ensuring price-demand relationships remain logically consistent while recovering accuracy gains over standard choice models. The approach achieves up to 13 percentage points of accuracy improvement on transportation datasets while guaranteeing economic validity—a problem raw foundation models fail to solve.
AI · Neutral · arXiv – CS AI · 6d ago · 5/10
🧠Researchers propose LLM-based approaches (GeSI and EmSI) to automatically infer conceptual schemas from heterogeneous tabular datasets by analyzing column headers and cell values. The methods address the challenge of organizing large, inconsistent data collections from diverse sources by deriving entity types, attributes, and relationships without manual intervention.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce K-DSM, a kurtosis-based noise scaling method for denoising score matching that improves tabular anomaly detection without additional model complexity. The approach achieves state-of-the-art performance by adaptively setting noise levels per feature based on marginal distribution shape, reducing hyperparameter tuning burden in scenarios where anomalies are unknown.
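In denoising score matching, each training point is perturbed with Gaussian noise and the network learns the score of that perturbation kernel. The sketch below gives each feature its own noise level derived from its sample kurtosis; the specific kurtosis-to-noise mapping in `per_feature_sigma` is a hypothetical placeholder, since the summary does not state K-DSM's rule, and no score network or anomaly score is included.

```python
import numpy as np

def excess_kurtosis(X: np.ndarray) -> np.ndarray:
    """Per-column excess kurtosis (about 0 for Gaussian columns, > 0 for heavy tails)."""
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    return (z ** 4).mean(axis=0) - 3.0

def per_feature_sigma(X: np.ndarray, base: float = 0.5) -> np.ndarray:
    # HYPOTHETICAL mapping, not K-DSM's rule: start from a fraction of each
    # feature's std and shrink it as the feature's tails get heavier.
    k = np.clip(excess_kurtosis(X), 0.0, None)
    return base * X.std(axis=0) / np.sqrt(1.0 + k)

def dsm_pair(x: np.ndarray, sigma: np.ndarray, rng: np.random.Generator):
    """One denoising-score-matching training pair with per-feature noise.

    The target is the score of the Gaussian kernel N(x, diag(sigma^2)) at the
    noisy point: -(x_noisy - x) / sigma^2 = -eps / sigma.
    """
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    return x_noisy, -eps / sigma

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=5000),            # light-tailed feature
                     rng.standard_t(df=3, size=5000)])  # heavy-tailed feature
sigma = per_feature_sigma(X)
x_noisy, target = dsm_pair(X[:8], sigma, rng)
```

A score network would be trained to regress `target` from `x_noisy`; because `sigma` is a per-feature vector, it broadcasts across columns so each feature is perturbed on its own scale.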
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠Researchers developed CoAX, a cognitive modeling framework that analyzes how users understand and interpret AI explanations (XAI) when making decisions about tabular data. By studying human reasoning strategies across different explanation methods, the team found that cognitive models better predict human decision-making than traditional machine learning proxies, offering insights to improve the design of more usable AI explanations.
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠Researchers introduce TopBench, a benchmark dataset of 779 samples designed to evaluate how well Large Language Models handle implicit prediction tasks over tabular data—queries requiring inference from historical patterns rather than simple data retrieval. Testing reveals current LLMs struggle with intent recognition and default to lookup-based approaches, indicating that accurate intent disambiguation is critical before predictive reasoning can succeed.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce TagCC, a novel deep clustering framework that combines Large Language Models with contrastive learning to enhance tabular data analysis by incorporating semantic knowledge from feature names and values. The approach bridges the gap between statistical co-occurrence patterns and intrinsic semantic understanding, demonstrating significant performance improvements over existing methods in finance and healthcare applications.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers conducted an empirical study on 16 Large Language Models to understand how they process tabular data, revealing a three-phase attention pattern and finding that tabular tasks require deeper neural network layers than math reasoning. The study analyzed attention dynamics, layer depth requirements, expert activation in MoE models, and the impact of different input designs on table understanding performance.
AI · Neutral · arXiv – CS AI · Mar 4 · 5/10
🧠Researchers propose a new framework for handling ambiguity in natural language queries for tabular data analysis, reframing ambiguity as a cooperative feature rather than a deficiency. The study analyzes 15 datasets and finds that current evaluation methods inadequately assess both system accuracy and interpretation capabilities.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers introduced GateLens, an LLM-based system that uses Relational Algebra as an intermediate layer to analyze complex tabular data more reliably than traditional approaches. The system demonstrated over 80% reduction in analysis time in automotive software analytics while maintaining high accuracy, outperforming existing Chain-of-Thought methods.
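Relational algebra works well as an intermediate layer because a question becomes a composition of a few well-defined operators that can be executed deterministically, instead of free-form reasoning over table text. The sketch below implements selection, projection and a natural join over lists of Python dicts; the tables, column names and hand-composed query are illustrative assumptions, since the summary does not describe how GateLens represents or executes its relational-algebra plans.

```python
from typing import Callable

def select(rows: list[dict], pred: Callable[[dict], bool]) -> list[dict]:
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [r for r in rows if pred(r)]

def project(rows: list[dict], cols: list[str]) -> list[dict]:
    """Projection (pi): keep only `cols`, dropping duplicate tuples (set semantics)."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[c] for c in cols)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(cols, key)))
    return out

def natural_join(left: list[dict], right: list[dict], on: str) -> list[dict]:
    """Join on one shared column, using a hash index built over `right`."""
    index: dict[object, list[dict]] = {}
    for rrow in right:
        index.setdefault(rrow[on], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[on], [])]

# Hypothetical tables; "which owners have failing tests?" becomes
# pi_owner(sigma_status='fail'(tests JOIN components)).
tests = [
    {"test": "t1", "component": "brake", "status": "fail"},
    {"test": "t2", "component": "brake", "status": "pass"},
    {"test": "t3", "component": "lidar", "status": "fail"},
    {"test": "t4", "component": "lidar", "status": "fail"},
]
components = [
    {"component": "brake", "owner": "team-a"},
    {"component": "lidar", "owner": "team-b"},
]
joined = natural_join(tests, components, on="component")
result = project(select(joined, lambda r: r["status"] == "fail"), ["owner"])
print(result)  # [{'owner': 'team-a'}, {'owner': 'team-b'}]
```

Because each operator has fixed semantics, an intermediate plan of this shape can be inspected and re-run, which is one plausible reason such a layer is more reliable than free-form chain-of-thought over raw tables.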