#data-science News & Analysis

35 articles tagged with #data-science. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Worse than Random: The Importance of a Baseline for Unsupervised Feature Selection

A research paper challenges the credibility of unsupervised feature selection methods by demonstrating that many state-of-the-art approaches perform no better than random selection. The study calls for establishing random feature selection as a mandatory baseline in future research to ensure genuine methodological improvements.
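The random baseline the summary calls for can be sketched in a few lines. This is a minimal illustration, not the paper's protocol: the function names, the toy scoring function, and the one-standard-deviation margin are all assumptions made for the example.

```python
import random
import statistics


def random_baseline_scores(n_features, k, score_fn, n_trials=100, seed=0):
    """Score n_trials randomly drawn k-feature subsets with the caller's score_fn."""
    rng = random.Random(seed)
    return [
        score_fn(sorted(rng.sample(range(n_features), k)))
        for _ in range(n_trials)
    ]


def beats_random(method_score, baseline_scores, margin_in_std=1.0):
    """True if method_score exceeds the random-baseline mean by the given margin.

    The margin (in standard deviations) is an illustrative threshold,
    not a value taken from the paper.
    """
    mean = statistics.mean(baseline_scores)
    std = statistics.pstdev(baseline_scores)
    return method_score > mean + margin_in_std * std


# Hypothetical toy setup: features 0, 1 and 2 carry all the signal, and a
# subset is scored by the fraction of informative features it contains.
INFORMATIVE = {0, 1, 2}


def toy_score(subset):
    return len(INFORMATIVE.intersection(subset)) / len(INFORMATIVE)


baseline = random_baseline_scores(n_features=20, k=5, score_fn=toy_score, n_trials=200)
```

In practice `score_fn` would be the downstream metric used to evaluate a selected subset (for example clustering quality), and a selection method would be reported alongside the distribution in `baseline` rather than on its own.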

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

Human-Guided Agentic AI for Multimodal Clinical Prediction: Lessons from the AgentDS Healthcare Benchmark

Researchers demonstrate that human-guided agentic AI systems outperform fully automated approaches on clinical prediction tasks by combining domain expertise with autonomous workflows. Human-directed decisions at critical junctures, particularly in multimodal feature engineering from clinical notes, billing documents, and vital signs, yield cumulative gains of +0.065 F1 over purely automated baselines.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

Ambig-DS: A Benchmark for Task-Framing Ambiguity in Data-Science Agents

Researchers introduce Ambig-DS, a benchmark suite that evaluates how AI data-science agents handle ambiguous task specifications. The benchmark reveals that current agents silently commit to incorrect interpretations rather than flagging underspecified requirements, a critical failure mode masked by clean-looking outputs that fail to achieve intended objectives.

AI · Bearish · arXiv – CS AI · Apr 14 · 7/10

Sanity Checks for Agentic Data Science

Researchers propose lightweight sanity checks for agentic data science (ADS) systems to detect falsely optimistic conclusions that users struggle to identify. Using the Predictability-Computability-Stability framework, the checks expose whether AI agents like OpenAI Codex reliably distinguish signal from noise. Testing on 11 real datasets reveals that over half produced unsupported affirmative conclusions despite individual runs suggesting otherwise.

🏢 OpenAI

AI · Neutral · arXiv – CS AI · Mar 26 · 7/10

Entire Space Counterfactual Learning for Reliable Content Recommendations

Researchers developed ESCM² (Entire Space Counterfactual Multitask Model), a new framework that improves post-click conversion rate estimation in recommender systems by addressing intrinsic estimation bias and false independence assumptions. The model-agnostic approach incorporates counterfactual learning to enhance recommendation accuracy and has been validated on large-scale industrial datasets.

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

From Garbage to Gold: A Data-Architectural Theory of Predictive Robustness

Researchers propose a new theoretical framework explaining why modern machine learning models achieve robust performance using high-dimensional, error-prone data, challenging the traditional 'Garbage In, Garbage Out' principle. The study introduces concepts like 'Informative Collinearity' and 'Proactive Data-Centric AI' to show how data architecture and model capacity work together to overcome noise and structural uncertainty.

AI · Neutral · arXiv – CS AI · Mar 5 · 6/10

WebDS: An End-to-End Benchmark for Web-based Data Science

Researchers introduce WebDS, a new benchmark for evaluating AI agents on real-world web-based data science tasks across 870 scenarios and 29 websites. Current state-of-the-art LLM agents achieve only 15% success rates compared to 90% human accuracy, revealing significant gaps in AI capabilities for complex data workflows.

AI · Bullish · Hugging Face Blog · Aug 20 · 7/10

NVIDIA Releases 6 Million Multi-Lingual Reasoning Dataset

NVIDIA has released a multilingual reasoning dataset of 6 million samples. The release could accelerate work on AI reasoning across multiple languages and benefit the broader AI research community.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

StatABench: Dataset and Framework for Evaluating Statistical Analysis Capabilities of LLMs

Researchers introduced StatABench, a comprehensive benchmark for evaluating LLMs' statistical analysis capabilities across 434 questions and tasks. Evaluations reveal significant performance gaps, with GPT-5.1 achieving only 68.6% accuracy on closed-ended questions and top agent frameworks scoring 61.86% on complex modeling tasks, exposing persistent weaknesses in tool-grounded reasoning and methodological decision-making.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · Jun 23 · 5/10

Cohort Organized Learning: Clustering Through Agreement

Researchers introduce Cohort Organized Learning (CoOL), a neural network-based clustering method that eliminates the need for explicit distance or similarity calculations. The approach uses expectation maximization to train networks capable of clustering diverse data types including vectors and images, offering a flexible alternative to traditional clustering algorithms.

General · Neutral · arXiv – CS AI · Jun 19 · 5/10

Global Ease of Living Index: a machine learning framework for longitudinal analysis of major economies

Researchers have developed a machine learning framework called the Global Ease of Living Index that combines socio-economic and infrastructure indicators to measure quality of life across major economies since 1970. Using dimensionality reduction techniques and algorithms to handle missing data, the index provides policymakers with a transparent tool to identify areas requiring intervention such as healthcare, employment, and public safety.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Computational Identifiability

Researchers propose 'computational identifiability,' a new framework that redefines how causal effects are identified in data science by shifting from theoretical, infinite-data assumptions to practical, finite computational search procedures. This approach enables identification under realistic conditions including small samples, ambiguous graphical criteria, and mixed observational-interventional data.

AI · Neutral · MIT News – AI · Jun 11 · 5/10

When it comes to predicting people’s preferences, it pays to consider “the power of three”

MIT researchers have advanced random utility models, a framework nearly a century old for predicting consumer preferences, by introducing what they call 'the power of three.' This upgrade enhances the accuracy and applicability of preference prediction across various domains, potentially impacting how businesses model consumer behavior and decision-making.

General · Neutral · MIT Technology Review · Jun 11 · 5/10

Inside soccer’s data renaissance

Soccer is experiencing a data analytics renaissance where advanced metrics and AI-driven insights are fundamentally changing how teams strategize and play. The article explores how data science is transforming tactical decision-making, exemplified by unconventional plays that confuse casual observers but make perfect sense to data-informed coaches.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

DN-Hypo-Pipeline: An AI-Driven Workflow for Hypothesis Generation via Large Language Models and Scientific Explanations

Researchers introduce DN-Hypo-Pipeline, an AI workflow leveraging large language models to automate scientific hypothesis generation from existing research literature. The system reconstructs novel explanations for observed phenomena and was validated in data science modeling, with two generated hypotheses producing algorithms that outperformed baseline models from the original papers.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

Reconstructing Synthetic SDO/AIA 193 A EUV Images from He I 10830 A Observations with Diffusion Model Translator

Researchers developed a diffusion model-based framework called CH-aware DMT that reconstructs synthetic SDO/AIA 193 Å EUV solar images from historical He I 10830 Å observations, enabling coronal analysis extending back decades before modern EUV imaging became available. The model achieves high fidelity on test data (CC=0.92 for full-disk morphology) and demonstrates physical plausibility when validated against SOHO, Yohkoh, and long-term solar activity proxies spanning 1974-2015.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

A Practical Upper Bound on Selection Bias Effects in Medical Prediction Models

Researchers propose a novel upper bound method to assess how selection bias in training data impacts machine learning model performance when deployed to broader populations, addressing a critical gap in healthcare AI safety. The approach works with realistic constraints where the selection mechanism and target population are only partially observable, validated through synthetic and real-world medical datasets.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Rel-MOSS: Towards Imbalanced Relational Deep Learning on Relational Databases

Researchers introduce Rel-MOSS, a novel graph neural network approach designed to address class imbalance problems in relational database entity classification. The method uses relation-centric gating and minority oversampling techniques to prevent underrepresentation of minority classes, achieving 2-4% performance improvements over existing relational deep learning methods.

AI · Bullish · OpenAI News · May 15 · 6/10

How data science teams use Codex

Codex enables data science teams to automate the generation of business intelligence documents including root-cause analyses, impact reports, KPI summaries, and dashboard specifications directly from raw work data. This capability streamlines the documentation and reporting workflow for data professionals, reducing manual effort in translating analytical findings into structured business outputs.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Learning Unified Representations of Normalcy for Time Series Anomaly Detection

Researchers present U²AD, a novel unsupervised anomaly detection framework for multivariate time series that uses score-based generative modeling to learn robust representations of normal data distributions. The method demonstrates superior performance in detecting anomalies earlier than existing approaches, addressing a critical challenge in time series analysis where anomalous patterns must be identified without prior examples.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Estimating Causal Effects of Text Interventions Leveraging LLMs

Researchers propose CausalDANN, a novel method using large language models to estimate causal effects of textual interventions in social systems. The approach addresses limitations of traditional causal inference methods when dealing with complex, high-dimensional textual data and can handle arbitrary text interventions even with observational data only.

AI · Bullish · arXiv – CS AI · Mar 11 · 6/10

An AI-powered Bayesian Generative Modeling Approach for Arbitrary Conditional Inference

Researchers have developed Bayesian Generative Modeling (BGM), a new AI framework that enables flexible conditional inference on any partition of observed variables without retraining. The approach uses stochastic iterative Bayesian updating with theoretical guarantees for convergence and statistical consistency, offering a universal engine for conditional prediction with uncertainty quantification.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

AISSISTANT: Human-AI Collaborative Review and Perspective Research Workflows in Data Science

Researchers introduce AIssistant, an open-source framework that combines human expertise with AI agents to streamline scientific review and perspective paper creation in data science. The system uses 15 specialized LLM-driven agents across two workflows and demonstrates 65.7% time savings while maintaining research quality through strategic human oversight.

AI · Neutral · arXiv – CS AI · Mar 2 · 6/10

DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science

Researchers introduce DARE-bench, a new benchmark with 6,300 Kaggle-derived tasks for evaluating Large Language Models' performance on data science and machine learning tasks. The benchmark reveals that even advanced models like GPT-4-mini struggle with ML modeling tasks, while fine-tuning on DARE-bench data can improve model accuracy by up to 8x.

AI · Bullish · OpenAI News · Jan 29 · 6/10

Inside OpenAI’s in-house data agent

OpenAI has developed an internal AI data agent that leverages GPT-5, Codex, and memory capabilities to analyze large datasets and provide reliable insights within minutes. This represents a significant advancement in AI-powered data analysis tools for enterprise applications.
