y0news

#data-lakes News & Analysis

3 articles tagged with #data-lakes. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

LakeQA: An Exploratory QA Benchmark over a Million-Scale Data Lake

Researchers introduced LakeQA, a benchmark for evaluating large language models on question answering over a massive data lake containing 9.5 TB of heterogeneous data. The benchmark exposes significant weaknesses in current LLMs: GPT-5.2 achieves only 18.37% accuracy, highlighting the gap between reading-comprehension performance and real-world search-and-reasoning requirements.

🧠 GPT-5
AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes

Researchers introduce KramaBench, a comprehensive benchmark testing AI systems' ability to execute end-to-end data processing pipelines on real-world data lakes. The study reveals significant limitations in current AI systems: the best-performing system achieves only 55% accuracy in full data-lake scenarios, and leading LLMs implement just 20% of individual data tasks correctly.