AINeutralarXiv โ CS AI ยท 15h ago6/10
๐ง
KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
Researchers introduce KramaBench, a comprehensive benchmark testing AI systems' ability to execute end-to-end data processing pipelines on real-world data lakes. The study reveals significant limitations in current AI systems, with the best performing system achieving only 55% accuracy in full data-lake scenarios and leading LLMs implementing just 20% of individual data tasks correctly.