y0news
AI · Bearish · arXiv – CS AI · 10h ago · 6/10

HEARTS: Benchmarking LLM Reasoning on Health Time Series

Researchers introduce HEARTS, a comprehensive benchmark for evaluating how well large language models reason over health time series data, spanning 16 datasets and 12 health domains. The study finds that current LLMs significantly underperform specialized models and struggle with multi-step temporal reasoning in healthcare applications.