y0news
#gpt2 articles
2 articles
AI · Bullish · arXiv – CS AI · 6h ago · 10

Beyond Naïve Prompting: Strategies for Improved Context-aided Forecasting with LLMs

Researchers introduce a framework of four strategies for improving large language models' performance on context-aided forecasting, covering diagnostic tools, accuracy, and efficiency. The study identifies an "Execution Gap": models understand the provided context but fail to apply that reasoning in their forecasts. The authors also report 25-50% performance improvements and describe cost-effective adaptive routing approaches.

AI · Bearish · arXiv – CS AI · 6h ago · 6

FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models

Researchers introduce FRIEDA, a new benchmark for testing cartographic reasoning in large vision-language models, and find significant limitations in current models. On complex map-interpretation tasks that require multi-step spatial reasoning, the best-performing models achieve only 37-38% accuracy, compared with 84.87% for humans.