y0news
#ai-limitations
4 articles
AI Neutral · arXiv – CS AI · 4h ago · 7
🧠

Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning

Researchers propose a unified theory explaining why AI models trained on human feedback exhibit persistent error floors that cannot be eliminated through scaling alone. The study demonstrates that human supervision acts as an information bottleneck due to annotation noise, subjective preferences, and language limitations, requiring auxiliary non-human signals to overcome these structural limitations.

AI Bearish · arXiv – CS AI · 4h ago · 4
🧠

CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers

Researchers created CMT-Benchmark, a new dataset of 50 expert-level condensed matter theory problems to evaluate large language models' capabilities in advanced scientific research. The best-performing model (GPT-5) solved only 30% of the problems, and the average across 17 models was just 11.4%, highlighting significant gaps in current AI's physical reasoning abilities.

AI Bearish · arXiv – CS AI · 4h ago · 5
🧠

FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models

Researchers introduce FRIEDA, a new benchmark for testing cartographic reasoning in large vision-language models, revealing significant limitations. The best AI models achieve only 37–38% accuracy, versus 84.87% human performance, on complex map-interpretation tasks requiring multi-step spatial reasoning.

AI Neutral · arXiv – CS AI · 4h ago · 0
🧠

User Misconceptions of LLM-Based Conversational Programming Assistants

Researchers analyzed user misconceptions about LLM-based programming assistants like ChatGPT, finding that users often hold misplaced expectations about web access, code execution, and debugging capabilities. The study examined Python programming conversations from the WildChat dataset and identified a need for clearer communication of tool capabilities to prevent over-reliance and unproductive practices.