9 articles tagged with #temporal-reasoning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AIBullisharXiv โ CS AI ยท 2d ago7/10
๐ง Researchers introduce dual-trace memory encoding for LLM agents, pairing factual records with narrative scene reconstructions to improve cross-session recall by 20+ percentage points. The method significantly enhances temporal reasoning and multi-session knowledge aggregation without increasing computational costs, advancing the capability of persistent AI agent systems.
AIBullisharXiv โ CS AI ยท 3d ago7/10
๐ง Researchers introduce Audio Flamingo Next (AF-Next), an advanced open-source audio-language model that processes speech, sound, and music with support for inputs up to 30 minutes. The model incorporates a new temporal reasoning approach and demonstrates competitive or superior performance compared to larger proprietary alternatives across 20 benchmarks.
AIBullisharXiv โ CS AI ยท Mar 57/10
๐ง Researchers developed a new training method combining Chain-of-Thought supervision with reinforcement learning to teach large language models when to abstain from answering temporal questions they're uncertain about. Their approach enabled a smaller Qwen2.5-1.5B model to outperform GPT-4o on temporal question answering tasks while improving reliability by 20% on unanswerable questions.
๐ง GPT-4
AIBullisharXiv โ CS AI ยท Mar 46/102
๐ง Researchers introduce CoWVLA (Chain-of-World VLA), a new Vision-Language-Action model paradigm that combines world-model temporal reasoning with latent motion representation for embodied AI. The approach outperforms existing methods in robotic simulation benchmarks while maintaining computational efficiency through a unified autoregressive decoder that models both keyframes and action sequences.
AINeutralarXiv โ CS AI ยท 3d ago6/10
๐ง Researchers introduce TimeSeriesExamAgent, a scalable framework for automatically generating time series reasoning benchmarks using LLM agents and templates. The study reveals that while large language models show promise in time series tasks, they significantly underperform in abstract reasoning and domain-specific applications across healthcare, finance, and weather domains.
AINeutralarXiv โ CS AI ยท Apr 76/10
๐ง Researchers have developed LiveFact, a new dynamic benchmark for evaluating Large Language Models' ability to detect fake news and misinformation in real-time conditions. The benchmark addresses limitations of static testing by using temporal evidence sets and finds that open-source models like Qwen3-235B-A22B now match proprietary systems in performance.
AINeutralarXiv โ CS AI ยท Mar 176/10
๐ง Research reveals that Large Language Models struggle with dynamic Theory of Mind tasks, particularly tracking how others' beliefs change over time. While LLMs can infer current beliefs effectively, they fail to maintain and retrieve prior belief states after updates occur, showing patterns consistent with human cognitive biases.
AIBearisharXiv โ CS AI ยท Mar 176/10
๐ง Researchers introduce HEARTS, a comprehensive benchmark for evaluating large language models' ability to reason over health time series data across 16 datasets and 12 health domains. The study reveals that current LLMs significantly underperform compared to specialized models and struggle with multi-step temporal reasoning in healthcare applications.
AIBullisharXiv โ CS AI ยท Mar 37/108
๐ง Researchers have developed Egocentric Co-Pilot, a web-native AI framework that runs on smart glasses and uses Large Language Models to provide assistive AI without requiring screens or free hands. The system combines perception, reasoning, and web tools to support accessibility for people with vision impairments or cognitive overload, showing superior performance compared to commercial baselines.