y0news

#llms News & Analysis

19 articles tagged with #llms. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

MindZero: Learning Online Mental Reasoning With Zero Annotations

MindZero introduces a self-supervised reinforcement learning framework that trains multimodal large language models to perform robust Theory of Mind reasoning without requiring annotated mental state data. The approach combines model-based planning with neural scaling, achieving superior accuracy and efficiency compared to traditional model-based methods and LLMs alone.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

LLM Jaggedness Unlocks Scientific Creativity

Researchers introduce SciAidanBench, a benchmark revealing that LLM capability improvements are uneven across tasks and domains—a phenomenon termed 'jaggedness.' By evaluating 19 models across 8 providers, they demonstrate that stronger models don't uniformly excel at scientific creativity, but this fragmentation can be leveraged through ensemble methods to achieve superior performance.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards

Researchers develop a theoretical framework explaining how reinforcement learning with verifiable rewards (RLVR) enables long-horizon reasoning in large language models through an implicit curriculum effect. The analysis reveals that mixed-difficulty training naturally progresses from easy to hard problems without explicit scheduling, with learning dynamics determined by the smoothness of the difficulty spectrum.

AI · Bearish · arXiv – CS AI · May 4 · 7/10

Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions

Researchers have identified critical vulnerabilities in how large language models make strategic decisions under incomplete information, revealing gaps between their internal beliefs and external reasoning. The study demonstrates that LLMs encode more accurate hidden beliefs than they express verbally, but these beliefs are brittle and degrade with multi-hop reasoning, raising significant concerns about deploying LLMs in high-stakes decision-making scenarios without safeguards.

🧠 Llama
AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

A Mathematical Explanation of Transformers

Researchers propose a novel mathematical framework interpreting Transformers as discretized integro-differential equations, revealing self-attention as a non-local integral operator and layer normalization as time-dependent projection. This theoretical foundation bridges deep learning architectures with continuous mathematical modeling, offering new insights for architecture design and interpretability.
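
As a rough illustration of what "self-attention as a non-local integral operator" can mean, the sketch below writes standard softmax attention over n tokens and one common continuum analogue. The symbols (u, W_Q, W_K, W_V, d, and the domain Ω) are assumptions chosen for this sketch and are not taken from the paper, whose exact formulation may differ.

```latex
% Discrete softmax self-attention over tokens u_1, ..., u_n (standard form)
\[
\mathrm{Attn}(u)_i
  = \sum_{j=1}^{n}
    \frac{\exp\!\big(\langle W_Q u_i,\, W_K u_j\rangle / \sqrt{d}\big)}
         {\sum_{k=1}^{n} \exp\!\big(\langle W_Q u_i,\, W_K u_k\rangle / \sqrt{d}\big)}
    \, W_V u_j
\]

% Continuum analogue: token index i becomes a position x in a domain \Omega,
% and the sums become integrals (assumed notation, for illustration only)
\[
\mathcal{A}[u](x)
  = \int_{\Omega}
    \frac{\exp\!\big(\langle W_Q u(x),\, W_K u(y)\rangle / \sqrt{d}\big)}
         {\int_{\Omega} \exp\!\big(\langle W_Q u(x),\, W_K u(z)\rangle / \sqrt{d}\big)\, dz}
    \, W_V u(y)\, dy
\]
```

The operator is non-local in the sense that the output at x depends on u(y) for every y in Ω, not only on values near x.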

AI · Neutral · arXiv – CS AI · Apr 13 · 7/10

Medical Reasoning with Large Language Models: A Survey and MR-Bench

Researchers present a comprehensive survey of medical reasoning in large language models, introducing MR-Bench, a clinical benchmark derived from real hospital data. The study reveals a significant performance gap between exam-style tasks and authentic clinical decision-making, highlighting that robust medical reasoning requires more than factual recall in safety-critical healthcare applications.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

AMix-2: Establishing Protein as a Native Modality in Large Language Models

Researchers introduce AMix-2, a protein-text foundation model that treats protein sequences as a native modality in large language models alongside natural language. The model uses a novel block-wise diffusion approach instead of traditional left-to-right generation, paired with a new ProteinArena benchmark for evaluating protein AI systems.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

DTBench: A Synthetic Benchmark for Document-to-Table Extraction

Researchers introduce DTBench, a synthetic benchmark for evaluating large language models on document-to-table extraction tasks. Using a reverse Table2Doc synthesis approach with multi-agent workflows, the benchmark covers 13 subcategories across 5 major capability areas, revealing significant performance gaps and persistent challenges in reasoning and conflict resolution across mainstream LLMs.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability

Researchers propose a hybrid reasoning system that combines Large Language Models with preference-based Maximum Satisfiability solvers to tackle complex optimization problems with multiple constraints. The approach achieves over 80% correctness rates on preference-based reasoning tasks, substantially outperforming traditional LLM baselines that rarely produce feasible solutions.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Snippet-Driven Supply Chain Discovery with LLMs: Scaling Visibility in China

Researchers propose a snippet-driven method using large language models to construct supply chain knowledge graphs for Chinese firms, achieving 7.2× greater coverage than traditional disclosure databases while reducing computational costs by 251× compared to full-text processing.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Emergent Analogical Reasoning in Transformers

Researchers demonstrate that Transformers develop analogical reasoning—the ability to transfer relational patterns across different domains—through two key mechanisms: geometric alignment of structures in embedding space and functor application. This mechanistic understanding bridges cognitive science and neural network architecture, with findings validated across both synthetic tasks and pretrained large language models.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling

Researchers provide the first rigorous theoretical analysis of temperature scaling, a widely-used technique for controlling uncertainty in machine learning models. The study reveals that while temperature scaling reliably increases entropy in classifiers, it does not necessarily increase diversity in large language models as commonly claimed, and establishes temperature scaling as the unique linear calibration method that preserves hard predictions.
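
A minimal sketch of the two classifier properties mentioned above, assuming the usual definition of temperature scaling (dividing logits by a positive temperature before the softmax). The logits are made-up values, and the helper names `softmax`, `entropy`, and `argmax` are illustrative rather than taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of the logits after dividing them by a positive temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def argmax(values):
    """Index of the largest value (the hard prediction)."""
    return max(range(len(values)), key=lambda i: values[i])

logits = [2.0, 1.0, 0.1]  # hypothetical classifier logits

# Print temperature, hard prediction, and entropy for a few temperatures.
for t in (0.5, 1.0, 2.0):
    probs = softmax(logits, t)
    print(t, argmax(probs), round(entropy(probs), 3))
```

Raising the temperature flattens the distribution, so the entropy grows while the predicted class stays the same. This only illustrates the classifier case; it says nothing about output diversity in language models, which the summary reports does not necessarily increase.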

AI · Neutral · arXiv – CS AI · May 27 · 6/10

DynFrame: Adaptive Reasoning-Driven Multimodal Framework with Dynamic Frame Augmentation for Complex Video Understanding

Researchers introduce DynFrame, an advanced video understanding framework that enables multimodal language models to dynamically select both temporal windows and frame sampling rates during inference. The approach achieves competitive performance with smaller 4B models against larger 7B-8B baselines and sets new state-of-the-art results with its 8B variant across six video understanding benchmarks.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Large Language Models for Sequential Decision-Making: Improving In-Context Learning via Supervised Fine-Tuning

Researchers demonstrate that large language models can be effectively fine-tuned to perform sequential decision-making tasks across MDPs, POMDPs, and ambiguous environments by learning from offline trajectory data. The approach achieves stronger performance than baseline methods, particularly in complex, partially-observed scenarios, with theoretical analysis showing the fine-tuned attention mechanisms implicitly estimate optimal Q-functions.

AI · Bearish · arXiv – CS AI · May 12 · 6/10

Beyond Continuity: Challenges of Context Switching in Multi-Turn Dialogue with LLMs

Researchers tested how well Large Language Models handle multi-turn conversations with topic shifts, finding that most LLMs struggle to detect when users pivot to new topics and incorrectly carry over irrelevant context from previous exchanges. The study reveals that only advanced reasoning models and strongly instructed LLMs perform accurately, while open-weight models frequently fail even with explicit cues, highlighting a critical robustness gap in production LLM deployments.

AI · Bullish · arXiv – CS AI · May 1 · 6/10

LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning

Researchers present LLM+ASP, a framework combining large language models with Answer Set Programming to enable nonmonotonic reasoning without task-specific engineering. The system uses automated self-correction loops where an ASP solver provides structured feedback, demonstrating significant performance improvements over monotonic logic approaches across diverse reasoning benchmarks.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE

A research paper argues that agentic AI does not threaten software engineering but expands its scope to include 'semi-executable' artifacts: combinations of natural language, tools, and workflows that require human or probabilistic interpretation. The Semi-Executable Stack model provides a six-layer diagnostic framework for understanding how software engineering practices evolve as AI agents take over routine tasks.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

MAT-Cell: A Multi-Agent Tree-Structured Reasoning Framework for Batch-Level Single-Cell Annotation

Researchers introduce MAT-Cell, a neuro-symbolic AI framework that combines large language models with biological constraints to improve single-cell annotation accuracy. The system uses multi-agent reasoning and verification processes to overcome limitations in both supervised learning and LLM-based approaches, demonstrating superior performance on cross-species benchmarks.

AI · Bullish · Google Research Blog · Jul 24 · 6/10 · 7

Synthetic and federated: Privacy-preserving domain adaptation with LLMs for mobile applications

The article describes privacy-preserving domain adaptation with Large Language Models for mobile applications, combining synthetic data generation with federated learning. The approach could enable better model performance while protecting user data on mobile devices.