y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#large-language-models News & Analysis

Over the past month, coverage of #large-language-models has grown significantly, with 100 articles published in the last 30 days out of 273 total indexed pieces. The discussion landscape shows predominantly neutral sentiment at 59%, though bullish perspectives account for 37% of coverage. Notably, sentiment has softened compared to the prior quarter, declining 14.2 percentage points in bullish tone. ArXiv's computer science and AI section dominates source coverage, with Llama, Gemini, and GPT-4 emerging as the most frequently discussed models. Scan the articles below for recent developments and perspectives on the topic.

sentiment · last 30d (100 articles) · -14.2pp bullish vs prior 90d
Top sources:arXiv – CS AI · 254Crypto Briefing · 2TechCrunch – AI · 2IEEE Spectrum – AI · 1Decrypt · 1
Most-discussed entities:Llama · 7Gemini · 6GPT-4 · 6Claude · 4Anthropic · 4
416 articles
AIBullisharXiv – CS AI · Apr 206/10
🧠

DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition

Researchers introduce DiZiNER, a framework that improves zero-shot named entity recognition by simulating human annotation disagreement processes using multiple LLMs. The approach achieves state-of-the-art results on 14 of 18 benchmarks, closing the performance gap between zero-shot and supervised systems by over 11 percentage points.

🧠 GPT-5
AIBullisharXiv – CS AI · Apr 206/10
🧠

JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models

Researchers introduce JumpLoRA, a novel framework that uses sparse adapters with JumpReLU gating to enable continual learning in large language models while mitigating catastrophic forgetting. The method dynamically isolates parameters across tasks, outperforming existing state-of-the-art approaches like ELLA and significantly improving IncLoRA performance.

AIBullisharXiv – CS AI · Apr 206/10
🧠

MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications

Researchers introduce MM-Telco, a comprehensive multimodal benchmark and model suite designed to adapt large language models for telecommunications applications. The framework addresses domain-specific challenges in network optimization, troubleshooting, and customer support, with fine-tuned models demonstrating significant performance improvements over baseline LLMs.

AINeutralarXiv – CS AI · Apr 156/10
🧠

Beyond Factual Grounding: The Case for Opinion-Aware Retrieval-Augmented Generation

Researchers propose Opinion-Aware Retrieval-Augmented Generation (RAG) to address a critical bias in current LLM systems that treat subjective content as noise rather than valuable information. By formalizing the distinction between factual queries (epistemic uncertainty) and opinion queries (aleatoric uncertainty), the team develops an architecture that preserves diverse perspectives in knowledge retrieval, demonstrating 26.8% improved sentiment diversity and 42.7% better entity matching on real-world e-commerce data.

AINeutralarXiv – CS AI · Apr 156/10
🧠

A Scoping Review of Large Language Model-Based Pedagogical Agents

A comprehensive scoping review of 52 studies examines Large Language Model-based pedagogical agents across educational contexts from November 2022 to January 2025. The research identifies four key design dimensions (interaction approach, domain scope, role complexity, system integration) and emerging trends including multi-agent systems, virtual student simulation, and integration with immersive technologies, while flagging critical research gaps around privacy, accuracy, and student autonomy.

AINeutralarXiv – CS AI · Apr 156/10
🧠

Modeling Co-Pilots for Text-to-Model Translation

Researchers introduce Text2Model and Text2Zinc, frameworks that use large language models to translate natural language descriptions into formal optimization and satisfaction models. The work represents the first unified approach combining both problem types with a solver-agnostic architecture, though experiments reveal LLMs remain imperfect at this task despite showing competitive performance.

AINeutralarXiv – CS AI · Apr 156/10
🧠

GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization

Researchers propose GRACE, a dynamic coreset selection framework that reduces LLM training costs by intelligently selecting representative dataset subsets. The method combines representation diversity with gradient-based metrics and uses k-NN graph propagation to adapt to evolving training dynamics, demonstrating improved efficiency across multiple benchmarks.

AIBullisharXiv – CS AI · Apr 156/10
🧠

M$^\star$: Every Task Deserves Its Own Memory Harness

Researchers introduce M★, a method that automatically evolves task-specific memory systems for large language model agents by treating memory architecture as executable Python code. The approach outperforms fixed memory designs across conversation, planning, and reasoning benchmarks, suggesting that specialized memory mechanisms significantly outperform one-size-fits-all solutions.

AIBullisharXiv – CS AI · Apr 156/10
🧠

TimeSAF: Towards LLM-Guided Semantic Asynchronous Fusion for Time Series Forecasting

TimeSAF introduces a hierarchical asynchronous fusion framework that improves how large language models guide time series forecasting by decoupling semantic understanding from numerical dynamics. This addresses a fundamental architectural limitation in existing methods and demonstrates superior performance on standard benchmarks with strong generalization capabilities.

AINeutralarXiv – CS AI · Apr 156/10
🧠

CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference

Researchers propose CoDe-R, a two-stage framework using Large Language Models to improve binary decompilation by reducing logical errors and semantic misalignment. A 1.3B model using this approach achieves state-of-the-art performance on the HumanEval-Decompile benchmark, becoming the first lightweight model to exceed 50% re-executability rates.

AINeutralarXiv – CS AI · Apr 156/10
🧠

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Researchers investigate on-policy distillation (OPD) dynamics in large language model training, identifying two critical success conditions: compatible thinking patterns between student and teacher models, and genuine new capabilities from the teacher. The study reveals that successful OPD relies on token-level alignment and proposes recovery strategies for failing distillation scenarios.

AIBullisharXiv – CS AI · Apr 156/10
🧠

Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning

Researchers propose Joint Flashback Adaptation, a novel method to address catastrophic forgetting in large language models during incremental task learning. The approach uses limited prompts from previous tasks combined with latent task interpolation, demonstrating improved performance across 1000+ instruction-following and reasoning tasks without requiring full replay data.

AINeutralarXiv – CS AI · Apr 146/10
🧠

OOWM: Structuring Embodied Reasoning and Planning via Object-Oriented Programmatic World Modeling

Researchers introduce Object-Oriented World Modeling (OOWM), a framework that structures LLM reasoning for robotic planning by replacing linear text with explicit symbolic representations using UML diagrams and object hierarchies. The approach combines supervised fine-tuning with group relative policy optimization to achieve superior planning performance on embodied tasks, demonstrating that formal software engineering principles can enhance AI reasoning capabilities.

AINeutralarXiv – CS AI · Apr 146/10
🧠

LLMs for Text-Based Exploration and Navigation Under Partial Observability

Researchers evaluated whether large language models can function as text-only controllers for navigation and exploration in unknown environments under partial observability. Testing nine contemporary LLMs on ASCII gridworld tasks, they found reasoning-tuned models reliably complete navigation goals but remain inefficient compared to optimal paths, with few-shot prompting reducing invalid moves and improving path efficiency.

AINeutralarXiv – CS AI · Apr 146/10
🧠

TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale

Researchers introduce TimeSeriesExamAgent, a scalable framework for automatically generating time series reasoning benchmarks using LLM agents and templates. The study reveals that while large language models show promise in time series tasks, they significantly underperform in abstract reasoning and domain-specific applications across healthcare, finance, and weather domains.

AIBullisharXiv – CS AI · Apr 146/10
🧠

CARO: Chain-of-Analogy Reasoning Optimization for Robust Content Moderation

Researchers introduce CARO, a two-stage training framework that enhances large language models' ability to perform robust content moderation through analogical reasoning. By combining retrieval-augmented generation with direct preference optimization, CARO achieves 24.9% F1 score improvement over state-of-the-art models including DeepSeek R1 and LLaMA Guard on ambiguous moderation cases.

AINeutralarXiv – CS AI · Apr 146/10
🧠

Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis

Researchers introduce Agent Mentor, an open-source analytics pipeline that monitors and automatically improves AI agent behavior by analyzing execution logs and iteratively refining system prompts with corrective instructions. The framework addresses variability in large language model-based agent performance caused by ambiguous prompt formulations, demonstrating consistent accuracy improvements across multiple configurations.

AINeutralarXiv – CS AI · Apr 146/10
🧠

SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?

Researchers introduce SciPredict, a benchmark testing whether large language models can predict scientific experiment outcomes across physics, biology, and chemistry. The study reveals that while some frontier models marginally exceed human experts (~20% accuracy), they fundamentally fail to assess prediction reliability, suggesting superhuman performance in experimental science requires not just better predictions but better calibration awareness.

AINeutralarXiv – CS AI · Apr 146/10
🧠

Beyond Statistical Co-occurrence: Unlocking Intrinsic Semantics for Tabular Data Clustering

Researchers introduce TagCC, a novel deep clustering framework that combines Large Language Models with contrastive learning to enhance tabular data analysis by incorporating semantic knowledge from feature names and values. The approach bridges the gap between statistical co-occurrence patterns and intrinsic semantic understanding, demonstrating significant performance improvements over existing methods in finance and healthcare applications.

AINeutralarXiv – CS AI · Apr 146/10
🧠

CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning

Researchers introduce CFMS, a two-stage framework combining multimodal large language models with symbolic reasoning to improve tabular data comprehension for question answering and fact verification tasks. The approach achieves competitive results on WikiTQ and TabFact benchmarks while demonstrating particular robustness with large tables and smaller model architectures.

AINeutralarXiv – CS AI · Apr 146/10
🧠

Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Using a Large Language Model

A study evaluating the consistency of exercise prescriptions generated by Gemini 2.5 Flash found high semantic consistency but significant variability in quantitative components like exercise intensity. The research highlights that while LLMs produce semantically similar outputs, structural constraints and expert validation are necessary before clinical deployment.

🧠 Gemini
AINeutralarXiv – CS AI · Apr 146/10
🧠

SRBench: A Comprehensive Benchmark for Sequential Recommendation with Large Language Models

SRBench introduces a comprehensive evaluation framework for Sequential Recommendation models that combines Large Language Models with traditional neural network approaches. The benchmark addresses critical gaps in existing evaluation methodologies by incorporating fairness, stability, and efficiency metrics alongside accuracy, while establishing fair comparison mechanisms between LLM-based and neural network-based recommendation systems.

🏢 Meta
← PrevPage 11 of 17Next →