#large-language-models News & Analysis

Coverage of #large-language-models remains active, with 100 articles published in the last 30 days out of 273 total indexed pieces. Sentiment over that window is predominantly neutral (59%), with bullish perspectives accounting for 37% of coverage. Bullish sentiment has softened, however: its share is down 14.2 percentage points from the prior 90 days. arXiv's computer science and AI section dominates source coverage, and Llama, Gemini, and GPT-4 are the most frequently discussed models. Scan the articles below for recent developments and perspectives on the topic.

Sentiment · last 30 days (100 articles) · bullish share -14.2 pp vs. prior 90 days

Top sources: arXiv – CS AI (254) · Crypto Briefing (2) · TechCrunch – AI (2) · IEEE Spectrum – AI (1) · Decrypt (1)

Often co-tagged with: #machine-learning #ai-research #reinforcement-learning #research #artificial-intelligence #multimodal-ai

Most-discussed entities: Llama (7) · Gemini (6) · GPT-4 (6) · Claude (4) · Anthropic (4)

580 articles

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Budget-Efficient Automatic Algorithm Design via Code Graph

Researchers propose a budget-efficient framework for LLM-driven automatic algorithm design that operates on code graphs rather than on full algorithms. Instead of regenerating whole programs, the LLM produces compact corrections (code modifications that add, replace, or remove blocks), which compose into new algorithms. This reduces wasted computation and improves fitness on combinatorial optimization problems.
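To make the idea of composable corrections concrete, here is a minimal sketch. It assumes a simplified representation in which the code graph is flattened to an ordered mapping of named code blocks; the `Correction` type, `apply_corrections`, and the block names are hypothetical illustrations, not the paper's data structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Literal, Optional

# Hypothetical simplification: the paper's code graph is reduced here to an
# ordered mapping of block_id -> source snippet.

@dataclass(frozen=True)
class Correction:
    op: Literal["add", "replace", "remove"]
    block_id: str
    code: str = ""
    after: Optional[str] = None  # for "add": insert after this block (None = append)


def apply_corrections(blocks: Dict[str, str], corrections: List[Correction]) -> Dict[str, str]:
    """Compose small corrections into a new algorithm without regenerating it in full.
    The input mapping is left unchanged; a new ordered mapping is returned."""
    order = list(blocks)
    result = dict(blocks)
    for c in corrections:
        if c.op == "add":
            if c.block_id in result:
                raise ValueError(f"block {c.block_id!r} already exists")
            idx = len(order) if c.after is None else order.index(c.after) + 1
            order.insert(idx, c.block_id)
            result[c.block_id] = c.code
        elif c.op == "replace":
            if c.block_id not in result:
                raise KeyError(c.block_id)
            result[c.block_id] = c.code
        elif c.op == "remove":
            order.remove(c.block_id)
            del result[c.block_id]
    return {bid: result[bid] for bid in order}


def render(blocks: Dict[str, str]) -> str:
    """Concatenate blocks in order to obtain the full program text."""
    return "\n".join(blocks.values())
```

Each correction touches one block, so an LLM only has to emit a few lines per candidate rather than a whole program, which is the kind of saving the summary attributes to the approach.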

AI · Bullish · arXiv – CS AI · May 12 · 6/10

Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents

Researchers introduce Evolving-RL, a framework that optimizes how AI agents learn from past experiences to adapt to new tasks. The method jointly improves both experience extraction and utilization through reinforcement learning, achieving significant performance gains on out-of-distribution tasks without requiring test-time experience accumulation.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization

Researchers introduce HMACE, a multi-agent AI framework that uses specialized language model agents to design heuristics for combinatorial optimization problems. The system achieves competitive results on benchmark problems while using significantly fewer computational tokens than existing methods, demonstrating improved efficiency in automated algorithm design.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning

Researchers introduce OmicsLM, a multimodal large language model that interprets transcriptomic data by combining quantitative gene expression profiles with natural language processing. Trained on 5.5 million examples across 70 task types, the model outperforms specialized omics tools and general LLMs on language-guided biological reasoning tasks, advancing AI applications in genomic research.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment

Researchers propose Shadow Mask Distillation to make KV cache compression, which relieves a major memory bottleneck, workable during reinforcement learning post-training of large language models. The technique targets the off-policy bias that arises when compressed contexts are used for rollout generation but full contexts are used for parameter updates, a mismatch that amplifies instability in RL optimization.
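The off-policy bias described above can be quantified per token as the ratio between the full-context and compressed-context probabilities of each sampled token. The sketch below shows truncated importance sampling, a standard generic off-policy correction; it is not Shadow Mask Distillation, and the function name and the `cap` value are illustrative assumptions.

```python
import math
from typing import List


def truncated_is_weights(logp_full: List[float], logp_compressed: List[float],
                         cap: float = 2.0) -> List[float]:
    """Per-token importance weights pi_full(x) / pi_compressed(x) for tokens that
    were sampled under the compressed-context policy (the rollout policy).
    Weights are truncated at `cap` to limit variance (hypothetical default)."""
    if len(logp_full) != len(logp_compressed):
        raise ValueError("log-prob sequences must have equal length")
    weights = []
    for lf, lc in zip(logp_full, logp_compressed):
        w = math.exp(lf - lc)      # ratio of full-context to compressed-context probability
        weights.append(min(w, cap))
    return weights
```

If compression never changed the model's predictions, every weight would be exactly 1.0; weights far from 1.0 indicate tokens where the rollout and update contexts disagree.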

AI · Neutral · arXiv – CS AI · May 11 · 6/10

An Interpretable and Scalable Framework for Evaluating Large Language Models

Researchers introduce a scalable framework for evaluating large language models using Item Response Theory and majorization-minimization algorithms, achieving orders-of-magnitude speedups while improving interpretability. The method addresses computational limitations of traditional benchmarking approaches and provides insights into model abilities and benchmark item characteristics.
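For readers unfamiliar with Item Response Theory, the sketch below shows the standard two-parameter logistic (2PL) model, in which each benchmark item has a discrimination `a` and a difficulty `b`, and each model has an ability `theta`. The summary does not say which IRT variant the paper uses, and the ability fit here uses plain gradient ascent rather than the paper's majorization-minimization algorithm.

```python
import math
from typing import List, Sequence, Tuple

Item = Tuple[float, float]  # (discrimination a, difficulty b)


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT: probability that a model with ability theta solves an item
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(responses: List[List[int]], thetas: List[float], items: List[Item]) -> float:
    """responses[m][i] is 1 if model m solved item i, else 0."""
    ll = 0.0
    for m, row in enumerate(responses):
        for i, y in enumerate(row):
            a, b = items[i]
            p = p_correct(thetas[m], a, b)
            ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll


def fit_ability(row: Sequence[int], items: List[Item], steps: int = 500, lr: float = 0.1) -> float:
    """Maximum-likelihood ability for one model with item parameters held fixed.
    Gradient of the log-likelihood w.r.t. theta is sum_i a_i * (y_i - p_i)."""
    theta = 0.0
    for _ in range(steps):
        grad = sum(a * (y - p_correct(theta, a, b)) for y, (a, b) in zip(row, items))
        theta += lr * grad
    return theta
```

A model that solves the easy and medium items but fails the hard one lands at an ability between the medium and hard difficulties. The fitted item parameters are what make this kind of evaluation interpretable: they describe each benchmark item, not just each model.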

AI · Neutral · arXiv – CS AI · May 11 · 6/10

DCGL: Dual-Channel Graph Learning with Large Language Models for Knowledge-Aware Recommendation

Researchers propose DCGL, a dual-channel graph learning framework that combines Knowledge Graphs with Large Language Models to improve recommendation systems. The method addresses limitations in current approaches by separately modeling semantic and behavioral patterns, using contrastive learning and adaptive fusion to achieve better performance across sparse and active user scenarios.
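The summary does not specify how DCGL's adaptive fusion works. One common pattern is a learned gate that mixes the two channel embeddings per user, sketched below under that assumption. The function, weights, and scalar gate are hypothetical; a real system would learn the weights and often use a per-dimension gate.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(semantic: List[float], behavioral: List[float],
                 w_sem: List[float], w_beh: List[float],
                 bias: float = 0.0) -> Tuple[List[float], float]:
    """Hypothetical scalar-gated fusion of a semantic (knowledge/LLM) embedding and a
    behavioral (interaction-graph) embedding. Returns the fused vector and the gate g,
    where g weights the semantic channel and 1 - g weights the behavioral channel."""
    z = (sum(ws * s for ws, s in zip(w_sem, semantic))
         + sum(wb * b for wb, b in zip(w_beh, behavioral))
         + bias)
    g = sigmoid(z)
    fused = [g * s + (1.0 - g) * b for s, b in zip(semantic, behavioral)]
    return fused, g
```

Because the gate depends on the inputs, different users can receive different channel mixtures, which is one plausible way to serve both sparse and active users with a single model.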

AI · Neutral · arXiv – CS AI · May 11 · 6/10

KL for a KL: On-Policy Distillation with Control Variate Baseline

Researchers propose vOPD (On-Policy Distillation with control variate baseline), a stabilization technique for training large language models that reduces gradient variance without adding computational overhead. The method leverages reinforcement learning principles to make on-policy distillation more reliable and efficient, matching expensive full-vocabulary baselines while maintaining lightweight single-sample estimation.
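The control-variate idea can be illustrated on a single token position. In on-policy distillation, tokens are sampled from the student `s`, and the reverse KL to the teacher `t` is minimized with a single-sample score-function gradient. Subtracting a baseline that does not depend on the sampled token leaves the gradient unbiased and often, though not always, lowers its variance. This is a generic sketch, not vOPD itself: the baseline here is the exact KL, which a practical method would have to estimate.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(s: List[float], t: List[float]) -> float:
    """KL(student || teacher) over a small vocabulary."""
    return sum(si * (math.log(si) - math.log(ti)) for si, ti in zip(s, t))


def score_grad_sample(x: int, s: List[float], f_x: float, baseline: float) -> List[float]:
    """Single-sample estimate of dKL/dlogits for token x ~ s, where
    f_x = log s[x] - log t[x] and d log s[x] / d logit_j = 1[j == x] - s[j]."""
    return [((1.0 if j == x else 0.0) - s[j]) * (f_x - baseline) for j in range(len(s))]


def expected_and_variance(logits: List[float], t: List[float],
                          use_baseline: bool) -> Tuple[List[float], float]:
    """Enumerate every possible sampled token to get the exact mean of the
    single-sample estimator and its total variance E||g - E[g]||^2."""
    s = softmax(logits)
    f = [math.log(si) - math.log(ti) for si, ti in zip(s, t)]
    b = reverse_kl(s, t) if use_baseline else 0.0
    grads = [score_grad_sample(x, s, f[x], b) for x in range(len(s))]
    n = len(s)
    mean = [sum(s[x] * g[j] for x, g in enumerate(grads)) for j in range(n)]
    var = sum(s[x] * sum((g[j] - mean[j]) ** 2 for j in range(n)) for x, g in enumerate(grads))
    return mean, var
```

Both estimators have the same mean, the analytic gradient `s[j] * (f[j] - KL)`, because the baseline term multiplies the score, whose expectation under `s` is zero. Comparing the two returned variances shows how much the baseline helps for a given student and teacher.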

AI · Neutral · arXiv – CS AI · May 11 · 5/10

FiSMiness: A Finite State Machine Based Paradigm for Emotional Support Conversations

Researchers propose FiSMiness, a framework that integrates finite state machines with large language models so that, in emotional support conversations (ESC), models reason systematically through emotional states, support strategies, and responses. The approach outperforms multiple baselines, including chain-of-thought prompting and fine-tuning, on ESC datasets, demonstrating that structured reasoning paradigms can improve LLM performance on specialized dialogue tasks.
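A minimal sketch of the finite-state structure described above: each conversational turn passes through an emotion stage, then a strategy stage, then a response stage, and each stage's output becomes context for the next. The stage names, the linear transition table, and the `llm(stage, context)` callable are hypothetical; the paper's actual state set and transitions may differ.

```python
from enum import Enum, auto
from typing import Callable, Dict, List, Tuple


class Stage(Enum):
    EMOTION = auto()   # infer the help-seeker's emotional state
    STRATEGY = auto()  # choose a support strategy given that state
    RESPONSE = auto()  # generate the reply given state and strategy
    DONE = auto()


# Hypothetical transition table: one pass through the machine per turn.
TRANSITIONS: Dict[Stage, Stage] = {
    Stage.EMOTION: Stage.STRATEGY,
    Stage.STRATEGY: Stage.RESPONSE,
    Stage.RESPONSE: Stage.DONE,
}


def run_turn(user_msg: str, llm: Callable[[Stage, Dict[str, str]], str]) -> Tuple[str, List[Stage]]:
    """Drive one turn through the state machine. `llm` is a hypothetical callable
    that prompts a model for the given stage using the accumulated context."""
    context: Dict[str, str] = {"user": user_msg}
    stage = Stage.EMOTION
    trace: List[Stage] = []
    while stage is not Stage.DONE:
        context[stage.name.lower()] = llm(stage, context)  # store stage output
        trace.append(stage)
        stage = TRANSITIONS[stage]
    return context["response"], trace
```

The explicit states force the model to commit to an emotion and a strategy before it writes the reply, which is the kind of structured reasoning the summary credits for the gains over chain-of-thought prompting.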

AI · Neutral · arXiv – CS AI · May 11 · 6/10

VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents

VIDEE is a new system that enables entry-level data analysts to perform advanced text analytics using intelligent AI agents without specialized NLP knowledge. The platform combines human-in-the-loop decision-making with LLM-powered execution and evaluation, demonstrated through quantitative experiments and user studies showing effectiveness across experience levels.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Replicating Human Motivated Reasoning Studies with LLMs

Researchers found that base large language models do not replicate human motivated reasoning patterns when tested across four political studies. Unlike humans who adjust their reasoning based on desired conclusions, LLMs show different behavioral patterns, raising concerns about using these models for opinion simulation and argument assessment tasks.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Discovering Multiagent Learning Algorithms with Large Language Models

Researchers used AlphaEvolve, an LLM-powered evolutionary coding framework, to automatically discover new multi-agent reinforcement learning algorithms for imperfect-information games. The system produced two competitive algorithms, VAD-CFR and SHOR-PSRO, that match human-designed baselines. Further analysis, however, revealed that distilled, minimal versions (WOP-CFR and PM-PSRO) generalize better with simpler structures, indicating that LLM-discovered complexity often obscures fundamental algorithmic principles.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

SCRuB: Social Concept Reasoning under Rubric-Based Evaluation

Researchers introduce SCRuB, an evaluation framework for measuring how well large language models reason about social concepts: the abstract ideas underlying norms, culture, and institutions. Testing frontier models against PhD-level experts on 4,711 prompts, the study finds that the models outperform human experts across all dimensions and are preferred in 74.4% of comparative judgments, suggesting that single-turn reasoning evaluation is approaching saturation.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Shattering the Echo Chamber: Hidden Safeguards in Manuscripts Against the AI Takeover of Peer Review

Researchers propose IntraGuard, a defense framework that embeds hidden safeguards in PDF manuscripts to detect when AI chatbots, rather than human experts, are used to generate peer reviews. The system achieves an 84% success rate in disrupting AI-generated reviews while remaining transparent to legitimate human reviewers, addressing growing concerns about academic integrity as LLMs proliferate.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems

Researchers propose an active learning framework for optimizing communication structures in multi-agent systems powered by large language models, using ensemble-based task selection to identify the most informative training tasks while reducing token consumption and computational costs.
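Ensemble-based selection usually means query-by-committee: train several predictors, then label the candidates they disagree on most. The sketch below applies that idea to task selection, assuming each ensemble member predicts how well a candidate communication structure will perform on each task. That prediction target, the variance-based disagreement score, and the function name are all assumptions, not details from the paper.

```python
from statistics import pvariance
from typing import List, Sequence


def select_informative_tasks(task_ids: Sequence[str],
                             ensemble_preds: List[List[float]],
                             k: int) -> List[str]:
    """ensemble_preds[m][i] is member m's predicted score on task i (hypothetical).
    Return the k tasks with the highest disagreement (population variance) across
    members; only these tasks would be run with real LLM agents, saving tokens."""
    disagreement = {
        tid: pvariance([member[i] for member in ensemble_preds])
        for i, tid in enumerate(task_ids)
    }
    return sorted(task_ids, key=lambda tid: disagreement[tid], reverse=True)[:k]
```

Tasks on which every member agrees add little information, so skipping them is where the token savings would come from under this assumption.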

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

Researchers demonstrate that using the same optimizer during both pretraining and finetuning of large language models reduces catastrophic forgetting while maintaining or improving task performance. This "optimizer-model consistency" effect suggests optimizers create regularization patterns that preserve learned knowledge, with implications for efficient model adaptation strategies.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

On the optimization dynamics of RLVR: Gradient gap and step size thresholds

Researchers provide theoretical foundations for Reinforcement Learning with Verifiable Rewards (RLVR), a technique for post-training large language models using binary feedback. The analysis introduces the 'Gradient Gap' concept to explain convergence dynamics and derives critical step-size thresholds that determine whether training succeeds or fails, with implications for practical implementations like length normalization.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation

Researchers introduce CoNL, a framework that enables large language models to improve themselves through multi-agent self-play without requiring ground-truth labels or external judges. The system uses critiques that successfully improve solutions as training signals, allowing models to jointly optimize both generation and evaluation capabilities for non-verifiable tasks like creative writing and ethical reasoning.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games

Researchers introduce Strat-Reasoner, an RL-based framework that enhances large language models' strategic reasoning in multi-agent game environments by integrating recursive reasoning across all agents and employing centralized evaluation. The approach yields an average performance improvement of 22.1%, addressing a critical limitation: LLMs struggle with non-stationary multi-agent dynamics.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

Cognitive Twins: Investigating Personalized Thinking Model Building and Its Performance Enhancement with Human-in-the-Loop

Researchers developed a Personalized Thinking Model (PTM) that creates 'cognitive twins' of learners by organizing educational data into a five-layer hierarchical structure using AI and machine learning. The system achieved 74-75% fidelity scores and positive user perception ratings, suggesting potential applications in AI-supported education systems.

Entities: Gemini

AI · Neutral · arXiv – CS AI · May 7 · 6/10

Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap

Marking 25 years of search-based software engineering (SBSE), this research roadmap examines the field's evolving relationship with AI foundation models (FMs) such as large language models. The paper identifies three core integration pathways: using FMs to enhance SBSE techniques, applying SBSE methods to improve FM development, and exploring synergies between the two for future software engineering challenges.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training

Researchers prove that supervised fine-tuning (SFT) and reinforcement learning (RL) cannot be decoupled during large language model post-training, as each method degrades the performance gains of the other. The theoretical findings, verified experimentally, challenge the widespread industry practice of alternating these two training approaches and suggest that an optimal RL duration exists that balances the competing objectives.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

A Survey of Reasoning-Intensive Retrieval: Progress and Challenges

A comprehensive survey systematizes Reasoning-Intensive Retrieval (RIR), a rapidly emerging field that integrates Large Language Model reasoning capabilities into information retrieval systems. The study provides the first structured framework organizing RIR benchmarks, methods, and taxonomies to guide future research in this fragmented but high-growth area.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

ViLegalNLI: Natural Language Inference for Vietnamese Legal Texts

Researchers have introduced ViLegalNLI, the first large-scale Vietnamese Natural Language Inference dataset for legal texts, containing 42,012 premise-hypothesis pairs from statutory documents. The dataset enables AI systems to understand legal reasoning patterns and supports development of reliable AI tools for Vietnamese legal analysis and decision-making.

AI · Bullish · arXiv – CS AI · May 4 · 6/10

Space Network of Experts: Architecture and Expert Placement

Researchers present Space-XNet, a framework for efficiently deploying mixture-of-experts language models across satellite constellations using optimized expert placement strategies. The approach achieves a threefold latency reduction compared to conventional methods, addressing key challenges in executing energy-intensive AI workloads in space where computing and communication resources are severely constrained.
