y0news

#large-language-models News & Analysis

236 articles tagged with #large-language-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

Aligning Large Language Models with Searcher Preferences

Researchers introduce SearchLLM, the first large language model designed for open-ended generative search, featuring a hierarchical reward system that balances safety constraints with user alignment. The model was deployed on RedNote's AI search platform, showing significant improvements in user engagement with a 1.03% increase in Valid Consumption Rate and 2.81% reduction in Re-search Rate.

AI · Neutral · arXiv – CS AI · Mar 12 · 6/10

Towards Robust Speech Deepfake Detection via Human-Inspired Reasoning

Researchers propose HIR-SDD, a new framework combining Large Audio Language Models with human-inspired reasoning to detect speech deepfakes. The method aims to improve generalization across different audio domains and provide interpretable explanations for deepfake detection decisions.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models

Researchers propose Dynamics-Predictive Sampling (DPS), a new method that improves reinforcement learning finetuning of large language models by predicting which training prompts will be most informative without expensive computational rollouts. The technique models each prompt's learning progress as a dynamical system and uses Bayesian inference to select better training data, reducing computational overhead while achieving superior reasoning performance.
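The paper's exact selection rule is not given in this summary; as a rough illustration of Bayesian prompt selection, the sketch below keeps a Beta posterior over each prompt's success rate and picks the prompts closest to 50% success, where the learning signal is largest. The Beta(1, 1) prior and the "closest to 0.5" criterion are assumptions for illustration, not the paper's method.

```python
def select_prompts(stats, k):
    """Pick the k prompts whose estimated success rate is closest to 0.5,
    i.e. the most informative ones under a Bernoulli success model.

    stats: dict prompt_id -> (successes, failures); a Beta(1, 1) prior
    gives the posterior mean (s + 1) / (s + f + 2).
    """
    def posterior_mean(sf):
        s, f = sf
        return (s + 1) / (s + f + 2)

    # Prompts the model solves ~half the time carry the most signal;
    # already-mastered or hopeless prompts teach little.
    ranked = sorted(stats, key=lambda p: abs(posterior_mean(stats[p]) - 0.5))
    return ranked[:k]
```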

AI · Bullish · arXiv – CS AI · Mar 11 · 6/10

Social-R1: Towards Human-like Social Reasoning in LLMs

Researchers introduce Social-R1, a reinforcement learning framework that enhances social reasoning in large language models by training on adversarial examples. The approach enables a 4B parameter model to outperform larger models across eight benchmarks by supervising the entire reasoning process rather than just outcomes.

AI · Bearish · arXiv – CS AI · Mar 11 · 6/10

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health

A new research study reveals that Large Language Models (LLMs) propagate gender stereotypes and biases when processing healthcare data, particularly through interactions between gender and social determinants of health. The research used French patient records to demonstrate how LLMs rely on embedded stereotypes to make gendered decisions in healthcare contexts.

AI · Neutral · arXiv – CS AI · Mar 11 · 6/10

SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases

Researchers introduce SCENEBench, a new benchmark for evaluating Large Audio Language Models (LALMs) beyond speech recognition, focusing on real-world audio understanding including background sounds, noise localization, and vocal characteristics. Testing of five state-of-the-art models revealed significant performance gaps, with some tasks performing below random chance while others achieved high accuracy.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

Talk Freely, Execute Strictly: Schema-Gated Agentic AI for Flexible and Reproducible Scientific Workflows

Researchers propose a schema-gated orchestration approach to resolve the trade-off between conversational flexibility and deterministic execution in AI-driven scientific workflows. Their analysis of 20 systems reveals no current solution achieves both high flexibility and determinism, but identifies a convergence zone for potential breakthrough architectures.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation

A comprehensive survey examines how large multimodal language models are transforming scientific research across five key areas: literature search, idea generation, content creation, multimodal artifact production, and peer review evaluation. The research highlights both the potential for AI-assisted scientific discovery and the ethical concerns regarding research integrity and misuse of generative models.

AI · Bullish · arXiv – CS AI · Mar 6 · 5/10

K-Gen: A Multimodal Language-Conditioned Approach for Interpretable Keypoint-Guided Trajectory Generation

Researchers propose K-Gen, a new multimodal AI framework that uses Large Language Models to generate realistic driving trajectories for autonomous vehicle simulation. The system combines visual map data with text descriptions to create interpretable keypoints that guide trajectory generation, outperforming existing baselines on major datasets.

AI · Bullish · arXiv – CS AI · Mar 5 · 5/10

Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph

Researchers at the Australian National University developed a semantic query processing system that combines Large Language Models with a scholarly Knowledge Graph to enable comprehensive information retrieval about computer science research. The system uses the Deep Document Model for fine-grained document representation and KG-enhanced Query Processing for optimized query handling, showing superior accuracy and efficiency compared to baseline methods.

AI · Neutral · arXiv – CS AI · Mar 5 · 5/10

REVISION: Reflective Intent Mining and Online Reasoning Auxiliary for E-commerce Visual Search System Optimization

Taobao has developed REVISION, a new AI framework that combines large language models with traditional e-commerce visual search systems to better understand implicit user intents and reduce no-click search rates. The system uses offline analysis of historical search data and online reasoning to adaptively optimize search results and platform strategies.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends

Researchers demonstrate that Group Relative Policy Optimization (GRPO), traditionally viewed as an on-policy reinforcement learning algorithm, can be reinterpreted as an off-policy algorithm through first-principles analysis. This theoretical breakthrough provides new insights for optimizing reinforcement learning applications in large language models and offers principled approaches for off-policy RL algorithm design.
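As a minimal sketch of the off-policy reading, the snippet below computes standard group-relative (GRPO-style) advantages and then reweights each sampled completion by its importance ratio `exp(logp_new - logp_old)`, the standard correction when samples come from an older policy. This is an illustrative reconstruction from the summary, not the paper's derivation.

```python
import math

def grpo_weighted_advantages(rewards, logp_new, logp_old):
    """Group-relative advantages with per-sample importance ratios.

    Under the off-policy view, completions sampled from the old policy
    are reweighted by exp(logp_new - logp_old) before the update.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard against a zero-variance group
    advantages = [(r - mean) / std for r in rewards]
    ratios = [math.exp(n - o) for n, o in zip(logp_new, logp_old)]
    # Off-policy surrogate: importance ratio times group-relative advantage.
    return [w * a for w, a in zip(ratios, advantages)]
```

When the current and sampling policies coincide (equal log-probs), every ratio is 1 and this reduces to plain on-policy GRPO, which is exactly the reinterpretation the paper describes.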

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

Online Causal Kalman Filtering for Stable and Effective Policy Optimization

Researchers propose Online Causal Kalman Filtering for Policy Optimization (KPO) to address high-variance instability in reinforcement learning for large language models. The method uses Kalman filtering to smooth token-level importance sampling ratios, preventing training collapse and achieving superior results on math reasoning tasks.
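The core mechanism, smoothing a noisy scalar sequence with a Kalman filter, can be sketched with a textbook random-walk state model; the process/observation variances below are placeholder values, and the paper's causal, token-level formulation is certainly more involved than this.

```python
def kalman_smooth(ratios, process_var=1e-4, obs_var=1e-2):
    """Smooth noisy importance-sampling ratios with a scalar Kalman
    filter, damping the spikes that destabilize policy updates."""
    est, p = ratios[0], 1.0            # initial state estimate and variance
    smoothed = [est]
    for z in ratios[1:]:
        p += process_var               # predict: uncertainty grows
        k = p / (p + obs_var)          # Kalman gain
        est += k * (z - est)           # update toward the observation
        p *= (1 - k)                   # posterior variance shrinks
        smoothed.append(est)
    return smoothed
```

A lone outlier ratio is pulled back toward the running estimate instead of passing through at full magnitude, which is the stabilizing effect the paper targets.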

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation

Researchers introduced InterSyn, a 1.8M sample dataset designed to improve Large Multimodal Models' ability to generate interleaved image-text content. The dataset includes a new evaluation framework called SynJudge that measures four key performance metrics, with experiments showing significant improvements even with smaller 25K-50K sample subsets.

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10

Wikipedia in the Era of LLMs: Evolution and Risks

A new research study analyzes how Large Language Models are impacting Wikipedia content and structure, finding approximately 1% influence in certain categories. The research warns of potential risks to AI benchmarks and natural language processing tasks if Wikipedia becomes contaminated by LLM-generated content.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10

DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage

Researchers have developed DIVA-GRPO, a new reinforcement learning method that improves multimodal large language model reasoning by adaptively adjusting problem difficulty distributions. The approach addresses key limitations in existing group relative policy optimization methods, showing superior performance across six reasoning benchmarks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

FCN-LLM: Empower LLM for Brain Functional Connectivity Network Understanding via Graph-level Multi-task Instruction Tuning

Researchers have developed FCN-LLM, a framework that enables Large Language Models to understand brain functional connectivity networks from fMRI scans through multi-task instruction tuning. The system uses a multi-scale encoder to capture brain features and demonstrates strong zero-shot generalization across unseen datasets, outperforming conventional supervised models.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10

The Lattice Representation Hypothesis of Large Language Models

Researchers propose the Lattice Representation Hypothesis, a new framework showing how large language models encode symbolic reasoning through geometric structures. The theory unifies continuous neural representations with formal logic by demonstrating that LLM embeddings naturally form concept lattices that enable symbolic operations through geometric intersections and unions.
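As a toy picture of lattice operations on geometric concept regions (not the paper's actual construction), one can model each concept as an axis-aligned box in embedding space: the lattice meet is the boxes' intersection and the join is their smallest common bounding box. The box representation is an assumption purely for illustration.

```python
def meet(a, b):
    """Lattice meet: intersection of two concept boxes, or None if empty.
    Each box is a pair (lower_corner, upper_corner) of coordinate lists."""
    lo = [max(x, y) for x, y in zip(a[0], b[0])]
    hi = [min(x, y) for x, y in zip(a[1], b[1])]
    return (lo, hi) if all(l <= h for l, h in zip(lo, hi)) else None

def join(a, b):
    """Lattice join: smallest box covering both concept regions."""
    lo = [min(x, y) for x, y in zip(a[0], b[0])]
    hi = [max(x, y) for x, y in zip(a[1], b[1])]
    return (lo, hi)
```

Meet and join defined this way satisfy the usual lattice laws (commutativity, absorption), which is what lets geometric operations stand in for symbolic conjunction and disjunction.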

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

Words & Weights: Streamlining Multi-Turn Interactions via Co-Adaptation

Researchers introduce ROSA2, a framework that improves Large Language Model interactions by simultaneously optimizing both prompts and model parameters during test-time adaptation. The approach outperformed baselines by 30% on mathematical tasks while reducing interaction turns by 40%.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

The Value Sensitivity Gap: How Clinical Large Language Models Respond to Patient Preference Statements in Shared Decision-Making

A research study evaluated how four major large language models (GPT-5.2, Claude 4.5 Sonnet, Gemini 3 Pro, and DeepSeek-R1) respond to patient preferences in clinical decision-making scenarios. While all models acknowledged patient values, they showed modest actual recommendation shifting with value sensitivity indices ranging from 0.13 to 0.27, revealing gaps in how AI systems incorporate patient preferences into medical recommendations.
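The summary does not define the value sensitivity index; one plausible reading, assumed here for illustration only, is the fraction of paired scenarios in which the model's recommendation actually changes after a patient preference statement is added.

```python
def value_sensitivity_index(baseline, with_preference):
    """Fraction of paired scenarios where the recommendation changed
    once the patient's stated preference was included in the prompt.

    baseline / with_preference: equal-length lists of recommendations.
    """
    changed = sum(b != w for b, w in zip(baseline, with_preference))
    return changed / len(baseline)
```

Under this reading, an index of 0.13 to 0.27 would mean the models revised their recommendation in only about one scenario in four at best, despite verbally acknowledging the patient's values.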

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning

Researchers introduce CHIMERA, a compact 9K-sample synthetic dataset that enables smaller AI models to achieve reasoning performance comparable to much larger models. The dataset addresses key challenges in training reasoning-capable LLMs through automated generation and cross-validation across 8 scientific disciplines.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

Curvature-Weighted Capacity Allocation: A Minimum Description Length Framework for Layer-Adaptive Large Language Model Optimization

Researchers developed a new mathematical framework called Curvature-Weighted Capacity Allocation that optimizes large language model performance by identifying which layers contribute most to loss reduction. The method uses the Minimum Description Length principle to make principled decisions about layer pruning and capacity allocation under hardware constraints.
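The allocation rule itself is not spelled out in this summary; the simplest curvature-weighted scheme, shown below as an assumed illustration, splits a fixed parameter budget across layers in proportion to each layer's curvature score, so flat (low-curvature) layers are pruned hardest.

```python
def allocate_capacity(curvatures, budget):
    """Split a parameter budget across layers in proportion to each
    layer's curvature (loss sensitivity): high-curvature layers keep
    more capacity, flat layers give it up first.

    curvatures: per-layer positive curvature scores (e.g. a Hessian
    proxy); budget: total capacity to distribute.
    """
    total = sum(curvatures)
    return [budget * c / total for c in curvatures]
```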

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Attention Smoothing Is All You Need For Unlearning

Researchers propose Attention Smoothing Unlearning (ASU), a new framework that helps Large Language Models forget sensitive or copyrighted content without losing overall performance. The method uses self-distillation and attention smoothing to erase specific knowledge while maintaining coherent responses, outperforming existing unlearning techniques.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models

New theoretical research analyzes how Large Language Models learn during pretraining versus post-training. It finds that balanced pretraining data creates latent capabilities that are activated later; that supervised fine-tuning works best on small, challenging datasets; and that reinforcement learning requires large-scale data that is not overly difficult.