236 articles tagged with #large-language-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 12 · 6/10
🧠 Researchers introduce SearchLLM, the first large language model designed for open-ended generative search, featuring a hierarchical reward system that balances safety constraints with user alignment. The model was deployed on RedNote's AI search platform, showing significant improvements in user engagement with a 1.03% increase in Valid Consumption Rate and a 2.81% reduction in Re-search Rate.
AI · Neutral · arXiv – CS AI · Mar 12 · 6/10
🧠 Researchers propose HIR-SDD, a new framework combining Large Audio Language Models with human-inspired reasoning to detect speech deepfakes. The method aims to improve generalization across different audio domains and provide interpretable explanations for deepfake detection decisions.
AI · Bullish · arXiv – CS AI · Mar 12 · 6/10
🧠 Researchers propose Dynamics-Predictive Sampling (DPS), a new method that improves reinforcement learning finetuning of large language models by predicting which training prompts will be most informative without running expensive rollouts. The technique models each prompt's learning progress as a dynamical system and uses Bayesian inference to select better training data, reducing computational overhead while achieving superior reasoning performance.
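The summary does not spell out the paper's dynamical-system model, but the selection idea can be illustrated with a minimal Bayesian stand-in: keep a Beta posterior over each prompt's solve rate and prefer the prompt whose predicted success is nearest 50%, where the learning signal is richest. The `PromptSelector` class and the closest-to-0.5 rule below are illustrative assumptions, not the paper's method.

```python
class PromptSelector:
    """Toy Bayesian prompt selection: a Beta(alpha, beta) posterior per
    prompt tracks its observed solve rate; the most "informative" prompt
    is taken to be the one whose posterior mean is closest to 0.5."""

    def __init__(self, n_prompts):
        self.alpha = [1.0] * n_prompts  # prior successes
        self.beta = [1.0] * n_prompts   # prior failures

    def select(self):
        means = [a / (a + b) for a, b in zip(self.alpha, self.beta)]
        return min(range(len(means)), key=lambda i: abs(means[i] - 0.5))

    def update(self, i, success):
        if success:
            self.alpha[i] += 1.0
        else:
            self.beta[i] += 1.0
```

A prompt the model always solves (or always fails) drifts away from 0.5 and gets sampled less, mirroring DPS's goal of spending rollouts where learning progress is likely.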
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers introduce Social-R1, a reinforcement learning framework that enhances social reasoning in large language models by training on adversarial examples. The approach enables a 4B parameter model to outperform larger models across eight benchmarks by supervising the entire reasoning process rather than just outcomes.
AI · Bearish · arXiv – CS AI · Mar 11 · 6/10
🧠 A new study finds that Large Language Models (LLMs) propagate gender stereotypes and biases when processing healthcare data, particularly through interactions between gender and social determinants of health. Using French patient records, the researchers demonstrate how LLMs rely on embedded stereotypes to make gendered decisions in healthcare contexts.
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠 Researchers introduce SCENEBench, a new benchmark for evaluating Large Audio Language Models (LALMs) beyond speech recognition, focusing on real-world audio understanding including background sounds, noise localization, and vocal characteristics. Testing five state-of-the-art models revealed significant performance gaps: on some tasks the models scored below random chance, while on others they achieved high accuracy.
AI · Neutral · arXiv – CS AI · Mar 9 · 6/10
🧠 Researchers propose a schema-gated orchestration approach to resolve the trade-off between conversational flexibility and deterministic execution in AI-driven scientific workflows. Their analysis of 20 systems reveals no current solution achieves both high flexibility and determinism, but identifies a convergence zone for potential breakthrough architectures.
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
🧠 A comprehensive survey examines how large multimodal language models are transforming scientific research across five key areas: literature search, idea generation, content creation, multimodal artifact production, and peer review evaluation. The research highlights both the potential for AI-assisted scientific discovery and the ethical concerns regarding research integrity and misuse of generative models.
AI · Bullish · arXiv – CS AI · Mar 6 · 5/10
🧠 Researchers propose K-Gen, a new multimodal AI framework that uses Large Language Models to generate realistic driving trajectories for autonomous vehicle simulation. The system combines visual map data with text descriptions to create interpretable keypoints that guide trajectory generation, outperforming existing baselines on major datasets.
AI · Bullish · arXiv – CS AI · Mar 5 · 5/10
🧠 Researchers at the Australian National University developed a semantic query processing system that combines Large Language Models with a scholarly Knowledge Graph to enable comprehensive information retrieval about computer science research. The system uses the Deep Document Model for fine-grained document representation and KG-enhanced Query Processing for optimized query handling, showing superior accuracy and efficiency compared to baseline methods.
AI · Neutral · arXiv – CS AI · Mar 5 · 5/10
🧠 Taobao has developed REVISION, a new AI framework that combines large language models with traditional e-commerce visual search systems to better understand implicit user intents and reduce no-click search rates. The system uses offline analysis of historical search data and online reasoning to adaptively optimize search results and platform strategies.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers demonstrate that Group Relative Policy Optimization (GRPO), traditionally viewed as an on-policy reinforcement learning algorithm, can be reinterpreted as an off-policy algorithm through first-principles analysis. This theoretical breakthrough provides new insights for optimizing reinforcement learning applications in large language models and offers principled approaches for off-policy RL algorithm design.
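For context, the group-relative advantage at the heart of GRPO normalizes each sampled completion's reward against its own group's statistics; a minimal sketch of that standard computation (the paper's contribution is the off-policy reinterpretation, not this formula):

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: score each completion in a sampled
    group by how far its reward sits from the group mean, measured in
    units of the group's standard deviation."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for one prompt, with scalar rewards:
print(grpo_advantages([1.0, 0.0, 0.5, 0.5]))
# → approximately [1.414, -1.414, 0.0, 0.0]
```

Because the baseline is the group mean, the advantages always sum to zero: above-average completions are reinforced at the expense of below-average ones, with no separate value network.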
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers propose Online Causal Kalman Filtering for Policy Optimization (KPO) to address high-variance instability in reinforcement learning for large language models. The method uses Kalman filtering to smooth token-level importance sampling ratios, preventing training collapse and achieving superior results on math reasoning tasks.
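As an illustration of the smoothing idea only, here is a plain scalar Kalman filter over a stream of ratios; the paper's token-level causal formulation and its noise model are not detailed in this summary, so the parameters below are assumptions.

```python
def kalman_smooth(ratios, q=1e-4, r=1e-2):
    """Scalar Kalman filter over a stream of importance-sampling ratios:
    model the 'true' ratio as a slowly drifting latent state with process
    noise variance q, observed through noise of variance r."""
    x, p = ratios[0], 1.0  # state estimate and its variance
    smoothed = [x]
    for z in ratios[1:]:
        p += q              # predict: uncertainty grows with each step
        k = p / (p + r)     # Kalman gain
        x += k * (z - x)    # update: move toward the new observation
        p *= 1.0 - k        # posterior variance shrinks
        smoothed.append(x)
    return smoothed
```

A spiky ratio is pulled back toward the running estimate instead of propagating wholesale into the policy gradient, which is the variance-reduction effect the paper targets.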
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers introduce InterSyn, a 1.8M-sample dataset designed to improve Large Multimodal Models' ability to generate interleaved image-text content. The dataset includes a new evaluation framework called SynJudge that measures four key performance metrics, with experiments showing significant improvements even with smaller 25K-50K sample subsets.
AI · Bearish · arXiv – CS AI · Mar 3 · 6/10
🧠 A new study analyzes how Large Language Models are affecting Wikipedia's content and structure, finding approximately 1% influence in certain categories. The researchers warn of potential risks to AI benchmarks and natural language processing tasks if Wikipedia becomes contaminated by LLM-generated content.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers have developed DIVA-GRPO, a new reinforcement learning method that improves multimodal large language model reasoning by adaptively adjusting problem difficulty distributions. The approach addresses key limitations in existing group relative policy optimization methods, showing superior performance across six reasoning benchmarks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers have developed FCN-LLM, a framework that enables Large Language Models to understand brain functional connectivity networks from fMRI scans through multi-task instruction tuning. The system uses a multi-scale encoder to capture brain features and demonstrates strong zero-shot generalization across unseen datasets, outperforming conventional supervised models.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers propose the Lattice Representation Hypothesis, a new framework showing how large language models encode symbolic reasoning through geometric structures. The theory unifies continuous neural representations with formal logic by demonstrating that LLM embeddings naturally form concept lattices that enable symbolic operations through geometric intersections and unions.
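A discrete toy makes the lattice operations concrete: over binary attribute vectors, the meet and join of two concepts reduce to elementwise min and max. The feature set below is invented for illustration; the hypothesis itself concerns continuous embeddings whose geometric intersections approximate these operations.

```python
# Toy concepts as binary attribute vectors over an invented feature set.
FEATURES = ["animal", "flies", "swims", "has_fur"]
bird = (1, 1, 0, 0)
bat  = (1, 1, 0, 1)
fish = (1, 0, 1, 0)

def meet(a, b):
    """Greatest common subconcept: the attributes shared by both."""
    return tuple(min(x, y) for x, y in zip(a, b))

def join(a, b):
    """Least common superconcept: the union of the attributes."""
    return tuple(max(x, y) for x, y in zip(a, b))

print(meet(bird, bat))   # → (1, 1, 0, 0): a flying animal
print(join(bird, fish))  # → (1, 1, 1, 0): animals that fly or swim
```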
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers introduce ROSA2, a framework that improves Large Language Model interactions by simultaneously optimizing both prompts and model parameters during test-time adaptation. The approach outperformed baselines by 30% on mathematical tasks while reducing interaction turns by 40%.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 GraphScout is a new AI framework that enables smaller language models to autonomously explore knowledge graphs for reasoning tasks. The system allows a 4B parameter model to outperform much larger models by 16.7% while using fewer computational resources.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10
🧠 A study evaluated how four major large language models (GPT-5.2, Claude 4.5 Sonnet, Gemini 3 Pro, and DeepSeek-R1) respond to patient preferences in clinical decision-making scenarios. While all models acknowledged patient values, they shifted their actual recommendations only modestly, with value-sensitivity indices ranging from 0.13 to 0.27, revealing gaps in how AI systems incorporate patient preferences into medical recommendations.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers introduce CHIMERA, a compact 9K-sample synthetic dataset that enables smaller AI models to achieve reasoning performance comparable to much larger models. The dataset addresses key challenges in training reasoning-capable LLMs through automated generation and cross-validation across 8 scientific disciplines.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers developed a new mathematical framework called Curvature-Weighted Capacity Allocation that optimizes large language model performance by identifying which layers contribute most to loss reduction. The method uses the Minimum Description Length principle to make principled decisions about layer pruning and capacity allocation under hardware constraints.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers propose Attention Smoothing Unlearning (ASU), a new framework that helps Large Language Models forget sensitive or copyrighted content without losing overall performance. The method uses self-distillation and attention smoothing to erase specific knowledge while maintaining coherent responses, outperforming existing unlearning techniques.
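The summary does not define the smoothing operator; one simple reading is blending an attention distribution toward uniform so the model stops attending sharply to the content being forgotten. The function below is a hypothetical sketch of that reading, not the paper's definition.

```python
def smooth_attention(weights, tau):
    """Blend an attention distribution toward uniform: tau=0 leaves it
    unchanged, tau=1 erases all preference among the tokens."""
    n = len(weights)
    return [(1.0 - tau) * w + tau / n for w in weights]

sharp = [0.9, 0.05, 0.03, 0.02]      # model attends hard to one token
print(smooth_attention(sharp, 0.8))  # mass redistributed, still sums to 1
```

Because the blend is convex, the result remains a valid probability distribution, which is what would let a smoothed attention target be used inside a self-distillation loss.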
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10
🧠 New theoretical research analyzes how Large Language Models learn during pretraining versus post-training. It finds that balanced pretraining data creates latent capabilities that are activated later, that supervised fine-tuning works best on small, challenging datasets, and that reinforcement learning requires large-scale data that is not overly difficult.