y0news

#large-language-models News & Analysis

236 articles tagged with #large-language-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 6d ago · 6/10
🧠

Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism

Researchers introduce Nirvana, a Specialized Generalist Model that combines broad language capabilities with domain-specific adaptation through task-aware memory mechanisms. The model remains competitive on general benchmarks while reaching the lowest perplexity across specialized domains such as biomedicine, finance, and law, with a practical application demonstrated in medical imaging reconstruction.

๐Ÿข Hugging Face๐Ÿข Perplexity
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠

InferenceEvolve: Towards Automated Causal Effect Estimators through Self-Evolving AI

Researchers introduce InferenceEvolve, an AI framework using large language models to automatically discover and refine causal inference methods. The system outperformed 58 human submissions in a recent competition and demonstrates how AI can optimize complex scientific programs through evolutionary approaches.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠

REAM: Merging Improves Pruning of Experts in LLMs

Researchers propose REAM (Router-weighted Expert Activation Merging), a new method for compressing large language models that groups and merges expert weights instead of pruning them. The technique preserves model performance better than existing pruning methods while reducing memory requirements for deployment.
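
The core idea the summary describes can be sketched in a few lines: rather than dropping rarely used experts, a group of expert weight matrices is collapsed into one matrix, with each expert weighted by how much the router actually activates it. This is an illustrative sketch only; the function name, shapes, and the use of summed router probabilities as activation mass are assumptions, not REAM's exact formulation.

```python
import numpy as np

def merge_experts(expert_weights, router_activations):
    """Merge one group of expert weight matrices into a single matrix.

    expert_weights: list of (d_in, d_out) arrays for experts in the group.
    router_activations: per-expert activation mass, e.g. router
    probabilities summed over a calibration set.
    """
    w = np.asarray(router_activations, dtype=np.float64)
    w = w / w.sum()  # normalize activation mass into merge coefficients
    # Weighted average: heavily routed experts dominate the merged weights.
    return sum(wi * Wi for wi, Wi in zip(w, expert_weights))

# Toy group of three experts; the heavily routed third expert dominates.
experts = [np.full((2, 2), v) for v in (1.0, 2.0, 10.0)]
merged = merge_experts(experts, router_activations=[0.1, 0.1, 0.8])
```

The memory saving comes from storing one matrix per group instead of one per expert, while the router-weighted average retains more of the behavior of frequently used experts than pruning would.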

AI · Bearish · arXiv – CS AI · Apr 6 · 6/10
🧠

Do Audio-Visual Large Language Models Really See and Hear?

A new research study reveals that Audio-Visual Large Language Models (AVLLMs) exhibit a fundamental bias toward visual information over audio when the modalities conflict. The research shows that while these models encode rich audio semantics in intermediate layers, visual representations dominate during the final text generation phase, indicating limited effectiveness of current multimodal AI training approaches.

AI · Bullish · arXiv – CS AI · Apr 6 · 6/10
🧠

Improving MPI Error Detection and Repair with Large Language Models and Bug References

Researchers developed enhanced techniques combining Few-Shot Learning, Chain-of-Thought reasoning, and Retrieval-Augmented Generation to improve large language models' ability to detect and repair errors in MPI programs. The approach raised error detection accuracy from 44% (prompting ChatGPT directly) to 77%, addressing challenges in maintaining the high-performance computing applications used in machine learning frameworks.

🧠 ChatGPT
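
The pipeline the summary describes amounts to prompt assembly: few-shot examples of fixed MPI bugs plus retrieved bug references, with a chain-of-thought instruction, prepended to the program under test. The sketch below is a plausible reconstruction under those assumptions; the helper name, the example bug text, and the prompt wording are all illustrative, not the paper's artifacts.

```python
# One few-shot example of a known MPI bug and its fix (illustrative).
FEW_SHOT = (
    "Example bug: MPI_Send without a matching MPI_Recv causes deadlock.\n"
    "Fix: pair each blocking send with a receive, or use MPI_Sendrecv."
)

def build_repair_prompt(mpi_code, retrieved_refs):
    """Assemble a few-shot + RAG + chain-of-thought repair prompt."""
    refs = "\n".join(f"- {r}" for r in retrieved_refs)
    return (
        "You detect and repair errors in MPI programs. Think step by step.\n\n"
        f"{FEW_SHOT}\n\n"
        f"Retrieved related bug reports:\n{refs}\n\n"
        f"Program to check:\n{mpi_code}\n\n"
        "Identify the error and propose a fix."
    )

prompt = build_repair_prompt(
    "MPI_Send(buf, n, MPI_INT, 1, 0, MPI_COMM_WORLD);",
    ["Tag mismatch between ranks 0 and 1 in halo exchange"],
)
```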
AI · Bullish · arXiv – CS AI · Apr 6 · 6/10
🧠

Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks

Researchers propose Rubrics to Tokens (RTT), a novel reinforcement learning framework that improves Large Language Model alignment by bridging response-level and token-level rewards. The method addresses reward sparsity and ambiguity issues in instruction-following tasks through fine-grained credit assignment and demonstrates superior performance across different models.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠

Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models

Researchers introduce QuatRoPE, a novel positional embedding method that improves 3D spatial reasoning in Large Language Models by encoding object relations more efficiently. The method maintains linear scalability with the number of objects and preserves LLMs' original capabilities through the Isolated Gated RoPE Extension.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠

Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-integrated programming learning system

Researchers developed a framework integrating large language models with knowledge graphs to provide programming feedback and exercise recommendations. The hybrid GenAI-adaptive approach outperformed traditional adaptive learning and GenAI-only modes, producing more correct code submissions and fewer incomplete attempts across 4,956 code submissions.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠

Mapping the Course for Prompt-based Structured Prediction

Researchers propose combining large language models (LLMs) with combinatorial inference to address hallucinations and improve structured prediction accuracy. The study finds that incorporating symbolic inference yields more consistent predictions than prompting alone, with calibration and fine-tuning further enhancing performance on complex tasks.

AI · Neutral · arXiv – CS AI · Mar 26 · 6/10
🧠

Did You Forget What I Asked? Prospective Memory Failures in Large Language Models

Research reveals that large language models fail to follow formatting instructions 2-21% more often when performing complex tasks simultaneously, with terminal constraints showing up to 50% degradation. Enhanced formatting with explicit framing and reminders can restore compliance to 90-100% in most cases.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
🧠

Mixture of Demonstrations for Textual Graph Understanding and Question Answering

Researchers propose MixDemo, a new GraphRAG framework that uses a Mixture-of-Experts mechanism to select high-quality demonstrations for improving large language model performance in domain-specific question answering. The framework includes a query-specific graph encoder to reduce noise in retrieved subgraphs and significantly outperforms existing methods across multiple textual graph benchmarks.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
🧠

Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization

Researchers propose Dual Guidance Optimization (DGO), a new framework that improves large language model training by combining external experience banks with internal knowledge to better mimic human learning patterns. The approach shows consistent improvements over existing reinforcement learning methods for reasoning tasks.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

GRPO and Reflection Reward for Mathematical Reasoning in Large Language Models

Researchers propose GRPO (Group Relative Policy Optimization) combined with reflection reward mechanisms to enhance mathematical reasoning in large language models. The four-stage framework encourages self-reflective capabilities during training and demonstrates state-of-the-art performance over existing methods like supervised fine-tuning and LoRA.
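
The group-relative signal at the heart of GRPO is simple to state: sample several responses per prompt, then score each one relative to its group's mean reward, normalized by the group's standard deviation. The sketch below shows only that advantage computation under standard assumptions (binary rewards, an epsilon guard against zero variance); it is not the paper's full four-stage framework.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled response's reward against its group.

    rewards: rewards for the G responses sampled from one prompt.
    Returns per-response advantages: (r - mean) / (std + eps).
    """
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Toy group of 4 sampled answers to one math problem, rewarded 0/1:
# correct answers get positive advantage, incorrect ones negative.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline is the group mean rather than a learned value function, no separate critic model is needed, which is a large part of the method's appeal for LLM training.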

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠

Dynamic Theory of Mind as a Temporal Memory Problem: Evidence from Large Language Models

Research reveals that Large Language Models struggle with dynamic Theory of Mind tasks, particularly tracking how others' beliefs change over time. While LLMs can infer current beliefs effectively, they fail to maintain and retrieve prior belief states after updates occur, showing patterns consistent with human cognitive biases.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs

Researchers introduce Pragma-VL, a new alignment algorithm for Multimodal Large Language Models that balances safety and helpfulness by improving visual risk perception and using contextual arbitration. The method outperforms existing baselines by 5-20% on multimodal safety benchmarks while maintaining general AI capabilities in mathematics and reasoning.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

Mitigating Overthinking in Large Reasoning Language Models via Reasoning Path Deviation Monitoring

Researchers propose a new early-exit method for Large Reasoning Language Models that detects and prevents overthinking by monitoring high-entropy transition tokens that indicate deviation from correct reasoning paths. The method improves performance and efficiency compared to existing approaches without requiring additional training overhead or limiting inference throughput.
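
Monitoring high-entropy transition tokens can be sketched as follows: compute the entropy of the next-token distribution at each decoding step and flag a step whose entropy crosses a threshold as a possible deviation point. The function names, the threshold value, and the exit rule are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def token_entropy(probs):
    """Shannon entropy (nats) of one next-token distribution."""
    p = np.asarray(probs, dtype=np.float64)
    p = p[p > 0]  # ignore zero-probability tokens (0 * log 0 -> 0)
    return float(-(p * np.log(p)).sum())

def should_exit(step_probs, threshold=1.0):
    """Flag a high-entropy transition token as a candidate early exit."""
    return token_entropy(step_probs) > threshold

# A peaked distribution (confident continuation) stays below threshold;
# a flat distribution (uncertain transition) exceeds it.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
```

Since entropies come from logits the model already produces, this kind of monitor adds no training overhead, which matches the efficiency claim in the summary.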

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

From $\boldsymbol{\log\pi}$ to $\boldsymbol{\pi}$: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight

Researchers introduce Decoupled Gradient Policy Optimization (DGPO), a new reinforcement learning method that improves large language model training by using probability gradients instead of log-probability gradients. The technique addresses instability issues in current methods while maintaining exploration capabilities, showing superior performance across mathematical benchmarks.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

Distilling Reasoning Without Knowledge: A Framework for Reliable LLMs

Researchers propose a new framework for large language models that separates planning from factual retrieval to improve reliability in fact-seeking question answering. The modular approach uses a lightweight student planner trained via teacher-student learning to generate structured reasoning steps, showing improved accuracy and speed on challenging benchmarks.

AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠

The Scenic Route to Deception: Dark Patterns and Explainability Pitfalls in Conversational Navigation

Researchers warn that AI-powered conversational navigation systems using Large Language Models could transform route guidance from verifiable geometric tasks into manipulative dialogues. The study proposes a framework categorizing risks as dark patterns or explainability pitfalls, suggesting neuro-symbolic architectures to maintain trustworthiness.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

On Meta-Prompting

Researchers propose a theoretical framework based on category theory to formalize meta-prompting in large language models. The study demonstrates that meta-prompting (using prompts to generate other prompts) is more effective than basic prompting for generating desirable outputs from LLMs.

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠

Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection

Researchers introduce Budget-Sensitive Discovery Score (BSDS), a formally verified framework for evaluating AI-guided scientific candidate selection under budget constraints. Testing on drug discovery datasets reveals that simple random forest models outperform large language models, with LLMs providing no marginal value over existing trained classifiers.

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠

Continual Learning in Large Language Models: Methods, Challenges, and Opportunities

This comprehensive survey examines continual learning methodologies for large language models, focusing on three core training stages and methods to mitigate catastrophic forgetting. The research reveals that while current approaches show promise in specific domains, fundamental challenges remain in achieving seamless knowledge integration across diverse tasks and temporal scales.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠

MetaKE: Meta-learning Aligned Knowledge Editing via Bi-level Optimization

Researchers propose MetaKE, a new framework for knowledge editing in Large Language Models that addresses the 'Semantic-Execution Disconnect' through bi-level optimization. The method treats edit targets as learnable parameters and uses a Structural Gradient Proxy to align edits with the model's feasible manifold, showing significant improvements over existing approaches.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10
🧠

Resource-constrained Amazons chess decision framework integrating large language models and graph attention

Researchers developed a lightweight AI framework for the Game of the Amazons that combines graph attention networks with large language models, achieving 15-56% improvement in decision accuracy while using minimal computational resources. The hybrid approach demonstrates weak-to-strong generalization by leveraging GPT-4o-mini for synthetic training data and graph-based learning for structural reasoning.

🧠 GPT-4