236 articles tagged with #large-language-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · 6d ago · 6/10
🧠 Researchers introduce Nirvana, a Specialized Generalist Model that combines broad language capabilities with domain-specific adaptation through task-aware memory mechanisms. The model achieves competitive performance on general benchmarks while reaching the lowest perplexity across specialized domains such as biomedicine, finance, and law, with practical applications demonstrated in medical imaging reconstruction.
🟢 Hugging Face · 🟢 Perplexity
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers introduce InferenceEvolve, an AI framework that uses large language models to automatically discover and refine causal inference methods. The system outperformed 58 human submissions in a recent competition, demonstrating how AI can optimize complex scientific programs through evolutionary approaches.
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers propose REAM (Router-weighted Expert Activation Merging), a new method for compressing large language models that groups and merges expert weights instead of pruning them. The technique preserves model performance better than existing pruning methods while reducing memory requirements for deployment.
AI · Bearish · arXiv – CS AI · Apr 6 · 6/10
🧠 A new study reveals that Audio-Visual Large Language Models (AVLLMs) exhibit a fundamental bias toward visual information over audio when the two modalities conflict. The research shows that while these models encode rich audio semantics in intermediate layers, visual representations dominate during the final text-generation phase, indicating limited effectiveness of current multimodal AI training approaches.
AI · Bullish · arXiv – CS AI · Apr 6 · 6/10
🧠 Researchers developed enhanced techniques using Few-Shot Learning, Chain-of-Thought reasoning, and Retrieval-Augmented Generation to improve large language models' ability to detect and repair errors in MPI programs. The approach increased error-detection accuracy from 44% to 77% compared with using ChatGPT directly, addressing challenges in maintaining high-performance computing applications used in machine learning frameworks.
🧠 ChatGPT
AI · Bullish · arXiv – CS AI · Apr 6 · 6/10
🧠 Researchers propose Rubrics to Tokens (RTT), a novel reinforcement learning framework that improves Large Language Model alignment by bridging response-level and token-level rewards. The method addresses reward sparsity and ambiguity issues in instruction-following tasks through fine-grained credit assignment and demonstrates superior performance across different models.
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠 Researchers developed a framework using large language models (LLMs) as adaptive controllers for SIMP topology optimization, replacing fixed-schedule continuation with real-time parameter adjustments. The LLM agent achieved 5.7% to 18.1% better performance than baseline methods across multiple 2D and 3D engineering problems.
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠 Researchers introduce QuatRoPE, a novel positional embedding method that improves 3D spatial reasoning in Large Language Models by encoding object relations more efficiently. The method maintains linear scalability with the number of objects and preserves LLMs' original capabilities through the Isolated Gated RoPE Extension.
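QuatRoPE builds on rotary position embeddings (RoPE), a published technique; the paper's quaternion variant is not detailed in this summary, so the sketch below shows only standard RoPE, whose defining property (attention scores depending only on relative position) is what such extensions preserve. All names here are illustrative, not from the paper.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Standard rotary position embedding (RoPE): rotate each pair of
    dimensions (2i, 2i+1) of vector x by the angle pos * theta_i."""
    d = x.shape[-1]
    assert d % 2 == 0
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)      # per-pair rotation frequency
    angle = pos * theta
    cos, sin = np.cos(angle), np.sin(angle)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin     # 2x2 rotation of each pair
    out[1::2] = x1 * sin + x2 * cos
    return out

# The dot product of two rotated vectors depends only on the
# difference of their positions, making the encoding relative.
q = rope_rotate(np.ones(8), pos=5)
k = rope_rotate(np.ones(8), pos=3)
```

The relative-position property is what positional-embedding variants generalize to richer geometries such as 3D object layouts.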
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠 Researchers developed a framework integrating large language models with knowledge graphs to provide programming feedback and exercise recommendations. The hybrid GenAI-adaptive approach outperformed traditional adaptive learning and GenAI-only modes, producing more correct code submissions and fewer incomplete attempts across 4,956 code submissions.
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠 Researchers propose combining large language models (LLMs) with combinatorial inference to address hallucinations and improve structured prediction accuracy. The study finds that incorporating symbolic inference yields more consistent predictions than prompting alone, with calibration and fine-tuning further enhancing performance on complex tasks.
AI · Neutral · arXiv – CS AI · Mar 26 · 6/10
🧠 Research reveals that large language models fail to follow formatting instructions 2-21% more often when performing complex tasks simultaneously, with terminal constraints degrading by up to 50%. Enhanced formatting with explicit framing and reminders can restore compliance to 90-100% in most cases.
AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
🧠 Researchers propose MixDemo, a new GraphRAG framework that uses a Mixture-of-Experts mechanism to select high-quality demonstrations for improving large language model performance in domain-specific question answering. The framework includes a query-specific graph encoder to reduce noise in retrieved subgraphs and significantly outperforms existing methods across multiple textual graph benchmarks.
AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
🧠 Researchers propose Dual Guidance Optimization (DGO), a new framework that improves large language model training by combining external experience banks with internal knowledge to better mimic human learning patterns. The approach shows consistent improvements over existing reinforcement learning methods for reasoning tasks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose GRPO (Group Relative Policy Optimization) combined with reflection reward mechanisms to enhance mathematical reasoning in large language models. The four-stage framework encourages self-reflective capabilities during training and achieves state-of-the-art performance, outperforming methods such as supervised fine-tuning and LoRA.
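GRPO itself is a published algorithm whose core idea is replacing a learned value critic with group-relative reward normalization; a minimal sketch of that advantage computation (not the reflection-reward extension described above) might look like:

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages in the style of GRPO: sample several
    responses per prompt, then normalize each response's reward by the
    group's mean and standard deviation instead of using a critic."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Responses scoring above their group's mean receive positive
# advantages, steering the policy toward relatively better samples.
adv = grpo_advantages([1.0, 0.0, 0.5, 0.5])
```

Because advantages are computed within each prompt's group, no separate value network has to be trained, which is the main efficiency argument for GRPO-style methods.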
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Research reveals that Large Language Models struggle with dynamic Theory of Mind tasks, particularly tracking how others' beliefs change over time. While LLMs can infer current beliefs effectively, they fail to maintain and retrieve prior belief states after updates occur, showing patterns consistent with human cognitive biases.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce Pragma-VL, a new alignment algorithm for Multimodal Large Language Models that balances safety and helpfulness by improving visual risk perception and using contextual arbitration. The method outperforms existing baselines by 5-20% on multimodal safety benchmarks while maintaining general AI capabilities in mathematics and reasoning.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose a new early-exit method for Large Reasoning Language Models that detects and prevents overthinking by monitoring high-entropy transition tokens that indicate deviation from correct reasoning paths. The method improves performance and efficiency compared to existing approaches without requiring additional training overhead or limiting inference throughput.
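The paper's exact detector is not given in this summary; as a generic illustration only, per-step entropy of the next-token distribution can be computed and fed to a simple threshold/patience rule (both values below are arbitrary assumptions, not the paper's):

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution;
    higher values mean the model is less certain at that step."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_exit_early(step_entropies, threshold=2.0, patience=3):
    """Illustrative early-exit rule: stop once `patience` consecutive
    decoding steps exceed the entropy threshold, treating sustained
    uncertainty as a sign the reasoning chain has gone off track."""
    run = 0
    for h in step_entropies:
        run = run + 1 if h > threshold else 0
        if run >= patience:
            return True
    return False
```

Monitoring a scalar like entropy adds no training overhead, which matches the efficiency claim in the entry above, though the real method's detection criterion may differ substantially.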
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce Decoupled Gradient Policy Optimization (DGPO), a new reinforcement learning method that improves large language model training by using probability gradients instead of log-probability gradients. The technique addresses instability issues in current methods while maintaining exploration capabilities, showing superior performance across mathematical benchmarks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose a new framework for large language models that separates planning from factual retrieval to improve reliability in fact-seeking question answering. The modular approach uses a lightweight student planner trained via teacher-student learning to generate structured reasoning steps, showing improved accuracy and speed on challenging benchmarks.
AI · Bearish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers warn that AI-powered conversational navigation systems using Large Language Models could transform route guidance from verifiable geometric tasks into manipulative dialogues. The study proposes a framework categorizing risks as dark patterns or explainability pitfalls, suggesting neuro-symbolic architectures to maintain trustworthiness.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose a theoretical framework based on category theory to formalize meta-prompting in large language models. The study demonstrates that meta-prompting (using prompts to generate other prompts) is more effective than basic prompting for generating desirable outputs from LLMs.
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠 Researchers introduce Budget-Sensitive Discovery Score (BSDS), a formally verified framework for evaluating AI-guided scientific candidate selection under budget constraints. Testing on drug discovery datasets reveals that simple random forest models outperform large language models, with LLMs providing no marginal value over existing trained classifiers.
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠 This comprehensive survey examines continual learning methodologies for large language models, focusing on three core training stages and methods to mitigate catastrophic forgetting. The research reveals that while current approaches show promise in specific domains, fundamental challenges remain in achieving seamless knowledge integration across diverse tasks and temporal scales.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠 Researchers propose MetaKE, a new framework for knowledge editing in Large Language Models that addresses the 'Semantic-Execution Disconnect' through bi-level optimization. The method treats edit targets as learnable parameters and uses a Structural Gradient Proxy to align edits with the model's feasible manifold, showing significant improvements over existing approaches.
AI · Bullish · arXiv – CS AI · Mar 12 · 6/10
🧠 Researchers developed a lightweight AI framework for the Game of the Amazons that combines graph attention networks with large language models, achieving 15-56% improvement in decision accuracy while using minimal computational resources. The hybrid approach demonstrates weak-to-strong generalization by leveraging GPT-4o-mini for synthetic training data and graph-based learning for structural reasoning.
🧠 GPT-4