188 articles tagged with #large-language-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv → CS AI · Mar 9 · 7/10
🧠 Researchers introduce RM-R1, a new class of Reasoning Reward Models (ReasRMs) that integrate chain-of-thought reasoning into reward modeling for large language models. The models outperform much larger competitors including GPT-4o by up to 4.9% across reward model benchmarks by using a chain-of-rubrics mechanism and two-stage training process.
🧠 GPT-4 · 🧠 Llama
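The chain-of-rubrics idea can be sketched as a prompting pattern: the reward model first writes its own evaluation rubric, then judges each candidate against it. The template and field names below are illustrative assumptions, not RM-R1's exact format.

```python
# Sketch of a chain-of-rubrics reward-model prompt (format is assumed).
def chain_of_rubrics_prompt(question: str, answer_a: str, answer_b: str) -> str:
    return (
        "You are a reward model. First, write a short rubric listing the "
        "criteria a good answer must satisfy. Then score each candidate "
        "against every criterion and state which candidate is better.\n\n"
        f"Question: {question}\n"
        f"Candidate A: {answer_a}\n"
        f"Candidate B: {answer_b}\n\n"
        "Rubric:\n"
    )

prompt = chain_of_rubrics_prompt("What is 2+2?", "4", "5")
```

Eliciting the rubric before the verdict is what turns the reward model into a reasoning model: the judgment is conditioned on explicit, task-specific criteria.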
AI · Bullish · arXiv → CS AI · Mar 9 · 7/10
🧠 Researchers introduce COLD-Steer, a training-free framework that enables efficient control of large language model behavior at inference time using just a few examples. The method approximates gradient descent effects without parameter updates, achieving 95% steering effectiveness while using 50 times fewer samples than existing approaches.
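COLD-Steer's exact update rule is not reproduced here; as a generic stand-in for few-shot, training-free steering, a common technique is to derive a direction from a handful of positive/negative example activations and add it to the hidden state at inference time (difference-of-means steering, an assumption rather than the paper's method):

```python
import numpy as np

# Difference-of-means steering: a generic sketch of few-shot,
# training-free behavior control at inference time.
def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    # Direction separating desired from undesired example activations.
    return np.mean(pos_acts, axis=0) - np.mean(neg_acts, axis=0)

def steer(hidden: np.ndarray, vec: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    # No parameter updates: the hidden state is shifted at inference time.
    return hidden + alpha * vec

pos = np.array([[1.0, 0.0], [0.8, 0.2]])   # toy "desired" activations
neg = np.array([[0.0, 1.0], [0.2, 0.8]])   # toy "undesired" activations
v = steering_vector(pos, neg)              # → [0.8, -0.8]
h = steer(np.array([0.5, 0.5]), v)         # → [1.3, -0.3]
```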
AI · Neutral · arXiv → CS AI · Mar 6 · 7/10
🧠 Researchers introduce BioLLMAgent, a hybrid framework combining reinforcement learning models with large language models to simulate human decision-making in computational psychiatry. The framework demonstrates strong interpretability while accurately reproducing human behavioral patterns and successfully simulating cognitive behavioral therapy principles.
AI · Neutral · arXiv → CS AI · Mar 5 · 7/10
🧠 Researchers studied reinforcement learning with verifiable rewards (RLVR) for training large language models on causal reasoning tasks, finding it outperforms supervised fine-tuning but only when models have sufficient initial competence. The study used causal graphical models as a testbed and showed RLVR improves specific reasoning subskills like marginalization strategy and probability calculations.
AI · Bullish · arXiv → CS AI · Mar 5 · 6/10
🧠 Researchers developed a new AI-powered framework for crystal structure prediction that uses large language models and symmetry-driven generation to overcome computational bottlenecks. The approach achieves state-of-the-art performance in discovering new materials without relying on existing databases, potentially accelerating materials science research.
AI · Neutral · arXiv → CS AI · Mar 5 · 7/10
🧠 Researchers studied how large language models generalize to new tasks through "off-by-one addition" experiments, discovering a "function induction" mechanism that operates at higher abstraction levels than previously known induction heads. The study reveals that multiple attention heads work in parallel to enable task-level generalization, with this mechanism being reusable across various synthetic and algorithmic tasks.
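The off-by-one addition probe can be reconstructed as a simple data generator: every in-context example maps a+b to a+b+1, so the model must induce the task-level rule rather than recall ordinary addition. The exact prompt format is an assumption.

```python
import random

# Generate an "off-by-one addition" prompt: few-shot examples where the
# answer is a + b + 1, plus one unanswered query for the model to complete.
def off_by_one_prompt(n_examples: int = 4, seed: int = 0):
    rng = random.Random(seed)
    lines = []
    for _ in range(n_examples):
        a, b = rng.randint(0, 9), rng.randint(0, 9)
        lines.append(f"{a}+{b}={a + b + 1}")   # deliberately off by one
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    lines.append(f"{a}+{b}=")                  # query line
    return "\n".join(lines), a + b + 1         # prompt and expected answer
```

Because the mapping contradicts memorized arithmetic, success on the query isolates genuine in-context rule induction.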
AI · Bullish · arXiv → CS AI · Mar 5 · 6/10
🧠 Researchers developed R1-Code-Interpreter, a large language model that uses multi-stage reinforcement learning to autonomously generate code for step-by-step reasoning across diverse tasks. The 14B parameter model achieves 72.4% accuracy on test tasks, outperforming GPT-4o variants and demonstrating emergent self-checking capabilities through code generation.
Hugging Face · 🧠 GPT-4
AI · Bullish · arXiv → CS AI · Mar 5 · 7/10
🧠 Researchers propose Supervised Calibration (SC), a new framework to improve In-Context Learning performance in Large Language Models by addressing systematic biases through optimal affine transformations in logit space. The method achieves state-of-the-art results across multiple LLMs including Mistral-7B, Llama-2-7B, and Qwen2-7B in few-shot learning scenarios.
🧠 Llama
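The core operation is an affine map in logit space, z' = Wz + b, applied after the model's forward pass. The sketch below assumes W and b are already given; SC's contribution is learning them from labeled few-shot data.

```python
import numpy as np

# Affine calibration of ICL logits: z' = W z + b.
def calibrate(logits: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    return logits @ W.T + b

logits = np.array([2.0, 1.0])     # raw logits, biased toward class 0
W = np.eye(2)                     # identity scaling (toy choice)
b = np.array([-1.5, 0.0])         # learned correction for the class-0 bias
print(calibrate(logits, W, b).argmax())  # → 1: prediction flips after calibration
```

The appeal is that a handful of labeled examples suffice to fit W and b, since the transformation has only O(K²) parameters for K classes.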
AI · Bullish · arXiv → CS AI · Mar 5 · 7/10
🧠 Researchers developed a quantum-inspired self-attention (QISA) mechanism and integrated it into GPT-1's language modeling pipeline, marking the first such integration in autoregressive language models. QISA significantly outperformed standard self-attention, achieving a 15.5× lower character error rate and 13× lower cross-entropy loss at the cost of only 2.6× longer inference time.
AI · Bullish · arXiv → CS AI · Mar 5 · 6/10
🧠 Researchers propose MIND, a reinforcement learning framework that improves AI-powered psychiatric consultation by addressing key challenges in diagnostic accuracy and clinical reasoning. The system uses a Criteria-Grounded Psychiatric Reasoning Bank to provide better clinical support and reduce inquiry drift during multi-turn patient interactions.
AI · Bullish · arXiv → CS AI · Mar 5 · 7/10
🧠 Researchers developed Crab+, a new Audio-Visual Large Language Model that addresses the problem of negative transfer in multi-task learning, where 55% of tasks typically degrade when trained together. The model introduces explicit cooperation mechanisms and achieves positive transfer in 88% of tasks, outperforming both unified and specialized models.
AI · Neutral · arXiv → CS AI · Mar 5 · 6/10
🧠 Researchers introduce CAM-LDS, a new dataset covering 81 cyber attack techniques to improve automated log analysis using Large Language Models. The study shows LLMs can correctly identify attack techniques in about one-third of cases, with adequate performance in another third, demonstrating potential for AI-powered cybersecurity analysis.
AI · Bullish · arXiv → CS AI · Mar 4 · 6/10 · 2
🧠 Researchers identified a critical problem in Large Audio-Language Models (LALMs) where audio perception deteriorates during extended reasoning processes. They developed the MPAR² framework using reinforcement learning, which improved perception performance from 31.74% to 63.51% and achieved 74.59% accuracy on the MMAU benchmark.
AI · Bullish · arXiv → CS AI · Mar 4 · 7/10 · 5
🧠 Researchers introduce NeuroProlog, a neurosymbolic framework that improves mathematical reasoning in Large Language Models by converting math problems into executable Prolog programs. The multi-task 'Cocktail' training approach shows significant accuracy improvements of 3-5% across different model sizes, with larger models demonstrating better error correction capabilities.
AI · Neutral · arXiv → CS AI · Mar 4 · 7/10 · 3
🧠 New research provides theoretical analysis of reinforcement learning's impact on Large Language Model planning capabilities, revealing that RL improves generalization through exploration while supervised fine-tuning may create spurious solutions. The study shows Q-learning maintains output diversity better than policy gradient methods, with findings validated on real-world planning benchmarks.
AI · Bullish · arXiv → CS AI · Mar 4 · 6/10 · 2
🧠 Researchers developed GPUTOK, a GPU-accelerated tokenizer for large language models that processes text significantly faster than existing CPU-based solutions. The optimized version shows 1.7x speed improvement over tiktoken and 7.6x over HuggingFace's GPT-2 tokenizer while maintaining output quality.
AI · Bullish · arXiv → CS AI · Mar 4 · 7/10 · 3
🧠 Researchers have developed MedLA, a new logic-driven multi-agent AI framework that uses large language models for complex medical reasoning. The system employs multiple AI agents that organize their reasoning into explicit logical trees and engage in structured discussions to resolve inconsistencies and reach consensus on medical questions.
AI · Bullish · arXiv → CS AI · Mar 4 · 6/10 · 4
🧠 A large-scale benchmarking study finds that powerful Multimodal Large Language Models (MLLMs) can extract information from business documents using image-only input, potentially eliminating the need for traditional OCR preprocessing. The research demonstrates that well-designed prompts and instructions can further enhance MLLM performance in document processing tasks.
AI · Bullish · arXiv → CS AI · Mar 4 · 7/10 · 2
🧠 Researchers introduce Neural Paging, a new architecture that addresses the computational bottleneck of finite context windows in Large Language Models by implementing a hierarchical system that decouples reasoning from memory management. The approach reduces computational complexity from O(N²) to O(N·K²) for long-horizon reasoning tasks, potentially enabling more efficient AI agents.
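The complexity claim can be checked with back-of-envelope arithmetic: full attention over N tokens touches O(N²) positions, while reasoning over K-token pages touches roughly N·K² positions. Constants are ignored and the accounting is a simplification of the paper's analysis.

```python
# Asymptotic operation counts (constants dropped) behind the
# O(N^2) -> O(N*K^2) reduction claimed for Neural Paging.
def full_attention_cost(n: int) -> int:
    return n * n          # every token attends to every token

def paged_cost(n: int, k: int) -> int:
    return n * k * k      # each step works over a K-token page

n, k = 100_000, 64
print(full_attention_cost(n) // paged_cost(n, k))  # → 24 (≈24x fewer ops)
```

Because K is a fixed page size while N grows with the horizon, the gap widens on longer tasks, which is what makes the scheme attractive for long-horizon agents.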
AI · Bullish · arXiv → CS AI · Mar 4 · 7/10 · 3
🧠 Researchers have identified a critical flaw in reinforcement learning fine-tuning of large language models that causes degradation in multi-attempt performance despite improvements in single attempts. Their proposed solution, Diversity-Preserving Hybrid RL (DPH-RL), uses mass-covering f-divergences to maintain model diversity and prevent catastrophic forgetting while improving training efficiency.
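The mass-covering property can be illustrated on toy distributions: forward KL(ref‖policy), a mass-covering f-divergence, penalizes a policy heavily for dropping a mode the reference model covers, which is exactly the failure that hurts multi-attempt performance. DPH-RL's full objective is not reproduced here.

```python
import math

# Forward KL(ref || policy): blows up when the policy assigns near-zero
# mass to an answer the reference distribution still covers.
def forward_kl(ref: list[float], pol: list[float]) -> float:
    return sum(r * math.log(r / p) for r, p in zip(ref, pol) if r > 0)

ref = [0.5, 0.5]           # reference spreads mass over two valid answers
collapsed = [0.99, 0.01]   # RL-tuned policy that forgot the second answer
diverse = [0.6, 0.4]       # policy that kept both modes
print(forward_kl(ref, collapsed) > forward_kl(ref, diverse))  # → True
```

Reverse KL(policy‖ref), by contrast, is mode-seeking and barely penalizes the collapsed policy, which is why the choice of divergence direction matters for preserving diversity.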
AI · Neutral · arXiv → CS AI · Mar 4 · 7/10 · 4
🧠 Researchers introduce GraphSSR, a new framework that improves zero-shot graph learning by combining Large Language Models with adaptive subgraph denoising. The system addresses structural noise issues in existing methods through a dynamic 'Sample-Select-Reason' pipeline and reinforcement learning training.
AI · Bullish · arXiv → CS AI · Mar 4 · 7/10 · 4
🧠 Researchers propose Many-Shot In-Context Fine-tuning (ManyICL), a novel approach that significantly improves large language model performance by treating multiple in-context examples as supervised training targets rather than just prompts. The method narrows the performance gap between in-context learning and dedicated fine-tuning while reducing catastrophic forgetting issues.
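The key change over ordinary fine-tuning is the loss mask: many (question, answer) pairs are packed into one sequence, and every answer span carries supervision, not just the final one. The sketch below simplifies tokens to whitespace-split words; the real method operates on model tokens.

```python
# Build a ManyICL-style sequence: all answer positions are supervised
# targets (mask=True), while question/context positions carry no loss.
def build_manyicl_batch(pairs: list[tuple[str, str]]):
    tokens, targets = [], []
    for q, a in pairs:
        for w in q.split():
            tokens.append(w)
            targets.append(False)   # context: no loss
        for w in a.split():
            tokens.append(w)
            targets.append(True)    # answer: supervised target
    return tokens, targets

toks, mask = build_manyicl_batch([("2 + 2 =", "4"), ("3 + 3 =", "6")])
print(sum(mask))  # → 2: both answers contribute to the loss
```

Supervising every in-context answer turns a single long prompt into many training signals, which is what lets the method approach dedicated fine-tuning.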
AI · Bullish · arXiv → CS AI · Mar 4 · 6/10 · 2
🧠 Researchers introduce Perception-R1, a new approach to enhance multimodal reasoning in large language models by improving visual perception capabilities through reinforcement learning with visual perception rewards. The method achieves state-of-the-art performance on multimodal reasoning benchmarks using only 1,442 training samples.
AI · Bullish · arXiv → CS AI · Mar 4 · 7/10 · 3
🧠 Researchers developed a type-aware retrieval-augmented generation (RAG) method that translates natural language requirements into solver-executable optimization code for industrial applications. The method uses a typed knowledge base and dependency closure to ensure code executability, successfully validated on battery production optimization and job scheduling tasks where conventional RAG approaches failed.
AI · Bullish · arXiv → CS AI · Mar 3 · 7/10 · 4
🧠 Researchers introduce HEAPr, a novel pruning algorithm for Mixture-of-Experts (MoE) language models that decomposes experts into atomic components for more precise pruning. The method achieves nearly lossless compression at 20-25% pruning ratios while reducing computational costs by approximately 20%.
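One way to picture "decomposing experts into atomic components" is a rank-1 decomposition: split an expert's weight matrix into rank-1 pieces via SVD and drop the least important ones. HEAPr's actual decomposition and importance measure are not given here; this is a generic low-rank pruning sketch.

```python
import numpy as np

# Prune an expert weight matrix by keeping only its top rank-1
# components (SVD atoms), a stand-in for finer-grained expert pruning.
def prune_expert(W: np.ndarray, keep_ratio: float = 0.8) -> np.ndarray:
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    k = max(1, int(len(S) * keep_ratio))     # number of atoms to keep
    return (U[:, :k] * S[:k]) @ Vt[:k]       # reconstruction from kept atoms

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
W_pruned = prune_expert(W, keep_ratio=0.75)
print(W_pruned.shape)  # → (8, 8): same shape, lower effective rank
```

Ranking atoms by singular value keeps the components that explain most of the expert's output energy, which is why moderate pruning ratios can be nearly lossless.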