944 articles tagged with #llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠 Researchers introduce ParamΔ, a novel method for transferring post-training capabilities to updated language models without additional training costs. The technique reaches 95% of the performance of traditional post-training by computing the weight differences between base and post-trained models, offering significant cost savings for AI model development.
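The weight-difference idea in the summary above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the difference between the post-trained and base weights is simply added to the updated base model, and the function name `apply_param_delta` and the toy weights are hypothetical.

```python
# Toy sketch: each "model" is a dict mapping parameter names to lists of floats.
# Assumption: the updated model is formed as new_base + (post_trained - base).

def apply_param_delta(base, post_trained, new_base):
    """Add the post-training weight difference to an updated base model."""
    merged = {}
    for name, new_weights in new_base.items():
        # Per-parameter difference between post-trained and original base weights.
        delta = [p - b for p, b in zip(post_trained[name], base[name])]
        # Apply that difference to the updated base, with no gradient steps.
        merged[name] = [w + d for w, d in zip(new_weights, delta)]
    return merged


# Hypothetical three-element "layer" for illustration only.
base = {"layer.weight": [1.0, 2.0, 3.0]}
post_trained = {"layer.weight": [1.5, 1.0, 3.0]}
new_base = {"layer.weight": [2.0, 2.0, 2.0]}

merged = apply_param_delta(base, post_trained, new_base)
print(merged)  # {'layer.weight': [2.5, 1.0, 2.0]}
```

The sketch requires all three models to share parameter names and shapes, which is why the approach applies to an updated version of the same architecture.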
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3
🧠 Researchers introduce T³, a new method to improve the reasoning abilities of large language model (LLM) agents by tracking and correcting 'belief deviation', which occurs when AI agents lose an accurate understanding of problem states. The technique achieved up to 30-point performance gains and a 34% token cost reduction across challenging tasks.
AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 2
🧠 Researchers propose PURE, a new framework for AI-powered recommendation systems that addresses preference-inconsistent explanations, where the AI provides factually correct but unconvincing reasoning that conflicts with user preferences. The system uses a select-then-generate approach to improve both evidence selection and explanation generation, demonstrating reduced hallucinations while maintaining recommendation accuracy.
AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 5
🧠 Researchers propose a framework for developing trustworthy AI agents that function as epistemic entities, capable of pursuing knowledge goals and shaping information environments. The paper argues that as AI models increasingly replace traditional search methods and provide specialized advice, their calibration to human epistemic norms becomes critical to prevent cognitive deskilling and epistemic drift.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2
🧠 Researchers developed SAE-based Transferability Score (STS), a new metric using sparse autoencoders to predict how well fine-tuned large language models will perform across different domains without requiring actual training. The method achieves correlation coefficients above 0.7 with actual performance changes and provides interpretable insights into model adaptation.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2
🧠 Researchers propose NAR-CP, a new method to improve Large Language Models' performance in high-frequency decision-making tasks like UAV pursuit. The approach uses normalized action rewards and consistency policy optimization to address limitations in current LLM-based agents that struggle with rapid, precise numerical state updates.
AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 4
🧠 Researchers analyzed memory systems in LLM agents and found that retrieval methods are more critical than write strategies for performance. Simple raw chunk storage matched expensive alternatives, suggesting current memory pipelines may discard useful context that retrieval systems cannot compensate for.
AI · Bearish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠 Research reveals that AI agents experience 'echoing' failures when communicating with each other, where they abandon their assigned roles and mirror their conversation partners instead. The study found echoing rates as high as 70% across major LLM providers, with the phenomenon persisting even in advanced reasoning models and occurring more frequently in longer conversations.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3
🧠 Researchers have developed an agentic AI-driven workflow using Large Language Models to automate coverage analysis for formal verification in integrated chip development. The approach systematically identifies coverage gaps and generates required formal properties, demonstrating measurable improvements in coverage metrics that correlate with design complexity.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2
🧠 Researchers introduce BehaveSim, a new method to measure algorithmic similarity by analyzing problem-solving behavior rather than code syntax. The approach enhances AI-driven algorithm design frameworks and enables systematic analysis of AI-generated algorithms through behavioral clustering.
AI · Bearish · Ars Technica – AI · Mar 3 · 7/10 · 2
🧠 Research demonstrates that Large Language Models (LLMs) can identify pseudonymous users with surprising accuracy when analyzing their online activity patterns at scale. This development poses significant threats to privacy protections that pseudonymity previously provided across digital platforms.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠 Researchers developed LA-CDM, a language agent that uses reinforcement learning to support clinical decision-making by iteratively requesting tests and generating hypotheses for diagnosis. The system was trained using a hybrid approach combining supervised and reinforcement learning, and tested on real-world data covering four abdominal diseases.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠 Researchers analyzed 20 Mixture-of-Experts (MoE) language models to study local routing consistency, finding a trade-off between routing consistency and local load balance. The study introduces new metrics to measure how well expert offloading strategies can optimize memory usage on resource-constrained devices while maintaining inference speed.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠 Researchers released two open-source datasets, SwallowCode and SwallowMath, that significantly improve large language model performance in coding and mathematics through systematic data rewriting rather than filtering. The datasets boost Llama-3.1-8B performance by +17.0 on HumanEval for coding and +12.4 on GSM8K for math tasks.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 2
🧠 Researchers propose Partial Model Collapse (PMC), a novel machine unlearning method for large language models that removes private information without directly training on sensitive data. The approach leverages model collapse (where models degrade when trained on their own outputs) as a feature to deliberately forget targeted information while preserving general utility.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠 Researchers introduce RoboPARA, a new LLM-driven framework that optimizes dual-arm robot task planning through parallel processing and dependency mapping. The system uses directed acyclic graphs to maximize efficiency in complex multitasking scenarios and includes the first dataset specifically designed for evaluating dual-arm parallelism.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠 New research formally defines and analyzes pattern matching in large language models, revealing predictable limits in their ability to generalize on compositional tasks. The study provides mathematical boundaries for when pattern matching succeeds or fails, with implications for AI model development and understanding.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠 Researchers propose GenDB, a database system that uses Large Language Models to synthesize query execution code instead of relying on traditional engineered query processors. Early prototype testing shows GenDB outperforms established systems like DuckDB, Umbra, and PostgreSQL on OLAP workloads.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠 Meta presents CharacterFlywheel, an iterative process for improving large language models in production social chat applications across Instagram, WhatsApp, and Messenger. Starting from LLaMA 3.1, the system improved over 15 generations of refinement, with the best models showing up to 8.8% improvement in engagement breadth and 19.4% in engagement depth while substantially improving instruction-following capabilities.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠 Researchers have developed FROGENT, an AI multi-agent system that uses large language models to automate the entire drug discovery pipeline from target identification to synthesis planning. The system outperformed existing AI approaches across eight benchmarks and demonstrated practical applications in real-world drug design scenarios.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠 Researchers propose ROMA, a new hardware accelerator for running large language models on edge devices using QLoRA. The system uses ROM storage for quantized base models and SRAM for LoRA weights, achieving over 20,000 tokens/s generation speed without external memory.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 2
🧠 Researchers propose GradientStabilizer, a new technique to address training instability in deep learning by replacing the gradient magnitude with a statistically stabilized estimate while preserving the gradient direction. The method outperforms gradient clipping across multiple AI training scenarios including LLM pre-training, reinforcement learning, and computer vision tasks.
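To make "replace the magnitude, keep the direction" concrete, here is a minimal sketch. It is not the paper's estimator: it assumes, purely for illustration, that the stabilized magnitude is an exponential moving average of past gradient norms, and the class name `NormStabilizer` and its parameters are hypothetical.

```python
import math


class NormStabilizer:
    """Rescale each gradient to a running-average norm, keeping its direction.

    Assumption: an exponential moving average (EMA) of gradient norms stands in
    for the "statistically stabilized estimate"; the paper may use something else.
    """

    def __init__(self, beta=0.9, eps=1e-12):
        self.beta = beta        # EMA decay factor
        self.eps = eps          # guards against division by zero
        self.avg_norm = None    # running estimate of the gradient norm

    def __call__(self, grad):
        norm = math.sqrt(sum(g * g for g in grad))
        if self.avg_norm is None:
            self.avg_norm = norm
        else:
            self.avg_norm = self.beta * self.avg_norm + (1 - self.beta) * norm
        # Same direction as grad, but with the smoothed magnitude.
        scale = self.avg_norm / (norm + self.eps)
        return [g * scale for g in grad]


stabilizer = NormStabilizer(beta=0.9)
first = stabilizer([3.0, 4.0])       # ordinary step, norm 5
spike = stabilizer([300.0, 400.0])   # 100x spike in magnitude
spike_norm = math.sqrt(sum(g * g for g in spike))
print(first, spike, spike_norm)
```

Unlike clipping, which acts only once a threshold is crossed, this rescales every step; in the usage above the spiked gradient keeps its 3:4 direction but its norm is pulled down to the running average (about 54.5 instead of 500).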
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠 Researchers developed ZeroDVFS, a system that uses Large Language Models to optimize power management in embedded systems without requiring extensive profiling. The system achieves 7.09 times better energy efficiency and enables zero-shot deployment for new workloads in under 5 seconds through LLM-based code analysis.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠 Researchers introduce DRAGON, a new framework that combines Large Language Models with metaheuristic optimization to solve large-scale combinatorial optimization problems. The system decomposes complex problems into manageable subproblems and achieves near-optimal results on datasets with over 3 million variables, overcoming the scalability limitations of existing LLM-based solvers.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠 Researchers introduce SVDecode, a new method for adapting large language models to specific tasks without extensive fine-tuning. The technique uses steering vectors during decoding to align output distributions with task requirements, improving accuracy by up to 5 percentage points while adding minimal computational overhead.