956 articles tagged with #llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 36/104
🧠 Researchers have developed DQO (Diversity Quality Optimization), a new training method that uses determinantal point processes to improve large language models' response diversity while maintaining quality. The approach addresses a key limitation of current reinforcement learning methods that tend to narrow LLM outputs to canonical responses.
AI · Neutral · arXiv – CS AI · Mar 36/103
🧠 Researchers propose rubric-based reward modeling to address reward over-optimization in large language model fine-tuning. The approach focuses on the high-reward tail, where models struggle to distinguish excellent responses from merely great ones, using off-policy examples to improve training effectiveness.
AI · Bullish · arXiv – CS AI · Mar 36/103
🧠 Researchers present a comprehensive analysis of post-training N:M activation pruning techniques for large language models, demonstrating that activation pruning preserves generative capabilities better than weight pruning. The study establishes hardware-friendly baselines and explores sparsity patterns beyond NVIDIA's standard 2:4, with 8:16 patterns showing superior performance while maintaining implementation feasibility.
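The N:M pattern mentioned above (keep N values out of every group of M) can be sketched in a few lines; this is an illustrative mask over activations, not the authors' implementation, and the input vector is made up for the example.

```python
import numpy as np

def nm_activation_mask(x, n=2, m=4):
    """Keep the n largest-magnitude values in every group of m activations.

    Illustrative sketch of N:M sparsity (e.g. NVIDIA's 2:4, or the 8:16
    pattern the study highlights); zeroed entries map to hardware-friendly
    structured sparsity.
    """
    x = np.asarray(x, dtype=float)
    assert x.size % m == 0, "length must be divisible by the group size m"
    groups = x.reshape(-1, m)
    # In each group, the m-n smallest-magnitude entries are dropped.
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (groups * mask).reshape(x.shape)

x = np.array([0.1, -3.0, 0.2, 5.0, -1.0, 0.05, 2.0, -0.5])
pruned = nm_activation_mask(x)  # 2:4 keeps the two largest magnitudes per group of four
# -> [0, -3, 0, 5, -1, 0, 2, 0]
```

Larger patterns such as 8:16 keep the same per-group ratio but give the selection more freedom within each group, which is one intuition for why they can outperform 2:4.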
AI · Bullish · arXiv – CS AI · Mar 36/103
🧠 Researchers propose Quantile Advantage Estimation (QAE) to stabilize Reinforcement Learning with Verifiable Rewards (RLVR) for large language model reasoning. The method replaces mean baselines with group-wise K-quantile baselines to prevent entropy collapse and explosion, showing sustained improvements on mathematical reasoning tasks.
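The baseline swap at the heart of QAE can be sketched as follows; the median choice and the reward values are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def quantile_advantage(rewards, q=0.5):
    """Group-wise quantile baseline in place of the usual mean baseline.

    rewards: per-response rewards within one prompt group.
    q: which quantile serves as the baseline (the paper's K-quantile;
       0.5, the median, is used here purely for illustration).
    """
    rewards = np.asarray(rewards, dtype=float)
    baseline = np.quantile(rewards, q)
    return rewards - baseline

# A mean baseline is dragged toward outliers; a quantile baseline is robust.
rewards = np.array([0.0, 0.0, 1.0, 10.0])
adv_mean = rewards - rewards.mean()      # baseline 2.75, skewed by the outlier
adv_quant = quantile_advantage(rewards)  # baseline 0.5 (the median)
```

With the mean baseline, three of four ordinary responses receive negative advantage because of one outlier; the quantile baseline keeps their sign tied to how they compare with a typical group member.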
AI · Bullish · arXiv – CS AI · Mar 36/103
🧠 Researchers introduce ReMemR1, a new approach that improves large language models' long-context question answering by integrating memory retrieval into the memory update process. The system enables non-linear reasoning through selective callback of historical memories and uses a multi-level reward design to strengthen training.
AI · Bullish · arXiv – CS AI · Mar 36/104
🧠 Researchers introduce MetaTuner, a new framework that combines prompt optimization with fine-tuning for Large Language Models, using shared neural networks to discover optimal combinations of prompts and parameters. The approach addresses the discrete-continuous optimization challenge through supervised regularization and demonstrates consistent performance improvements across benchmarks.
AI · Neutral · arXiv – CS AI · Mar 35/104
🧠 Researchers introduce SimuHome, a high-fidelity smart home simulator and benchmark with 600 episodes for testing LLM-based smart home agents. The system uses the Matter protocol standard and enables time-accelerated simulation to evaluate how AI agents handle device control, environmental monitoring, and workflow scheduling in smart homes.
AI · Bullish · arXiv – CS AI · Mar 36/104
🧠 Researchers have developed EasySteer, a unified framework for controlling large language model behavior at inference time that achieves a 10.8-22.3x speedup over existing frameworks. The system offers a modular architecture with pre-computed steering vectors for eight application domains, turning steering from a research technique into a production-ready capability.
AI · Bullish · arXiv – CS AI · Mar 36/104
🧠 TiTok is a new framework for transferring LoRA (Low-Rank Adaptation) parameters between different Large Language Model backbones without requiring additional training data or discriminator models. The method uses token-level contrastive learning to achieve 4-10% performance gains over existing approaches in parameter-efficient fine-tuning scenarios.
AI · Bullish · arXiv – CS AI · Mar 36/103
🧠 Researchers developed Set Supervised Fine-Tuning (SSFT) and Global Forking Policy Optimization (GFPO), methods that improve large language model reasoning by enabling parallel processing through 'global forking tokens.' The techniques preserve diverse reasoning modes and demonstrate superior performance on math and code generation benchmarks compared to traditional fine-tuning approaches.
AI · Neutral · arXiv – CS AI · Mar 36/103
🧠 Researchers introduce OBsmith, an LLM-powered framework that tests JavaScript obfuscators for correctness bugs that can silently alter program functionality. The tool discovered 11 previously unknown bugs that existing JavaScript fuzzers failed to detect, highlighting critical gaps in obfuscation quality assurance.
AI · Bullish · arXiv – CS AI · Mar 36/103
🧠 Researchers developed LSPRAG, a new framework that uses Language Server Protocol backends to help Large Language Models generate unit tests across multiple programming languages in real time. The system achieved significant improvements in test coverage over existing methods, with increases of up to 213% for Java, 174% for Go, and 31% for Python.
AI · Bullish · arXiv – CS AI · Mar 36/104
🧠 Researchers have developed ProofGrader, a new AI system that can reliably evaluate natural language mathematical proofs generated by large language models on a fine-grained 0-7 scale. The system was trained using ProofBench, the first expert-annotated dataset of proof ratings, covering 145 competition math problems and 435 LLM solutions, and achieves significant improvements over basic evaluation methods.
AI · Bullish · arXiv – CS AI · Mar 36/103
🧠 Researchers introduce SupervisorAgent, a lightweight framework that reduces token consumption in Multi-Agent Systems by 29.68% while maintaining performance. The system provides real-time supervision and error correction without modifying base agent architectures, validated across multiple AI benchmarks.
AI · Bearish · arXiv – CS AI · Mar 36/104
🧠 A comprehensive study of 17 Large Language Models as automated annotators for Bangla hate speech detection reveals significant bias and instability issues. The research found that larger models don't necessarily perform better than smaller, task-specific ones, raising concerns about LLM reliability for sensitive annotation tasks in low-resource languages.
AI · Bullish · arXiv – CS AI · Mar 26/1015
🧠 Researchers introduce FineScope, a framework that uses Sparse Autoencoder (SAE) techniques to create smaller, domain-specific language models from larger pretrained LLMs through structured pruning and self-data distillation. The method achieves competitive performance while significantly reducing computational requirements compared to training from scratch.
AI · Bullish · arXiv – CS AI · Mar 27/1012
🧠 Researchers have developed FinBloom 7B, a specialized large language model trained on 14 million financial news articles and SEC filings, designed to handle real-time financial queries. The model introduces a Financial Agent system that can access up-to-date market data and financial information to support decision-making and algorithmic trading applications.
AI · Bullish · arXiv – CS AI · Mar 26/1016
🧠 Researchers introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that improves AI reasoning efficiency by helping large reasoning models know when to stop thinking. The approach addresses redundant, lengthy reasoning chains that don't improve accuracy, reducing computational costs and response times.
AI · Bullish · arXiv – CS AI · Mar 26/1017
🧠 Researchers introduce MITS (Mutual Information Tree Search), a new framework that improves reasoning capabilities in large language models using information-theoretic principles. The method uses pointwise mutual information for step-wise evaluation and achieves better performance while being more computationally efficient than existing tree search methods like Tree-of-Thought.
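The step-wise PMI signal described above can be sketched in one line; the log-probabilities below are hypothetical stand-ins for values an LLM would assign to a reasoning step with and without the problem context, not the paper's actual scoring pipeline.

```python
def stepwise_pmi(logp_step_given_context: float, logp_step: float) -> float:
    """Pointwise mutual information of a candidate reasoning step.

    PMI(step; context) = log P(step | context) - log P(step).
    A step that becomes much more likely once the problem is in
    context scores high and is favored during tree search.
    """
    return logp_step_given_context - logp_step

# Hypothetical scores: the step is far more likely given the problem.
pmi = stepwise_pmi(-2.0, -6.0)  # -> 4.0
```

In a tree search, such per-step scores can rank sibling expansions without sampling full solution rollouts, which is one way an information-theoretic criterion saves computation relative to methods like Tree-of-Thought.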
AI · Bullish · arXiv – CS AI · Mar 27/1025
🧠 Researchers introduce the first formal framework for measuring AI propensities - the tendencies of models to exhibit particular behaviors - going beyond traditional capability measurements. The new bilogistic approach successfully predicts AI behavior on held-out tasks and shows stronger predictive power when combining propensities with capabilities than when using either measure alone.
AI · Bullish · arXiv – CS AI · Mar 27/1015
🧠 Researchers introduce R2M (Real-Time Aligned Reward Model), a new framework for Reinforcement Learning from Human Feedback (RLHF) that addresses reward overoptimization in large language models. The system uses real-time policy feedback to better align reward models with evolving policy distributions during training.
AI · Bullish · arXiv – CS AI · Mar 27/1020
🧠 Researchers developed a new multi-agent reinforcement learning algorithm that uses strategic risk aversion to create AI agents that can reliably collaborate with unseen partners. The approach addresses brittle AI collaboration systems that fail when working with new partners by incorporating robustness against behavioral deviations.
AI · Bullish · arXiv – CS AI · Mar 26/1021
🧠 Researchers developed Agentic Predictor, a lightweight AI system that uses multi-view encoding to optimize LLM-based agent workflows without expensive trial-and-error evaluations. The system incorporates code architecture, textual prompts, and interaction graphs to predict task success rates and select optimal configurations across different domains.
AI · Bullish · arXiv – CS AI · Mar 27/1017
🧠 Researchers introduce CoMind, a multi-agent AI system that leverages community knowledge to automate machine learning engineering tasks. The system achieved a 36% medal rate on 75 past Kaggle competitions and outperformed 92.6% of human competitors across eight live competitions, establishing new state-of-the-art performance.
AI · Bullish · arXiv – CS AI · Mar 26/1017
🧠 Researchers developed a data-driven pipeline to optimize GPU efficiency for distributed LLM adapter serving, achieving sub-5% throughput estimation error while running 90x faster than full benchmarking. The system uses a Digital Twin, machine learning models, and greedy placement algorithms to minimize GPU requirements while serving hundreds of adapters concurrently.