y0news
#llm · 56 articles
🧠 AI Bullish · arXiv – CS AI · 6h ago · 8

Real-Time Aligned Reward Model beyond Semantics

Researchers introduce R2M (Real-Time Aligned Reward Model), a new framework for Reinforcement Learning from Human Feedback (RLHF) that addresses reward overoptimization in large language models. The system uses real-time policy feedback to better align reward models with evolving policy distributions during training.

🧠 AI Bullish · arXiv – CS AI · 6h ago · 6

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Researchers introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that improves AI reasoning efficiency by helping large reasoning models know when to stop thinking. The approach addresses the problem of redundant, lengthy reasoning chains that don't improve accuracy while reducing computational costs and response times.
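One way to picture "knowing when to stop" is a stabilization check over provisional answers read off at successive reasoning checkpoints. The toy rule below is purely illustrative and is not SAGE's actual sampling criterion; the checkpoint answers and patience value are invented:

```python
def stop_when_stable(checkpoint_answers, patience=2):
    """Stop consuming further reasoning once the provisional answer has not
    changed for `patience` consecutive checkpoints; return the answer and
    how many checkpoints were actually used."""
    stable = 0
    for i, ans in enumerate(checkpoint_answers):
        if i > 0 and ans == checkpoint_answers[i - 1]:
            stable += 1
            if stable >= patience:
                return ans, i + 1  # early exit: remaining reasoning is skipped
        else:
            stable = 0
    return checkpoint_answers[-1], len(checkpoint_answers)

# Provisional answers at six reasoning checkpoints (toy data): the model
# settles on "14" early, so the last two checkpoints are never consumed.
answer, used = stop_when_stable(["12", "14", "14", "14", "14", "14"])
```

Under this sketch, the saved checkpoints translate directly into fewer generated tokens, which is the efficiency gain the paper targets.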

🧠 AI Bullish · arXiv – CS AI · 6h ago · 4

FinBloom: Knowledge Grounding Large Language Model with Real-time Financial Data

Researchers have developed FinBloom 7B, a specialized large language model trained on 14 million financial news articles and SEC filings, designed to handle real-time financial queries. The model introduces a Financial Agent system that can access up-to-date market data and financial information to support decision-making and algorithmic trading applications.

🧠 AI Bullish · arXiv – CS AI · 6h ago · 5

FineScope: SAE-guided Data Selection Enables Domain Specific LLM Pruning and Finetuning

Researchers introduce FineScope, a framework that uses Sparse Autoencoder (SAE) techniques to create smaller, domain-specific language models from larger pretrained LLMs through structured pruning and self-data distillation. The method achieves competitive performance while significantly reducing computational requirements compared to training from scratch.
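As a crude stand-in for SAE-guided importance scoring, one can rank units by their mean absolute activation on domain data and prune the rest. The activation matrix and keep ratio below are toy values, not FineScope's actual procedure:

```python
def select_units(activations, keep_ratio=0.5):
    """Rank units by mean absolute activation over domain samples and keep
    the top fraction; everything else would be pruned away."""
    n_units = len(activations[0])
    means = [sum(abs(row[j]) for row in activations) / len(activations)
             for j in range(n_units)]
    k = max(1, int(n_units * keep_ratio))
    # indices of the k highest-scoring units, returned in ascending order
    return sorted(sorted(range(n_units), key=lambda j: -means[j])[:k])

# Two domain samples over four hidden units (toy numbers): units 0 and 2
# fire strongly on domain data, units 1 and 3 barely at all.
acts = [[0.9, 0.1, 0.8, 0.05],
        [1.1, 0.2, 0.7, 0.00]]
kept = select_units(acts)
```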

🧠 AI Bullish · arXiv – CS AI · 6h ago · 10

Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows

Researchers developed Agentic Predictor, a lightweight AI system that uses multi-view encoding to optimize LLM-based agent workflows without expensive trial-and-error evaluations. The system incorporates code architecture, textual prompts, and interaction graphs to predict task success rates and select optimal configurations across different domains.
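The multi-view idea, reduced to its skeleton: encode each view of a workflow, fuse the views, and score the result. Concatenation and a linear head below stand in for the learned encoders and predictor; all vectors and weights are invented:

```python
import math

def multi_view_features(code_vec, prompt_vec, graph_vec):
    """Fuse the three workflow views (code structure, prompts, interaction
    graph) by simple concatenation; real encoders would be learned."""
    return code_vec + prompt_vec + graph_vec

def predict_success(features, weights, bias=0.0):
    """Toy linear head standing in for the learned predictor."""
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-score))  # sigmoid -> success probability

weights = [0.5, -0.2, 0.8, 0.1, -0.3, 0.6]
# Two candidate workflow configurations, each described by three 2-d views.
wf_a = multi_view_features([1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
wf_b = multi_view_features([0.0, 1.0], [0.0, 1.0], [0.0, 1.0])
# Pick the configuration with the higher predicted success rate, skipping
# any actual (expensive) execution of the workflows.
best = max([("A", wf_a), ("B", wf_b)],
           key=lambda kv: predict_success(kv[1], weights))
```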

🧠 AI Bullish · arXiv – CS AI · 6h ago · 10

Beyond Naïve Prompting: Strategies for Improved Context-aided Forecasting with LLMs

Researchers introduce a framework of four strategies to improve large language models' performance in context-aided forecasting, spanning diagnostics, accuracy, and efficiency. The study identifies an 'Execution Gap', where models understand the context but fail to apply it in their reasoning, and reports 25-50% performance improvements along with a cost-effective adaptive-routing approach.
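The adaptive-routing idea can be sketched as a simple dispatcher. The rule, model stubs, and length threshold below are invented for illustration; the paper's router is driven by its diagnostics, not by query length:

```python
def route(query, has_context, cheap_model, strong_model, max_cheap_len=40):
    """Send short, context-free queries to the cheap model and reserve the
    strong model for long or context-heavy ones (toy routing rule)."""
    if has_context or len(query) > max_cheap_len:
        return strong_model(query)
    return cheap_model(query)

# Stand-in models that just tag which tier handled the query.
cheap = lambda q: ("cheap", q)
strong = lambda q: ("strong", q)

r1 = route("next value?", False, cheap, strong)  # no context -> cheap tier
r2 = route("next value?", True, cheap, strong)   # context attached -> strong tier
```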

🧠 AI Neutral · arXiv – CS AI · 6h ago · 9

LumiMAS: A Comprehensive Framework for Real-Time Monitoring and Enhanced Observability in Multi-Agent Systems

Researchers have developed LumiMAS, a comprehensive framework for monitoring and detecting failures in multi-agent systems that incorporate large language models. The framework features three layers: monitoring and logging, anomaly detection, and anomaly explanation with root cause analysis, addressing the unique challenges of observing entire multi-agent systems rather than individual agents.
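The monitoring-and-logging and anomaly-detection layers can be caricatured in a few lines; a z-score rule stands in for the paper's detectors, the root-cause layer is omitted, and all agent names and latencies are invented:

```python
import statistics

class Monitor:
    """Toy monitoring-and-logging layer: collect per-agent metrics across
    the whole multi-agent system, then flag outliers by a z-score rule."""
    def __init__(self):
        self.log = []  # (agent_name, latency_ms) records

    def record(self, agent, latency_ms):
        self.log.append((agent, latency_ms))

    def anomalies(self, z_threshold=2.0):
        values = [v for _, v in self.log]
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values)
        # flag agents whose latency sits far from the system-wide mean
        return [a for a, v in self.log
                if sigma and abs(v - mu) / sigma > z_threshold]

m = Monitor()
for latency in [100, 105, 98, 102, 101, 99]:
    m.record("planner", latency)
m.record("executor", 900)  # a stuck agent elsewhere in the system
flagged = m.anomalies()
```

The point of the sketch is the system-wide view: the anomaly is only visible because the monitor aggregates across agents rather than watching each in isolation.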

🧠 AI Bullish · arXiv – CS AI · 6h ago · 7

Latent Self-Consistency for Reliable Majority-Set Selection in Short- and Long-Answer Reasoning

Researchers introduce Latent Self-Consistency (LSC), a new method for improving Large Language Model output reliability across both short and long-form reasoning tasks. LSC uses learnable token embeddings to select semantically consistent responses with only 0.9% computational overhead, outperforming existing consistency methods like Self-Consistency and Universal Self-Consistency.
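The majority-set idea can be illustrated with a toy version that clusters responses by embedding similarity and returns a representative of the largest cluster. The fixed vectors and threshold below replace LSC's learnable summary-token embeddings and are purely illustrative:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

def majority_set_answer(responses, embeddings, threshold=0.9):
    """Greedily group responses whose embeddings are similar, then return a
    representative of the largest group (the 'majority set')."""
    clusters = []  # each cluster is a list of response indices
    for i, e in enumerate(embeddings):
        for cluster in clusters:
            if cosine(e, embeddings[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    largest = max(clusters, key=len)
    return responses[largest[0]]

# Three semantically close answers and one outlier (toy embeddings): the
# comparison happens in latent space, so surface wording does not matter.
responses = ["42", "forty-two", "42.", "17"]
embeddings = [[1.0, 0.0], [0.98, 0.2], [0.99, 0.1], [0.0, 1.0]]
picked = majority_set_answer(responses, embeddings)
```

Because voting happens over embeddings rather than exact strings, the same rule covers long-form answers where literal string-matching self-consistency breaks down.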

🧠 AI Bullish · arXiv – CS AI · 6h ago · 8

Thompson Sampling via Fine-Tuning of LLMs

Researchers developed ToSFiT (Thompson Sampling via Fine-Tuning), a new Bayesian optimization method that uses fine-tuned large language models to improve search efficiency in complex discrete spaces. The approach eliminates computational bottlenecks by directly parameterizing reward probabilities and demonstrates superior performance across diverse applications including protein search and quantum circuit design.
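ToSFiT itself parameterizes reward probabilities with a fine-tuned LLM; as a stand-in, here is classic Thompson sampling over discrete candidates with Beta posteriors, which is the decision loop the method builds on. All counts and the seed are toy values:

```python
import random

def thompson_step(successes, failures, rng):
    """One Thompson-sampling step: draw a plausible reward for each
    candidate from its Beta posterior, then commit to the argmax."""
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

rng = random.Random(0)
successes = [9, 1, 0]  # candidate 0 has looked good so far
failures  = [1, 9, 0]  # candidate 2 is entirely unexplored
counts = [0, 0, 0]
for _ in range(200):
    counts[thompson_step(successes, failures, rng)] += 1
```

The posterior sampling makes the exploration/exploitation trade-off automatic: the promising candidate dominates, yet the unexplored one still gets occasional draws.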

🧠 AI Bullish · arXiv – CS AI · 6h ago · 7

VCWorld: A Biological World Model for Virtual Cell Simulation

Researchers have developed VCWorld, a new AI-powered biological simulation system that combines large language models with structured biological knowledge to predict cellular responses to drug perturbations. The system operates as a 'white-box' model, providing interpretable predictions and mechanistic insights while achieving state-of-the-art performance in drug perturbation benchmarks.

🧠 AI Bullish · arXiv – CS AI · 6h ago · 5

Trust Region Masking for Long-Horizon LLM Reinforcement Learning

Researchers propose Trust Region Masking (TRM) to address off-policy mismatch problems in Large Language Model reinforcement learning pipelines. The method provides the first non-vacuous monotonic improvement guarantees for long-horizon LLM-RL tasks by masking entire sequences that violate trust region constraints.
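A minimal sketch of sequence-level masking, assuming a PPO-style importance ratio; the token log-probs and the epsilon bound below are illustrative, not the paper's exact objective. The key contrast with per-token clipping is that an offending sequence is dropped whole:

```python
import math

def trust_region_mask(logp_new, logp_old, epsilon=0.2):
    """Compute the full-sequence importance ratio
    exp(sum logp_new - sum logp_old) for each sequence; any sequence whose
    ratio leaves [1-eps, 1+eps] gets weight 0 (masked) instead of being
    clipped token by token."""
    ratios, masks = [], []
    for lp_new, lp_old in zip(logp_new, logp_old):
        ratio = math.exp(sum(lp_new) - sum(lp_old))
        ratios.append(ratio)
        masks.append(1.0 if (1 - epsilon) <= ratio <= (1 + epsilon) else 0.0)
    return ratios, masks

# Two toy sequences of per-token log-probs under the new and old policies:
# the first is on-policy, the second has drifted far off-policy.
logp_new = [[-0.9, -1.1], [-0.2, -0.3]]
logp_old = [[-1.0, -1.0], [-1.0, -1.0]]
ratios, masks = trust_region_mask(logp_new, logp_old)
```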

🧠 AI Bullish · arXiv – CS AI · 6h ago · 9

LIA: Supervised Fine-Tuning of Large Language Models for Automatic Issue Assignment

Researchers developed LIA, a supervised fine-tuning approach using DeepSeek-R1-Distill-Llama-8B to automatically assign software issues to developers. The system achieved up to 187.8% improvement over the base model and 211.2% better performance than existing methods in developer recommendation accuracy.

🧠 AI Bullish · arXiv – CS AI · 6h ago · 7

VISTA: Knowledge-Driven Vessel Trajectory Imputation with Repair Provenance

Researchers introduce VISTA, a framework for vessel trajectory imputation that uses knowledge-driven LLM reasoning to repair incomplete maritime tracking data. The system provides 'repair provenance' - documented reasoning behind data repairs - achieving 5-91% accuracy improvements over existing methods while reducing inference time by 51-93%.
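The 'repair provenance' idea can be sketched with linear interpolation over missing position fixes, pairing each repair with a note explaining it. The interpolation rule here replaces VISTA's knowledge-driven LLM reasoning, and the track data is invented:

```python
def impute_with_provenance(track):
    """Fill missing (None) positions by linear interpolation between the
    nearest known fixes, recording a provenance note for every repair."""
    repaired, provenance = list(track), []
    for i, p in enumerate(track):
        if p is None:
            lo = max(j for j in range(i) if track[j] is not None)
            hi = min(j for j in range(i + 1, len(track)) if track[j] is not None)
            t = (i - lo) / (hi - lo)
            repaired[i] = tuple(a + t * (b - a)
                                for a, b in zip(track[lo], track[hi]))
            provenance.append((i, f"interpolated between fixes {lo} and {hi}"))
    return repaired, provenance

# A vessel track with two dropped AIS fixes between two known positions.
track = [(0.0, 0.0), None, None, (3.0, 3.0)]
repaired, notes = impute_with_provenance(track)
```

Keeping the notes alongside the repaired points is what makes each imputed value auditable after the fact, which is the property the paper's provenance records provide.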

🧠 AI Bullish · arXiv – CS AI · 6h ago · 11

DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher

Researchers propose DUET, a new distillation-based method for LLM unlearning that removes undesirable knowledge from AI models without full retraining. The technique combines computational efficiency with security advantages, achieving better performance in both knowledge removal and utility preservation while being significantly more data-efficient than existing methods.

🧠 AI Neutral · arXiv – CS AI · 6h ago · 9

An Empirical Study of Collective Behaviors and Social Dynamics in Large Language Model Agents

Researchers analyzed 7 million posts from 32,000 AI agents on Chirper.ai over one year, finding that LLM agents exhibit social behaviors similar to humans including homophily and social influence. The study revealed distinct patterns in toxic language among AI agents and proposed a 'Chain of Social Thought' method to reduce harmful posting behaviors.

🧠 AI Neutral · arXiv – CS AI · 6h ago · 4

Biases in the Blind Spot: Detecting What LLMs Fail to Mention

Researchers have developed an automated pipeline to detect hidden biases in Large Language Models that don't appear in their reasoning explanations. The system discovered previously unknown biases like Spanish fluency and writing formality across seven LLMs in hiring, loan approval, and university admission tasks.

🧠 AI Bullish · arXiv – CS AI · 6h ago · 7

Capabilities Ain't All You Need: Measuring Propensities in AI

Researchers introduce the first formal framework for measuring AI propensities - the tendencies of models to exhibit particular behaviors - going beyond traditional capability measurements. The new bilogistic approach successfully predicts AI behavior on held-out tasks and shows stronger predictive power when combining propensities with capabilities than using either measure alone.

🧠 AI Bullish · arXiv – CS AI · 6h ago · 10

Training Generalizable Collaborative Agents via Strategic Risk Aversion

Researchers developed a new multi-agent reinforcement learning algorithm that uses strategic risk aversion to create AI agents that can reliably collaborate with unseen partners. The approach addresses the problem of brittle AI collaboration systems that fail when working with new partners by incorporating robustness against behavioral deviations.

🧠 AI Bullish · arXiv – CS AI · 6h ago · 1

ProductResearch: Training E-Commerce Deep Research Agents via Multi-Agent Synthetic Trajectory Distillation

Researchers developed ProductResearch, a multi-agent AI framework that creates synthetic training data to improve e-commerce shopping agents. The system uses multiple AI agents to generate comprehensive product research trajectories, with experiments showing a compact model fine-tuned on this synthetic data significantly outperforming base models in shopping assistance tasks.

🧠 AI Neutral · arXiv – CS AI · 6h ago · 1

HotelQuEST: Balancing Quality and Efficiency in Agentic Search

Researchers introduce HotelQuEST, a new benchmark for evaluating agentic search systems that balances quality and efficiency metrics. The study reveals that while LLM-based agents achieve higher accuracy than traditional retrievers, they incur substantially higher costs due to redundant operations and poor optimization.

🧠 AI Neutral · arXiv – CS AI · 6h ago · 1

Terminology Rarity Predicts Catastrophic Failure in LLM Translation of Low-Resource Ancient Languages: Evidence from Ancient Greek

A study evaluated large language models (Claude, Gemini, ChatGPT) translating Ancient Greek texts, finding high performance on previously translated works (95.2/100) but declining quality on untranslated technical texts (79.9/100). Terminology rarity was identified as a strong predictor of translation failure, with rare terms causing catastrophic performance drops.

🧠 AI Neutral · arXiv – CS AI · 6h ago · 1

Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges

A research position paper examines the integration of Large Language Models (LLMs) in agent-based social simulations, highlighting both opportunities and limitations. The study proposes Hybrid Constitutional Architectures that combine classical agent-based models with small language models and LLMs to balance expressive flexibility with analytical transparency.

🧠 AI Neutral · arXiv – CS AI · 6h ago · 1

From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars?

Researchers explore using large language models (LLMs) as mediators rather than just moderators in online conflicts, developing a framework that combines judgment evaluation and empathetic intervention. Their study using Reddit data shows API-based models outperform open-source alternatives in de-escalating flame wars and fostering constructive dialogue.

🧠 AI Neutral · arXiv – CS AI · 6h ago · 2

User Misconceptions of LLM-Based Conversational Programming Assistants

Researchers analyzed user misconceptions about LLM-based programming assistants like ChatGPT, finding users often have misplaced expectations about web access, code execution, and debugging capabilities. The study examined Python programming conversations from the WildChat dataset and identified the need for clearer communication of tool capabilities to prevent over-reliance and unproductive practices.

← Prev · Page 2 of 3 · Next →