#production-systems News & Analysis

17 articles tagged with #production-systems. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

Holmes: Multimodal Agentic Diagnosis for Mixed-Language Mobile Crashes at Industrial Scale

Holmes is a multi-agent AI system that automates root-cause analysis for mobile app crashes in large-scale production environments by synthesizing runtime signals such as stack traces and logs, without requiring local reproduction. Deployed at WeChat, it achieves 87.6% accuracy in fault localization and cuts debugging time from hours to 77 seconds, a practical application of AI to enterprise software reliability.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

Kamera: Unified Position-Invariant Multimodal KV Cache for Training-Free Reuse

Researchers introduce Kamera, a training-free method that lets multimodal AI models reuse cached key-value pairs regardless of where the cached content appears in the context window. By storing small low-rank conditioning patches alongside position-free chunks, the system maintains accuracy on complex multi-hop reasoning tasks while reducing computational overhead, which particularly benefits video and vision-heavy applications.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees

Researchers introduced SLARouter, an online algorithm that optimizes LLM request routing by learning cost-efficient policies from sparse user feedback while guaranteeing Service Level Agreement compliance. The approach reduces operating costs by up to 2.2x compared to existing solutions without requiring per-benchmark tuning.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Token Factory: Efficiently Integrating Diverse Signals into Large Recommendation Models

Researchers introduce Token Factory, a framework that converts traditional recommendation signals into efficient 'soft tokens' for Large Recommendation Models, enabling better feature integration without excessive computational overhead or prompt bloat. The approach demonstrates practical improvements in production-scale recommendation systems by compressing heterogeneous inputs while maintaining or enhancing model performance.

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

Catching One in Five: LLM-as-Judge Blind Spots in Production Multi-Turn Transaction Agents

A study of a deployed food-and-beverage ordering chatbot reveals that LLM-based quality judges catch fewer than 25% of genuine defects, missing systematic failures in state-tracking and multi-turn consistency while excelling only at single-turn issues. The research demonstrates that automated evaluation metrics are fundamentally insufficient for production multi-agent systems and should not replace human review.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems

Researchers propose PACT, a new protocol for multi-agent AI systems that compresses inter-agent communication into compact action-state records, reducing token usage by up to 50% while maintaining or improving task performance. The approach addresses a critical efficiency bottleneck in large language model-based multi-agent systems, with demonstrated improvements in production coding applications.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Statistically Reliable LLM-Based Ranking Evaluation via Prediction-Powered Inference

Researchers introduced PRECISE, a method that combines human annotations with LLM judgments to produce statistically reliable ranking evaluation metrics. The approach reduces computational complexity for hierarchical metrics such as Precision@K, achieved a 21% error reduction on benchmarks, and in real-world validation showed a +407 basis-point sales lift in production systems.
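Prediction-powered inference pairs a large pool of cheap LLM judgments with a small human-labeled set that corrects the judge's bias. A minimal, generic sketch for a mean-valued ranking metric (for example, average per-query Precision@K) follows; it is not the PRECISE implementation, and the function name `ppi_mean`, the one-label-per-query setup, and the normal-approximation interval are assumptions for illustration.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def ppi_mean(y_labeled: Sequence[float], f_labeled: Sequence[float],
             f_unlabeled: Sequence[float], z: float = 1.96) -> Tuple[float, Tuple[float, float]]:
    """Prediction-powered estimate of a mean metric with a normal-approximation interval.

    y_labeled:   human-judged metric values on the small labeled set
    f_labeled:   LLM-judged values for those same items
    f_unlabeled: LLM-judged values on the large unlabeled set
    Each set needs at least two items so sample variances are defined.
    """
    if len(y_labeled) != len(f_labeled):
        raise ValueError("y_labeled and f_labeled must be aligned")
    # Rectifier: how far the LLM judge is off, on average, where humans also judged.
    residuals = [y - f for y, f in zip(y_labeled, f_labeled)]
    estimate = mean(f_unlabeled) + mean(residuals)
    # The two sample means are independent, so their variances add.
    se = math.sqrt(variance(f_unlabeled) / len(f_unlabeled)
                   + variance(residuals) / len(residuals))
    return estimate, (estimate - z * se, estimate + z * se)
```

If the judge systematically scores 0.5 too low on the labeled queries, every judge-only score is shifted up by that mean residual; an unbiased judge leaves the judge-only average unchanged.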

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation

MapAgent is an AI framework that automates lane-level map generation for autonomous driving at city scale, combining vision-language models with constraint verification to produce specification-compliant maps. Already deployed by Baidu Maps across 360+ Chinese cities, the system achieves over 95% production automation while reducing manual editing overhead in complex scenarios.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

SSSD: Simply-Scalable Speculative Decoding

Researchers introduce SSSD, a training-free method for accelerating Large Language Model inference that reduces latency by up to 2.9x through n-gram matching and hardware-aware speculation. The approach matches the performance of existing trained methods while eliminating their deployment complexity, data preparation, and maintenance overhead.
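The general idea of training-free, n-gram-based drafting (often called prompt lookup) can be sketched as follows. This is an illustrative assumption, not SSSD's algorithm: it omits the paper's hardware-aware speculation, uses a toy integer-token model, and the names `propose_ngram_draft` and `speculative_decode` are hypothetical. What it does show is that greedy verification keeps the output identical to plain greedy decoding while accepting several tokens per pass.

```python
from typing import Callable, List, Tuple


def propose_ngram_draft(tokens: List[int], n: int = 2, k: int = 4) -> List[int]:
    """Draft up to k tokens by finding the latest earlier match of the last n tokens."""
    if len(tokens) <= n:
        return []
    suffix = tokens[-n:]
    # Scan backwards, skipping the suffix's own position at len(tokens) - n.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            return tokens[start + n:start + n + k]
    return []


def speculative_decode(prompt: List[int],
                       target_next: Callable[[List[int]], int],
                       max_new: int = 8, n: int = 2, k: int = 4) -> Tuple[List[int], int]:
    """Greedy decoding with n-gram drafts checked against the target model.

    Returns (generated tokens, verification passes). A real engine scores every
    draft position in one batched forward pass; this sketch calls target_next
    per position, so only the pass count (not wall time) shows the saving.
    """
    tokens = list(prompt)
    limit = len(tokens) + max_new
    passes = 0
    while len(tokens) < limit:
        passes += 1
        draft = propose_ngram_draft(tokens, n, k)
        for guess in draft:
            if len(tokens) >= limit:
                break
            actual = target_next(tokens)
            tokens.append(actual)  # the target's token is always the one kept
            if actual != guess:
                break  # draft diverged: discard the remaining guesses
        else:
            # Every guess matched (or there was no draft): take one bonus token.
            if len(tokens) < limit:
                tokens.append(target_next(tokens))
    return tokens[len(prompt):], passes
```

With `target_next = lambda t: (t[-1] + 1) % 5` and the prompt `[0, 1, 2, 3, 4, 0, 1]`, eight new tokens need two verification passes instead of eight.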

AI · Neutral · arXiv – CS AI · May 11 · 7/10

A Geometric Taxonomy of Hallucinations in LLMs

Researchers propose a geometric framework for detecting hallucinations in large language models by analyzing the structure of the embedding space, sorting hallucinations into three types with different detectability profiles. The approach outperforms standard NLI baselines on expert-annotated datasets and provides interpretable diagnostics for production systems that operate under black-box constraints.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent

AgentOpt v0.1, a new Python framework, addresses client-side optimization for AI agents by intelligently allocating models, tools, and API budgets across pipeline stages. Using search algorithms like Arm Elimination and Bayesian Optimization, the tool reduces evaluation costs by 24-67% while achieving near-optimal accuracy, with cost differences between model combinations reaching up to 32x at matched performance levels.
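The summary does not show AgentOpt's own code. The sketch below illustrates the textbook successive-elimination ("arm elimination") idea it names, under these assumptions: each arm is one model/tool combination, a pull scores it on one evaluation item with a reward in [0, 1], and the confidence radius is a standard Hoeffding bound with a union bound over arms and rounds. The function name, arm names, and accuracies in the usage example are hypothetical.

```python
import math
import random
from typing import Callable, Hashable, List, Sequence, Tuple


def successive_elimination(arms: Sequence[Hashable],
                           pull: Callable[[Hashable], float],
                           max_rounds: int = 3000,
                           delta: float = 0.05) -> Tuple[List[Hashable], int]:
    """Keep evaluating surviving configurations until one clearly dominates.

    Returns (surviving arms, total pulls spent).
    """
    active = list(arms)
    sums = {a: 0.0 for a in active}
    rounds_done = 0
    pulls = 0
    while len(active) > 1 and rounds_done < max_rounds:
        for a in active:
            sums[a] += pull(a)
            pulls += 1
        rounds_done += 1
        t = rounds_done  # every surviving arm has been pulled exactly t times
        radius = math.sqrt(math.log(4 * len(arms) * t * t / delta) / (2 * t))
        means = {a: sums[a] / t for a in active}
        best = max(means.values())
        # Drop arms whose upper bound falls below the leader's lower bound.
        active = [a for a in active if means[a] + radius >= best - radius]
    return active, pulls


# Hypothetical usage: three pipeline configurations with unknown accuracies.
rng = random.Random(0)
accuracy = {"small+no-tools": 0.3, "medium+search": 0.6, "large+search": 0.9}
survivors, spent = successive_elimination(
    list(accuracy), lambda a: 1.0 if rng.random() < accuracy[a] else 0.0)
```

Evaluation budget is saved because weak combinations stop being pulled as soon as their upper confidence bound drops below the leader's lower bound.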

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

A Five-Plane Reference Architecture for Runtime Governance of Production AI Agents

Researchers propose a five-plane reference architecture for governing production AI agents in enterprise environments, addressing security gaps where traditional data-boundary controls fail. The system uses composite principals, capability attenuation, and structured audit trails to manage delegated agent actions that could otherwise transform business processes without proper authorization.
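To make "composite principals" and "capability attenuation" concrete, here is a minimal sketch under assumed semantics: a principal is the full delegation chain, a delegated capability is the intersection of what was requested and what the parent holds, and every delegation and check is appended to an audit log. The class and function names and the scope strings are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Capability:
    principal: Tuple[str, ...]  # composite principal: the whole delegation chain
    scopes: FrozenSet[str]      # actions this holder may perform


def delegate(parent: Capability, agent_id: str, requested: Iterable[str],
             audit: List[Dict]) -> Capability:
    """Issue a child capability whose scopes can only narrow (attenuate), never widen."""
    requested = frozenset(requested)
    granted = requested & parent.scopes
    child = Capability(parent.principal + (agent_id,), granted)
    audit.append({"event": "delegate", "principal": child.principal,
                  "requested": sorted(requested), "granted": sorted(granted)})
    return child


def authorize(cap: Capability, scope: str, audit: List[Dict]) -> bool:
    """Check one action and record the decision with the full principal chain."""
    allowed = scope in cap.scopes
    audit.append({"event": "check", "principal": cap.principal,
                  "scope": scope, "allowed": allowed})
    return allowed
```

Because attenuation is a set intersection, an agent several hops down a delegation chain can never hold a scope that the originating user lacked, and the audit trail records which chain performed each action.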

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

FAME: Forecastability-Aware Mixture of Experts for Heterogeneous Time Series Forecasting

Researchers introduce FAME, a sparse mixture-of-experts framework that dynamically routes time series forecasting tasks to specialized models based on data characteristics. Tested on a production retail dataset with 5,000+ vending machines, the system achieves 12.4% MSE improvement over single-model baselines while using only 1.92 experts per series, demonstrating practical advantages for large-scale commercial forecasting systems.
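The summary does not say how FAME decides how many experts each series uses. One common way to get a variable count, and therefore a fractional average such as 1.92 experts per series, is top-p gating, sketched here as an assumption. In a real system the gate logits would come from a learned router over series features; here they are passed in directly, and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def route_top_p(logits: Sequence[float], p: float = 0.7,
                max_experts: int = 2) -> Dict[int, float]:
    """Smallest set of experts whose gate probability mass reaches p, capped at max_experts."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen: List[int] = []
    mass = 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= p or len(chosen) == max_experts:
            break
    # Renormalize so the selected experts' weights sum to 1.
    return {i: probs[i] / mass for i in chosen}


def mixture_forecast(expert_forecasts: Sequence[Sequence[float]],
                     weights: Dict[int, float]) -> List[float]:
    """Weighted sum of the selected experts' forecasts, one horizon step at a time."""
    horizon = len(expert_forecasts[0])
    return [sum(w * expert_forecasts[i][h] for i, w in weights.items())
            for h in range(horizon)]
```

A confident router (one dominant logit) activates a single expert, while an ambiguous series spreads its mass and activates two, so only the selected experts need to run.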

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Cost-Aware Speculative Execution for LLM-Agent Workflows: An Integrated Five-Dimension Method

Researchers present a cost-aware method for optimizing speculative execution in LLM-agent workflows, addressing the challenge of reducing idle time while managing per-token billing costs. The approach combines five design decisions, including predictive execution, dual-rate pricing, Bayesian probability estimation, and a configurable latency-cost tradeoff, with safeguards ensuring that only side-effect-free operations proceed speculatively.
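One way to fold several of the listed ingredients into a single speculation rule is sketched below. The summary does not confirm any of these choices, so they are assumptions: a Beta posterior tracks how often predicted tool calls turn out to be needed, "dual-rate pricing" means separate per-million-token input and output prices, the latency-cost tradeoff is a dollars-per-second weight, and the side-effect safeguard is a boolean supplied by the caller. All names and prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaEstimate:
    """Beta posterior over P(a predicted next step is actually needed)."""
    alpha: float = 1.0  # prior pseudo-count of correct predictions
    beta: float = 1.0   # prior pseudo-count of wrong predictions

    def update(self, hit: bool) -> None:
        if hit:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def call_cost(in_tokens: int, out_tokens: int,
              in_price_per_m: float, out_price_per_m: float) -> float:
    """Dual-rate pricing: input and output tokens billed at separate per-million rates."""
    return (in_tokens * in_price_per_m + out_tokens * out_price_per_m) / 1_000_000


def should_speculate(p_hit: float, saved_seconds: float, cost: float,
                     dollars_per_second: float, side_effect_free: bool) -> bool:
    """Speculate only if safe and the expected latency value beats the expected wasted spend."""
    if not side_effect_free:
        return False  # safeguard: never pre-run calls that change external state
    expected_gain = p_hit * saved_seconds * dollars_per_second
    expected_waste = (1 - p_hit) * cost  # tokens are wasted only when the guess is wrong
    return expected_gain > expected_waste
```

A correct prediction adds no cost, because that call would have been made anyway, so only the miss probability multiplies the token bill. Raising `dollars_per_second` shifts the rule toward lower latency at higher spend.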

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching

Researchers present a hybrid content moderation system for livestreams that combines supervised classification with multimodal similarity matching, achieving 67-76% recall at 80% precision. The production-deployed framework reduces user views of unwanted content by 6-8%, demonstrating scalable AI-driven moderation for user-generated video platforms.
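Recall at a fixed precision, the operating point quoted above, can be computed from classifier scores by sweeping the decision threshold. The following is a minimal sketch; the function name and toy data are hypothetical.

```python
from typing import Sequence


def recall_at_precision(scores: Sequence[float], labels: Sequence[int],
                        min_precision: float = 0.8) -> float:
    """Best recall over all score thresholds whose precision is at least min_precision.

    labels: 1 for content that should be actioned, 0 otherwise.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    # Walk from the most to the least confident prediction, lowering the threshold.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = 0.0
    for i, (score, label) in enumerate(pairs):
        if label:
            tp += 1
        else:
            fp += 1
        # Items with tied scores share one threshold, so only evaluate after the last tie.
        if i + 1 < len(pairs) and pairs[i + 1][0] == score:
            continue
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / total_pos)
    return best
```

For scores `[0.9, 0.8, 0.7, 0.6, 0.5]` with labels `[1, 1, 0, 1, 0]`, the best threshold at 80% precision keeps the top two items and reaches a recall of 2/3.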

AI · Neutral · arXiv – CS AI · May 1 · 6/10

When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems

Researchers present a Bayesian statistical framework for migrating production LLM systems when models reach end-of-life, enabling organizations to confidently compare and select replacement models using limited human evaluation data. The framework was validated on a commercial question-answering system processing 5.3M monthly interactions, addressing a critical operational challenge as the LLM ecosystem rapidly evolves.
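A simple Bayesian comparison in the spirit described, though not the paper's framework, models each model's human-rated acceptance rate with a Beta posterior and estimates the probability that the replacement is no worse than the incumbent, optionally within a tolerance margin. Independent uniform priors, Monte Carlo sampling, and the function name are assumptions.

```python
import random


def prob_candidate_not_worse(k_old: int, n_old: int, k_new: int, n_new: int,
                             margin: float = 0.0, draws: int = 20000,
                             seed: int = 0) -> float:
    """Monte Carlo estimate of P(p_new >= p_old - margin) under Beta(1, 1) priors.

    k_*: number of answers human raters marked acceptable; n_*: number rated.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior after k successes in n trials with a uniform prior is Beta(1+k, 1+n-k).
        p_old = rng.betavariate(1 + k_old, 1 + n_old - k_old)
        p_new = rng.betavariate(1 + k_new, 1 + n_new - k_new)
        if p_new >= p_old - margin:
            wins += 1
    return wins / draws
```

A team might migrate only when this probability exceeds a preset threshold such as 0.95; that threshold is an illustrative choice, not one reported in the paper.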

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10

Comparative Analysis of Neural Retriever-Reranker Pipelines for Retrieval-Augmented Generation over Knowledge Graphs in E-commerce Applications

Researchers developed improved neural retriever-reranker pipelines for Retrieval-Augmented Generation (RAG) systems over knowledge graphs in e-commerce applications. The study achieved 20.4% higher Hit@1 and 14.5% higher Mean Reciprocal Rank than previously reported benchmark results, providing a framework for production-ready RAG systems.
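For reference, Hit@1 and Mean Reciprocal Rank can be computed as below. The sketch assumes a single gold answer per query (for instance, one knowledge-graph entity); the function names and toy data are illustrative.

```python
from typing import Hashable, List, Sequence


def hit_at_k(rankings: Sequence[List[Hashable]], gold: Sequence[Hashable], k: int = 1) -> float:
    """Fraction of queries whose gold item appears in the top k results."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:k])
    return hits / len(gold)


def mean_reciprocal_rank(rankings: Sequence[List[Hashable]], gold: Sequence[Hashable]) -> float:
    """Average of 1/rank of the gold item; queries that miss it entirely contribute 0."""
    total = 0.0
    for ranked, g in zip(rankings, gold):
        if g in ranked:
            total += 1.0 / (ranked.index(g) + 1)
    return total / len(gold)
```

With rankings `[["a","b","c"], ["b","a","c"], ["c","b","a"]]` and gold answers `["a", "a", "x"]`, Hit@1 is 1/3 and MRR is (1 + 1/2 + 0) / 3 = 0.5. A reranker that moves the second query's answer to the top would raise both metrics.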