9 articles tagged with #production-deployment. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
Researchers developed Pichay, a demand paging system that treats LLM context windows like computer memory with hierarchical caching. The system reduces context consumption by up to 93% in production by evicting stale content and managing memory more efficiently, addressing fundamental scalability issues in AI systems.
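As a rough illustration of the demand-paging idea, the sketch below keeps context items under a token budget and moves the least recently used ones to a backing store, paging them back in on access. This is not Pichay's implementation: the `ContextPager` class, its method names, and the choice of least-recently-used eviction are all assumptions made for the example.

```python
from collections import OrderedDict


class ContextPager:
    """Hypothetical pager: keeps context items within a token budget (LRU eviction)."""

    def __init__(self, budget_tokens):
        self.budget = budget_tokens
        self.resident = OrderedDict()  # key -> (text, tokens), least recently used first
        self.evicted = {}              # backing store for paged-out items

    def used(self):
        # Total tokens currently held in the context window.
        return sum(tokens for _, tokens in self.resident.values())

    def _evict(self):
        # Page out stale items until the budget is met; always keep the newest item.
        while self.used() > self.budget and len(self.resident) > 1:
            key, item = self.resident.popitem(last=False)
            self.evicted[key] = item

    def put(self, key, text, tokens):
        # Insert or refresh an item, marking it as most recently used.
        self.evicted.pop(key, None)
        self.resident[key] = (text, tokens)
        self.resident.move_to_end(key)
        self._evict()

    def get(self, key):
        # A miss is a "page fault": restore the item from the backing store.
        if key not in self.resident:
            self.resident[key] = self.evicted.pop(key)
        self.resident.move_to_end(key)
        self._evict()
        return self.resident[key][0]


pager = ContextPager(budget_tokens=10)
pager.put("a", "first tool output", 4)
pager.put("b", "second tool output", 4)
pager.put("c", "third tool output", 4)   # exceeds the budget, so "a" is paged out
text = pager.get("a")                    # page fault: "a" returns, "b" is paged out
```

A real system would need a token counter and a policy for what counts as stale; here token counts are passed in by hand and recency is the only signal.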
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
Researchers developed HAP (Heterogeneity-Aware Adaptive Pre-ranking), a new framework for recommender systems that addresses gradient conflicts in training by separating easy and hard samples. The system has been deployed in Toutiao's production environment for 9 months, achieving a 0.4% improvement in user engagement without additional computational costs.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
Researchers present Odin, the first production-deployed graph intelligence engine that autonomously discovers patterns in knowledge graphs without predefined queries. The system uses a novel COMPASS scoring metric combining structural, semantic, temporal, and community-aware signals, and has been successfully deployed in regulated healthcare and insurance environments.
AI · Neutral · arXiv – CS AI · 5d ago · 6/10
LLM-HYPER is a new framework that uses large language models as hypernetworks to generate click-through rate prediction models for cold-start ads without traditional training. The system achieved a 55.9% improvement over baseline methods in offline tests and has been successfully deployed in production on a major U.S. e-commerce platform.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
Researchers identify three critical gaps in the Model Context Protocol (MCP) that prevent AI agents from operating safely at production scale, despite MCP having over 10,000 active servers and 97 million monthly SDK downloads. Drawing on enterprise deployment experience, the paper proposes three new mechanisms to supply what is missing: identity propagation, adaptive tool budgeting, and structured error semantics.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
Researchers introduce Test-Driven AI Agent Definition (TDAD), a methodology that compiles AI agent prompts from behavioral specifications using automated testing. The approach addresses production deployment challenges by ensuring measurable behavioral compliance and preventing silent regressions in tool-using LLM agents.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
Researchers developed ToolRLA, a three-stage reinforcement learning pipeline that significantly improves AI agents' ability to use external tools and APIs for domain-specific tasks. The system achieved 47% higher task completion rates and 93% fewer regulatory violations when deployed in a real-world financial advisory copilot serving 80+ advisors with 1,200+ daily queries.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10
Research on production RAG systems reveals that retrieval fusion techniques like multi-query retrieval and reciprocal rank fusion increase raw document recall but fail to improve end-to-end performance due to re-ranking limits and context constraints. The study found that fusion variants actually decreased accuracy from 0.51 to 0.48 while adding latency overhead.
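For readers unfamiliar with reciprocal rank fusion, here is a minimal sketch of the standard formulation, in which each document's fused score is the sum of 1 / (k + rank) over the ranked lists that contain it. The function name, the document IDs, and the constant k = 60 (a commonly used default) are choices made for this example and are not taken from the study.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; documents near the top contribute more.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Two hypothetical result lists, e.g. from two rewrites of the same query.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
```

Fusion of this kind widens the candidate pool, which is consistent with the recall gain the summary reports; whether that helps end to end still depends on what survives re-ranking and fits in the context window.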
AI · Bullish · OpenAI News · Oct 6 · 6/10
OpenAI has released new developer tools including AgentKit, expanded evaluation capabilities, and reinforcement fine-tuning specifically designed for AI agents. These tools aim to accelerate the development process from prototype to production deployment for AI agent applications.