y0news

#latency-optimization News & Analysis

12 articles tagged with #latency-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

Lodestar: An Online-Learning LLM Inference Router

Researchers introduce Lodestar, a machine-learning-based request router that dynamically assigns large language model inference requests to GPU instances in distributed clusters. By continuously learning routing strategies in real time, the system improves latency metrics by up to 4.38x over existing heuristics.

AI · Bullish · arXiv – CS AI · 4d ago · 7/10

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

Researchers introduce agent just-in-time (JIT) compilation, a system that compiles natural language task descriptions directly into executable code for computer-use agents, achieving a 10.4x speedup and 28% higher accuracy than existing sequential approaches. The method combines planning, scheduling, and tool protocol innovations to reduce latency and errors in browser automation tasks.

🏢 OpenAI
AI · Bullish · arXiv – CS AI · May 27 · 7/10

MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration

MobileExplorer is a new framework that speeds up on-device inference for mobile GUI agents by exploring UI elements in parallel while the model is reasoning. The system reduces latency by 23% while maintaining or improving task success rates, addressing privacy and network dependency concerns in mobile AI applications.

AI · Bullish · arXiv – CS AI · Mar 6 · 7/10

AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Researchers introduce AMV-L, a new memory management framework for long-running LLM systems that uses utility-based lifecycle management instead of traditional time-based retention. The system improves throughput by 3.1x and reduces latency by up to 4.7x while maintaining retrieval quality by controlling memory working-set size rather than just retention time.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5

Arbor: A Framework for Reliable Navigation of Critical Conversation Flows

Researchers introduce Arbor, a framework that decomposes large language model decision-making into specialized node-level tasks for critical applications like healthcare triage. The system improves accuracy by 29.4 percentage points while reducing latency by 57.1% and costs by 14.4x compared to single-prompt approaches.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Ghost Tool Calls: Issue-Time Privacy for Speculative Agent Tools

Researchers identify a privacy vulnerability in AI agents that use speculative tool calls to reduce latency, where external services receive and retain inferred user intent data even after the agent abandons the speculative branch. The study proposes Speculative Tool Privacy Contracts as a runtime solution, finding that only issue-time policies suppressing or modifying calls before dispatch effectively mitigate information leakage.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Multi-Resolution End-to-End Deep Neural Network for Optimizing Latency-Accuracy Tradeoff in Autonomous Driving

Researchers present a multi-resolution deep neural network for autonomous driving that dynamically selects input resolution based on latency constraints and compute availability. The approach uses per-resolution batch normalization and resolution retargeting to optimize the tradeoff between prediction accuracy and processing speed, demonstrating improved safety metrics in CARLA simulations compared to fixed-resolution models.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Ocean4Rec: Offline LLM-Derived OCEAN Profiles for Request-Time VOD Reranking

Ocean4Rec presents an approach to video-on-demand recommendation that uses LLMs offline to generate OCEAN personality profiles for content items, then reranks at request time without real-time model calls. The system demonstrates NDCG improvements of 7.6% to 61.5% on Samsung Smart TV data while maintaining deployment simplicity and predictable latency for production services.

AI · Bullish · arXiv – CS AI · Mar 11 · 6/10

LDP: An Identity-Aware Protocol for Multi-Agent LLM Systems

Researchers present LLM Delegate Protocol (LDP), a new AI-native communication protocol for multi-agent LLM systems that introduces identity awareness, progressive payloads, and governance mechanisms. The protocol achieves 12x lower latency on simple tasks and 37% token reduction compared to existing protocols like A2A, though quality improvements remain limited in small delegate pools.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

TempoSyncDiff: Distilled Temporally-Consistent Diffusion for Low-Latency Audio-Driven Talking Head Generation

Researchers introduce TempoSyncDiff, a new AI framework that uses distilled diffusion models to generate realistic talking head videos from audio with significantly reduced computational latency. The system addresses key challenges in AI-driven video synthesis including temporal instability, identity drift, and audio-visual alignment while enabling deployment on edge computing devices.