y0news
#ai-optimization3 articles
3 articles
AIBullisharXiv โ€“ CS AI ยท 6h ago5
๐Ÿง 

ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference

Researchers propose ODAR-Expert, an adaptive routing framework for large language models that optimizes accuracy-efficiency trade-offs by dynamically routing queries between fast and slow processing agents. The system achieved 98.2% accuracy on MATH benchmarks while reducing computational costs by 82%, suggesting that optimal AI scaling requires adaptive resource allocation rather than simply increasing test-time compute.

AIBullisharXiv โ€“ CS AI ยท 6h ago8
๐Ÿง 

Higress-RAG: A Holistic Optimization Framework for Enterprise Retrieval-Augmented Generation via Dual Hybrid Retrieval, Adaptive Routing, and CRAG

Researchers have developed Higress-RAG, a new enterprise-grade framework that addresses key challenges in Retrieval-Augmented Generation systems including low retrieval precision, hallucination, and high latency. The system introduces innovations like 50ms semantic caching, hybrid retrieval methods, and corrective evaluation to optimize the entire RAG pipeline for production use.

$LINK
AIBullisharXiv โ€“ CS AI ยท 6h ago9
๐Ÿง 

Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization

Researchers introduce Quant Experts (QE), a new post-training quantization technique for Vision-Language Models that uses adaptive error compensation with mixture-of-experts architecture. The method addresses computational and memory overhead issues by intelligently handling token-dependent and token-independent channels, maintaining performance comparable to full-precision models across 2B to 70B parameter scales.