y0news
#quantization · 3 articles
AIBullish · arXiv – CS AI · 6h ago
🧠

BiKA: Kolmogorov-Arnold-Network-inspired Ultra Lightweight Neural Network Hardware Accelerator

Researchers propose BiKA, a new ultra-lightweight neural network accelerator inspired by Kolmogorov-Arnold Networks that replaces costly arithmetic with binary threshold operations. The FPGA prototype demonstrates a 27–51% reduction in hardware resource usage compared to existing binarized and quantized neural network accelerators while maintaining competitive accuracy.
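A minimal sketch of the idea described above, assuming each KAN-style edge function is reduced to a single threshold comparison; the layer shape, sign convention, and aggregation rule here are illustrative assumptions, not BiKA's actual architecture:

```python
import numpy as np

def binary_threshold_layer(x, thresholds, signs):
    """One KAN-style layer where each learnable 1-D edge function
    is replaced by a binary threshold test (hypothetical sketch).

    x:          (in_dim,) input activations
    thresholds: (out_dim, in_dim) per-edge threshold values
    signs:      (out_dim, in_dim) +1/-1 orientation of each comparison
    """
    # Each edge contributes +1 or -1 based on a single comparison,
    # so no multipliers are needed in hardware -- only comparators.
    contributions = np.where(signs * (x - thresholds) >= 0, 1, -1)
    # Node output is the sign of the summed edge contributions
    # (a popcount plus one comparison in an FPGA implementation).
    return np.sign(contributions.sum(axis=1))

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
out = binary_threshold_layer(x, rng.standard_normal((8, 16)),
                             rng.choice([-1, 1], size=(8, 16)))
print(out)  # 8 binary node outputs
```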

AINeutral · arXiv – CS AI · 6h ago
🧠

SLA-Aware Distributed LLM Inference Across Device-RAN-Cloud

Researchers tested distributed AI inference across device, edge, and cloud tiers in a 5G network and found that the sub-second response times required for embodied AI are hard to meet. On-device execution took multiple seconds, RAN-edge deployment with quantized models could meet 0.5-second deadlines, and cloud deployment achieved 100% success for 1-second deadlines.
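A toy illustration of SLA-aware tier selection using only the latency figures quoted in the summary; the tier names, the concrete latency values, and the preference order are assumptions for the sketch, not the paper's scheduler:

```python
# Latencies loosely based on the summary's numbers (assumptions,
# not measurements from the paper).
TIER_LATENCY_S = {
    "device": 3.0,      # on-device execution: multiple seconds
    "ran_edge": 0.45,   # RAN edge with a quantized model
    "cloud": 0.9,       # cloud round trip
}

def pick_tier(deadline_s: float) -> str | None:
    """Return the first tier that meets the SLA deadline, preferring
    device over edge over cloud to minimise offloading."""
    for tier in ("device", "ran_edge", "cloud"):
        if TIER_LATENCY_S[tier] <= deadline_s:
            return tier
    return None  # no tier can meet the deadline

print(pick_tier(0.5))  # -> 'ran_edge'
print(pick_tier(1.0))  # -> 'ran_edge' (device is still too slow)
```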

$NEAR
AIBullish · arXiv – CS AI · 6h ago
🧠

Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization

Researchers introduce Quant Experts (QE), a new post-training quantization technique for vision-language models that compensates quantization error adaptively with a mixture-of-experts architecture. By treating token-dependent and token-independent channels separately, the method keeps the compute and memory overhead of error compensation low while maintaining performance comparable to full-precision models across 2B to 70B parameter scales.
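A hypothetical sketch of the general pattern, assuming each expert is a small linear corrector and a learned router mixes their per-token error predictions; the expert design, routing rule, and channel split of the actual QE method are not specified in the summary:

```python
import torch

def moe_error_reconstruction(x, w_q, experts, router, bias_correction=None):
    """Add a token-aware, expert-mixed correction to a quantized matmul.

    x:       (tokens, in_dim) activations
    w_q:     (out_dim, in_dim) quantized weight
    experts: small modules predicting per-token output error
    router:  linear layer scoring experts per token
    bias_correction: optional static term standing in for the
                     token-independent channels (assumption)
    """
    y_q = x @ w_q.t()                        # quantized matmul output
    gate = router(x).softmax(dim=-1)         # (tokens, n_experts)
    # Experts estimate the token-dependent part of the error,
    # i.e. what ((w_fp - w_q) @ x) would contribute.
    corrections = torch.stack([e(x) for e in experts], dim=-1)
    y = y_q + (corrections * gate.unsqueeze(1)).sum(dim=-1)
    if bias_correction is not None:          # token-independent channels
        y = y + bias_correction
    return y

# Usage sketch: crude round-to-grid 'quantization' as a stand-in.
T, D, E = 4, 32, 2
x = torch.randn(T, D)
w_fp = torch.randn(D, D)
w_q = torch.round(w_fp * 8) / 8
experts = [torch.nn.Linear(D, D) for _ in range(E)]
router = torch.nn.Linear(D, E)
y = moe_error_reconstruction(x, w_q, experts, router)
target = x @ w_fp.t()   # what error compensation aims to recover
```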