AIBullisharXiv โ CS AI ยท 1d ago7/10
๐ง
QUARK: Quantization-Enabled Circuit Sharing for Transformer Acceleration by Exploiting Common Patterns in Nonlinear Operations
Researchers have developed QUARK, a quantization-enabled FPGA acceleration framework that significantly improves Transformer model performance by optimizing nonlinear operations through circuit sharing. The system achieves up to 1.96x speedup over GPU implementations while reducing hardware overhead by more than 50% compared to existing approaches.