AI · Bullish · arXiv – CS AI · 8h ago · 7/10
Do Transformers Need Three Projections? Systematic Study of QKV Variants
Researchers systematically evaluate whether transformer models need three separate query, key, and value (QKV) projections, and find that shared-projection variants perform comparably while reducing computational overhead. The Q-K=V configuration, in which keys and values share a single projection while queries keep their own, cuts the KV cache by 50% with minimal performance loss. It also combines with existing optimizations such as multi-query attention (MQA), which helps make on-device deployment practical.
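
The 50% figure follows from simple accounting: if keys and values are the same tensor, only one tensor per layer has to be cached instead of two. The sketch below works through that arithmetic. It is not the paper's code; the function name `kv_cache_bytes`, the model shape (32 layers, 32 heads, head dimension 128, 4096 tokens), and the 2-byte element size are all assumptions chosen for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_elem=2, shared_kv=False):
    """Approximate KV cache size in bytes.

    shared_kv=True models a tied K=V projection: one cached tensor
    per layer instead of separate key and value tensors.
    """
    tensors_per_layer = 1 if shared_kv else 2
    return (n_layers * tensors_per_layer * n_kv_heads
            * head_dim * seq_len * batch * bytes_per_elem)


# Hypothetical model shape, assumed only for this example.
cfg = dict(n_layers=32, head_dim=128, seq_len=4096)

mha = kv_cache_bytes(n_kv_heads=32, **cfg)                       # separate K and V
tied = kv_cache_bytes(n_kv_heads=32, shared_kv=True, **cfg)      # K = V
mqa_tied = kv_cache_bytes(n_kv_heads=1, shared_kv=True, **cfg)   # MQA plus K = V

for name, size in [("MHA", mha), ("K=V", tied), ("MQA + K=V", mqa_tied)]:
    print(f"{name:10s} {size / 2**20:8.1f} MiB")
```

Under these assumed numbers, tying K and V halves the cache, and stacking it on MQA (a single key/value head) shrinks it by a further factor equal to the number of query heads. Whether quality holds up at that point is an empirical question that this arithmetic does not address.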