AIBullisharXiv โ CS AI ยท 14h ago7/10
๐ง
Why Smaller Is Slower? Dimensional Misalignment in Compressed LLMs
Researchers identify dimensional misalignment as a critical bottleneck in compressed large language models, where parameter reduction fails to improve GPU performance due to hardware-incompatible tensor dimensions. They propose GAC (GPU-Aligned Compression), a new optimization method that achieves up to 1.5ร speedup while maintaining model quality by ensuring hardware-friendly dimensions.
๐ง Llama