AI · Bullish · arXiv – CS AI · 9h ago · 7/10
Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation
Researchers propose CKA-QAD, a new method for quantizing large language models to NVFP4 precision that preserves the model's internal representational geometry rather than only matching output distributions. Existing quantization-aware distillation techniques focus on output matching, and the authors present this as a critical limitation. They report significant improvements on reasoning and coding tasks across multiple model architectures.
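To make "preserving internal geometry" concrete, here is a minimal sketch of one way such a comparison could be computed. It assumes that "CKA" refers to centered kernel alignment and uses the standard linear form of that similarity measure. The summary does not describe the paper's actual loss, so the function name `linear_cka`, the `1 - CKA` penalty, and the noisy "student" activations standing in for quantization error are all illustrative assumptions, not the authors' method.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear centered kernel alignment between two activation matrices.

    x and y have shape (n_samples, n_features); the feature counts may differ.
    The score is 1.0 when the representations match up to rotation and
    isotropic scaling, and it decreases as their geometry diverges.
    """
    # Center each feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


rng = np.random.default_rng(0)

# Hypothetical hidden states from a full-precision teacher layer.
teacher = rng.standard_normal((64, 32))

# Hypothetical student hidden states: additive noise is a crude stand-in
# for quantization error and is NOT an NVFP4 simulation.
student = teacher + 0.1 * rng.standard_normal((64, 32))

score = linear_cka(teacher, student)

# An assumed geometry-preservation penalty: zero when the geometry matches.
loss = 1.0 - score
print(f"CKA = {score:.4f}, loss = {loss:.4f}")
```

In a distillation setup, a term like this would typically be added to the usual output-matching loss, so that the student is penalized when its intermediate representations drift from the teacher's even if the final outputs still agree.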