AI · Bullish · arXiv – CS AI · 3h ago · 7/10
Locality-Aware Redundancy Pruning for LLM Depth Compression
Researchers propose Locality-Aware Redundancy Pruning (LoRP), a training-free method that compresses large language models along the depth dimension by removing redundant layers, identified from patterns of representational similarity. The framework introduces a Representation Locality Score to detect depth-wise redundancy more effectively than existing approaches, achieving lower perplexity and stronger downstream task performance across multiple LLM architectures.
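The summary does not define the Representation Locality Score. The sketch below shows only the general idea behind similarity-based depth pruning, using a simpler, widely used heuristic in its place. Each layer is scored by the cosine similarity between its input and output representations. A layer whose output is nearly parallel to its input changes little, so it becomes a pruning candidate. This is not LoRP's scoring rule. The function names and toy vectors are hypothetical. In practice, the per-layer vectors might come from a model's hidden states (for example, a Hugging Face `transformers` model called with `output_hidden_states=True`), averaged over tokens.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def layer_redundancy(hidden: List[Sequence[float]]) -> List[float]:
    """Hypothetical heuristic, NOT LoRP's Representation Locality Score.

    hidden[i] is the token-averaged representation entering layer i, and
    hidden[i + 1] is that layer's output. A score near 1 means the layer
    barely changes the representation.
    """
    return [cosine(hidden[i], hidden[i + 1]) for i in range(len(hidden) - 1)]


def select_layers_to_prune(hidden: List[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k layers with the highest input/output similarity."""
    scores = layer_redundancy(hidden)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy representations for a hypothetical 4-layer model (5 vectors: input + 4 outputs).
hidden = [
    [1.0, 0.0],
    [0.0, 1.0],  # layer 0 rotates the representation: similarity 0
    [0.0, 1.1],  # layer 1 only rescales it: similarity 1
    [0.1, 1.1],  # layer 2 changes it slightly
    [1.0, 1.0],  # layer 3 changes it more
]
print(select_layers_to_prune(hidden, k=2))  # prints [1, 2]
```

Scoring each layer in isolation and pruning the top-k is the simplest design. It ignores how removing one layer shifts the representations seen by its neighbors. A locality-aware score such as LoRP's presumably accounts for that kind of context in its own way.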