Understanding Performance Collapse in Layer-Pruned Large Language Models via Decision Representation Transitions
Researchers have identified why layer pruning causes sudden performance collapse in large language models by analyzing decision representation dynamics. The study reveals that pruning disrupts a critical 'Silent Phase' where the model internally processes information before making predictions, while the subsequent 'Decisive Phase' remains robust to pruning.
This research addresses a fundamental challenge in making large language models more efficient. As AI systems grow larger and more computationally expensive, techniques like layer pruning promise to reduce costs without sacrificing performance. However, practitioners frequently observe catastrophic failures when pruning beyond certain thresholds, making the approach unreliable for production deployments.
The study's novel contribution lies in reframing the pruning problem through decision representation rather than traditional activation-based analysis. By introducing Decision Margin and Option Frequency metrics, the researchers mapped how predictions emerge sequentially through network layers. Their discovery of distinct Silent and Decisive phases fundamentally changes how the community should think about network structure. The Silent Phase functions as a critical preprocessing stage where the model builds internal representations necessary for decision-making, while the Decisive Phase merely crystallizes these learned patterns into predictions.
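The summary names the Decision Margin and Option Frequency metrics without giving their formulas, so the following is a minimal sketch of one plausible reading, assuming per-layer logits over the answer options are available (e.g. via a logit-lens readout); the function names and the toy trace are illustrative, not the paper's code:

```python
import numpy as np

def decision_margin(layer_logits):
    """Per-layer gap between the top option's logit and the runner-up's.

    layer_logits: (num_layers, num_options) array of option logits
    read out at each layer. A near-zero margin suggests the model has
    not yet committed to an answer (Silent Phase); a large margin
    suggests a settled decision (Decisive Phase).
    """
    sorted_logits = np.sort(layer_logits, axis=1)
    return sorted_logits[:, -1] - sorted_logits[:, -2]

def option_frequency(layer_logits, num_options):
    """Fraction of layers at which each option is the current argmax."""
    winners = np.argmax(layer_logits, axis=1)
    counts = np.bincount(winners, minlength=num_options)
    return counts / layer_logits.shape[0]

# Toy trace: 6 low-margin layers whose argmax flip-flops (Silent
# Phase), then 4 layers where option 2 pulls sharply ahead (Decisive
# Phase). Values are invented for illustration.
silent = [[0.02, 0.01, 0.00, 0.01],
          [0.00, 0.02, 0.01, 0.01]] * 3
decisive = [[0.1, 0.2, 2.5, 0.3]] * 4
logits = np.array(silent + decisive)

margins = decision_margin(logits)
freqs = option_frequency(logits, num_options=4)
```

On this toy trace, the margin stays flat across the first six layers and jumps once the decisive layers begin, which is the kind of sharp transition the phase framing describes.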
This insight has significant implications for model optimization. It suggests that current pruning strategies fail because they blindly remove layers without understanding their functional role in the decision pipeline. Developers attempting to compress LLMs could waste resources on strategies targeting the wrong architectural components. The research also implies that effective pruning requires identifying and preserving decision-critical pathways rather than removing layers uniformly or based on activation patterns alone.
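Under the phase framing, a pruning heuristic that spares the Silent Phase might look like the sketch below. `prunable_layers`, its `threshold`, and `keep_last` are hypothetical knobs chosen for illustration, not the paper's method:

```python
def prunable_layers(margins, threshold=1.0, keep_last=1):
    """Return indices of layers that look safe to prune.

    Heuristic sketch: a layer is a candidate only if the decision
    margin has already crossed `threshold` by that layer (i.e. it sits
    in the Decisive Phase), and the final `keep_last` layers are always
    kept so the model can still emit a prediction. Both parameters are
    assumptions, not values from the study.
    """
    n = len(margins)
    return [i for i in range(n - keep_last) if margins[i] >= threshold]

# Toy margin trace: a flat Silent Phase followed by a sharp Decisive
# Phase; only decisive layers (minus the last) become candidates.
trace = [0.01] * 6 + [2.2] * 4
candidates = prunable_layers(trace)
```

The point of the sketch is the asymmetry: every low-margin layer is excluded from pruning by construction, which is the opposite of uniform or activation-magnitude-based removal.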
Future work should explore whether these findings generalize beyond multiple-choice tasks to open-ended generation, and whether targeted pruning of only the Decisive Phase while preserving Silent Phase architecture could yield efficient models. This research opens a pathway toward principled compression strategies that respect the decision dynamics underlying model behavior.
- Layer pruning causes collapse by disrupting the 'Silent Phase' where internal representations form, not the 'Decisive Phase' that generates predictions.
- Decision representation analysis reveals sharp transitions in how networks process information, providing new metrics beyond traditional activation-based approaches.
- Pruning the Decisive Phase has minimal performance impact while pruning the Silent Phase triggers immediate collapse, indicating phase-specific sensitivity.
- Current pruning strategies fail because they ignore the functional roles different layers play in the decision pipeline.
- These findings suggest future compression techniques should preserve decision-critical pathways rather than uniformly removing layers.