PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization
PACZero introduces a PAC-private fine-tuning mechanism for large language models that achieves usable utility while maintaining zero mutual information leakage, outperforming traditional differential privacy approaches on the privacy-utility trade-off. Using sign quantization of zeroth-order gradients, the method exploits steps of unanimous agreement across candidate subsets to eliminate per-step privacy costs, demonstrating competitive performance on benchmark tasks such as SST-2 and SQuAD.
PACZero addresses a fundamental limitation in privacy-preserving machine learning: existing differential privacy frameworks require noise injection that degrades model utility, particularly in high-privacy regimes. This research proposes an alternative privacy accounting mechanism based on PAC (Probably Approximately Correct) privacy, which measures information leakage differently from DP by bounding resistance to membership inference attacks rather than worst-case indistinguishability between neighboring datasets.
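To make the contrast concrete, here is a minimal side-by-side of the two styles of guarantee, assuming the standard definitions; the notation (in particular the mutual-information budget β) is illustrative and not necessarily the paper's own:

```latex
% (epsilon, delta)-differential privacy: for all neighboring datasets D, D'
% and all output sets S, the mechanism M must satisfy
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \,\Pr[M(D') \in S] + \delta

% PAC-privacy-style accounting: bound the mutual information between the
% private data X and the released output M(X), which in turn bounds the
% posterior advantage of any membership inference adversary
I\bigl(X;\, M(X)\bigr) \;\le\; \beta
```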
The technical innovation centers on sign quantization of aggregated gradients from zeroth-order optimization. By checking when all candidate subsets agree on the update direction, the mechanism identifies steps at which no conditional mutual information is revealed, a fundamentally different strategy from DP's injection of noise into every computation. This insight yields two variants: PACZero-MI, which spends a budgeted amount of mutual information, and PACZero-ZPL, which achieves zero information leakage by substituting random coin flips on disagreement steps.
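The following is a minimal sketch of one such update step, assuming a two-point zeroth-order gradient estimate; the function name, the `loss_fn(theta, subset)` interface, the subset construction, and the hyperparameters are assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np

def paczero_zpl_step(loss_fn, theta, subsets, rng, eps=1e-3, lr=1e-4):
    """One PACZero-ZPL-style update (illustrative sketch, not the paper's
    exact algorithm). `loss_fn(theta, subset)` is an assumed interface
    returning a scalar loss on one candidate data subset."""
    # Shared random direction for the two-point zeroth-order estimate.
    u = rng.standard_normal(theta.shape)

    # Sign-quantized projected-gradient estimate on each candidate subset:
    # sign(L(theta + eps*u) - L(theta - eps*u)).
    signs = [
        np.sign(loss_fn(theta + eps * u, s) - loss_fn(theta - eps * u, s))
        for s in subsets
    ]

    if signs[0] != 0 and all(s == signs[0] for s in signs):
        # Unanimous agreement: the released bit is identical no matter which
        # subset was used, so (per the paper's argument) it carries no
        # conditional mutual information about individual records.
        step_sign = signs[0]
    else:
        # Disagreement: the ZPL variant discards the data-dependent bit and
        # substitutes a fair coin flip, keeping the step independent of data.
        step_sign = rng.choice([-1.0, 1.0])

    return theta - lr * step_sign * u
```

Either the released sign is identical under every candidate subset, or it is pure noise; in both cases the per-step output reveals nothing about which records are present, which is the informal basis of the zero-leakage claim.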
The empirical results demonstrate meaningful progress in the high-privacy regime where ε < 1. On SST-2 with OPT-1.3B, PACZero-ZPL achieves 88.99% accuracy at an information budget of I = 0, only 2.1 percentage points below the non-private baseline. This is a significant advance, since prior methods produced unusable utility at comparable privacy levels. The approach works in both LoRA and full-parameter fine-tuning settings, suggesting broader applicability.
For the AI and privacy communities, this work expands the theoretical toolkit for privacy-preserving machine learning by demonstrating that alternative privacy accounting mechanisms can outperform differential privacy in practical settings. The method's effectiveness in the true zero-information regime suggests applications in sensitive domains that require absolute privacy guarantees, though scalability to larger models remains an open question.
- PACZero achieves usable model utility while maintaining zero mutual information leakage, outperforming differential privacy in high-privacy regimes (ε < 1)
- Sign quantization exploits unanimous agreement among candidate subsets to eliminate privacy costs at specific optimization steps
- The method shows only a 2.1-percentage-point accuracy drop versus non-private baselines on SST-2 at the maximum privacy level
- PAC-private accounting based on membership inference resistance offers theoretical advantages over worst-case DP bounds
- The approach generalizes across model sizes (OPT-1.3B and OPT-6.7B) and training methods (LoRA and full fine-tuning)