y0news
🧠 AI · 🟢 Bullish · Importance 6/10

DP-OPD: Differentially Private On-Policy Distillation for Language Models

arXiv – CS AI | Fatemeh Khadem, Sajad Mousavi, Yi Fang, Yuhong Liu
🤖 AI Summary

Researchers have developed DP-OPD (Differentially Private On-Policy Distillation), a new framework for training privacy-preserving language models that significantly improves performance over existing methods. The approach simplifies the training pipeline by eliminating the need for DP teacher training and offline synthetic text generation while maintaining strong privacy guarantees.

Key Takeaways
  • DP-OPD achieves lower (better) perplexity than both traditional DP fine-tuning and off-policy DP distillation under strict privacy budgets.
  • The framework eliminates the need for DP teacher training and offline synthetic text generation, simplifying the training pipeline.
  • The method addresses the tension between formal privacy guarantees and efficient deployment of language models trained on sensitive data.
  • DP-OPD uses a frozen teacher to provide dense token-level targets on student-generated trajectories while applying privacy protection only to the student.
  • The approach demonstrates superior performance on datasets like Yelp and BigPatent while maintaining strict differential privacy constraints.
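The core mechanism in the takeaways above — a frozen teacher scoring trajectories that the student itself generates, with DP noise applied only to the student's gradients — can be sketched as a toy DP-SGD loop. This is a minimal illustration, not the paper's implementation: the "models" are hypothetical next-token logit tables, and the clip norm `C`, noise multiplier `sigma`, and learning rate are illustrative values, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

V, T = 5, 4                    # toy vocab size and trajectory length (assumed)
C, sigma, lr = 1.0, 0.8, 0.5   # clip norm, noise multiplier, step size (illustrative)

# Hypothetical toy "models": next-token logits indexed by the previous token.
teacher_logits = rng.normal(size=(V, V))   # frozen teacher, never trained with DP
W = np.zeros((V, V))                       # trainable student logits

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def per_trajectory_grad(W):
    """On-policy: sample a trajectory from the *student*, then take the
    gradient of the dense token-level cross-entropy against the frozen
    teacher's next-token distribution at each visited context."""
    grad = np.zeros_like(W)
    tok = int(rng.integers(V))                # random start token
    for _ in range(T):
        p_student = softmax(W[tok])
        p_teacher = softmax(teacher_logits[tok])
        # d/dlogits of -sum_v p_teacher[v] * log p_student[v]
        grad[tok] += p_student - p_teacher
        tok = int(rng.choice(V, p=p_student))  # student picks the next token
    return grad

def dp_sgd_step(W, batch=8):
    """Privacy protection on the student side only: clip each
    per-trajectory gradient to norm C, sum, then add Gaussian noise."""
    total = np.zeros_like(W)
    for _ in range(batch):
        g = per_trajectory_grad(W)
        g = g * min(1.0, C / (np.linalg.norm(g) + 1e-12))  # per-sample clip
        total += g
    total += rng.normal(scale=sigma * C, size=W.shape)     # calibrated noise
    return W - lr * total / batch

for _ in range(20):
    W = dp_sgd_step(W)
```

Because the teacher is frozen and only the student's per-trajectory gradients are clipped and noised, no separate DP teacher training and no offline synthetic-text generation are needed, which is the pipeline simplification the summary describes.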
Read Original → via arXiv – CS AI