Practical Anonymous Two-Party Gradient Boosting Decision Tree

arXiv – CS AI | Huang Chenyu, Zhang Fan, Du Minxin, Chow Sherman SM, Chen Huangxun, Rao Huaming, Huang Danqing, Qian Bo, Chen Peng | May 27, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce an anonymous gradient-boosted decision tree (GBDT) protocol that lets two parties securely train on vertically partitioned data while hiding record identifiers. The approach combines dual circuit-PSI with oblivious programmable pseudorandom functions to remove the ID exposure inherent in standard private set intersection (PSI), and its running time remains competitive with ID-leaking alternatives.

Analysis

This research addresses a critical privacy gap in secure machine learning for structured data. Traditional GBDT training aligns records with private set intersection (PSI), which reveals which record IDs appear in both datasets. That leak is a significant vulnerability in sensitive domains such as finance and healthcare, where data confidentiality is paramount. The proposed protocol redesigns the alignment step so that two parties can train a model collaboratively without exposing the intersection.
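To make the leak concrete, the following is a minimal single-process sketch, not the paper's protocol. It contrasts the output of a standard PSI functionality (the intersection in the clear) with a circuit-PSI-style output, in which each membership bit is split into two XOR shares. The function names `plain_psi` and `shared_membership` are hypothetical, and a trusted dealer stands in for the actual cryptographic protocol, so the code only illustrates what each party ends up holding.

```python
import secrets


def plain_psi(ids_a, ids_b):
    """Standard PSI functionality: the output is the intersection itself."""
    return set(ids_a) & set(ids_b)


def shared_membership(ids_a, ids_b):
    """Circuit-PSI-style output, simulated by a trusted dealer (assumption).

    For each of A's records, the membership bit is split into two XOR shares,
    one per party. A single share is a uniformly random bit, so neither party
    learns on its own which records matched.
    """
    set_b = set(ids_b)
    shares_a, shares_b = [], []
    for record_id in ids_a:
        bit = int(record_id in set_b)
        mask = secrets.randbits(1)      # uniformly random one-bit mask
        shares_a.append(mask)           # party A's share
        shares_b.append(bit ^ mask)     # party B's share
    return shares_a, shares_b


ids_a = ["u1", "u2", "u3", "u4"]
ids_b = ["u2", "u4", "u9"]

leaked = plain_psi(ids_a, ids_b)        # both matching IDs are revealed
shares_a, shares_b = shared_membership(ids_a, ids_b)

# Only by combining both shares can the membership bits be reconstructed.
reconstructed = [a ^ b for a, b in zip(shares_a, shares_b)]
```

In a real deployment the shares would feed directly into further secure computation and would never be reconstructed, which is what keeps the intersection hidden.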

The design starts from the practical observation that circuit-PSI, which hides the intersection by outputting secret shares rather than IDs, is theoretically sound but carries prohibitive computational overhead in real-world deployments. The researchers instead run a dual circuit-PSI in which each party takes a turn as receiver, and they use oblivious programmable pseudorandom functions (OPPRFs) to propagate the shared state. Together these remove the need for universal record alignment. As a result, the cost of hiding IDs no longer grows with domain size, an inefficiency that had previously gone unaddressed.
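The summary does not say how the paper instantiates its OPPRF, so the sketch below only illustrates the "programmable" part using one generic construction: a polynomial hint that corrects an ordinary PRF on chosen inputs. The names `prf`, `program`, and `evaluate`, the field modulus, and the use of HMAC-SHA256 are all assumptions made for illustration. The oblivious two-party evaluation, in which the receiver learns outputs without learning the key, is omitted entirely.

```python
import hashlib
import hmac
import secrets

P = 2**61 - 1  # prime field modulus (an illustrative choice)


def prf(key: bytes, x: int) -> int:
    """Ordinary PRF into the field, built from HMAC-SHA256."""
    digest = hmac.new(key, x.to_bytes(16, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def interpolate(xs, ys):
    """Coefficients (lowest degree first) of the polynomial through (xs, ys) mod P."""
    n = len(xs)
    coeffs = [0] * n
    for i in range(n):
        basis, denom = [1], 1
        for j in range(n):
            if j == i:
                continue
            # Multiply the basis polynomial by (X - xs[j]).
            nxt = [0] * (len(basis) + 1)
            for k, c in enumerate(basis):
                nxt[k] = (nxt[k] - c * xs[j]) % P
                nxt[k + 1] = (nxt[k + 1] + c) % P
            basis = nxt
            denom = denom * (xs[i] - xs[j]) % P
        scale = ys[i] * pow(denom, -1, P) % P  # modular inverse, Python 3.8+
        for k, c in enumerate(basis):
            coeffs[k] = (coeffs[k] + scale * c) % P
    return coeffs


def eval_poly(coeffs, x):
    """Horner evaluation of the polynomial at x mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def program(key: bytes, targets: dict) -> list:
    """Build a hint so that evaluate(key, hint, x) == targets[x] on programmed x."""
    xs = list(targets)
    corrections = [(targets[x] - prf(key, x)) % P for x in xs]
    return interpolate(xs, corrections)


def evaluate(key: bytes, hint: list, x: int) -> int:
    """Programmed points return their target; other inputs look pseudorandom."""
    return (prf(key, x) + eval_poly(hint, x)) % P


key = secrets.token_bytes(16)
targets = {3: 111, 17: 222, 42: 333}   # hypothetical programmed (input, output) pairs
hint = program(key, targets)
outputs = {x: evaluate(key, hint, x) for x in targets}
```

Because each correction is masked by a PRF output, the hint's coefficients are not meant to reveal which inputs were programmed. That property is what makes this kind of primitive useful for passing shared state between parties without exposing record IDs.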

The technical contributions extend beyond mere privacy improvements. The team demonstrates a 50% reduction in ciphertext packing costs for homomorphic encryption schemes based on learning with errors, directly improving performance across secure machine-learning applications. Experimental validation shows the protocol remains competitive with privacy-leaking alternatives in execution time, a critical benchmark for adoption.

For practitioners in regulated industries, this work makes collaborative analytics without ID exposure considerably more practical. Banks, healthcare providers, and research institutions could train GBDT models on vertically partitioned datasets with substantially stronger privacy guarantees. Because the techniques apply to other analytics over vertically partitioned data, the work may also have broader implications for privacy-preserving computation frameworks.

Key Takeaways

→Anonymous GBDT protocol hides record identifiers while staying competitive in running time with ID-leaking alternatives
→Dual circuit-PSI design eliminates ID exposure risks inherent in standard private set intersection approaches
→50% reduction in homomorphic encryption ciphertext packing costs improves performance across secure machine-learning applications
→ID-hiding cost no longer grows with domain size because the protocol avoids universal record alignment
→Protocol extends to broader vertically partitioned analytics beyond GBDT training