GuidaPA: Privacy-Preserving Chatbot for Public Administration via Federated Learning
GuidaPA is a privacy-preserving chatbot for the Italian Public Administration that uses federated learning to train on sensitive documentation without centralizing it. The system achieves performance comparable to traditional centralized fine-tuning while the sensitive data stays on agency servers, demonstrating that federated learning is viable for regulated institutional deployments.
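The core mechanism can be illustrated with federated averaging (FedAvg), a standard aggregation rule. The source does not say which aggregation GuidaPA uses, so treat this as an assumption. In this minimal sketch, each agency server trains locally and sends back only updated parameters, which a coordinator averages, weighting each client by its number of local examples. In a QLoRA setup it is common for clients to exchange only the small LoRA adapter matrices rather than full model weights; that GuidaPA does so is also an assumption. `ClientUpdate`, `fedavg`, and the parameter name `lora_A` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClientUpdate:
    """Hypothetical message an agency server sends after one local training round."""
    client_id: str
    num_examples: int                 # size of the client's local training set
    weights: dict[str, list[float]]   # parameter name -> flattened values


def fedavg(updates: list[ClientUpdate]) -> dict[str, list[float]]:
    """Weighted average of client parameters (FedAvg).

    Each client's share is proportional to its number of local examples.
    Only parameters leave the clients; raw documents never do.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(u.num_examples for u in updates)
    if total <= 0:
        raise ValueError("total example count must be positive")

    averaged: dict[str, list[float]] = {}
    for name in updates[0].weights:
        length = len(updates[0].weights[name])
        acc = [0.0] * length
        for u in updates:
            values = u.weights[name]
            if len(values) != length:
                raise ValueError(f"shape mismatch for {name!r} from {u.client_id}")
            share = u.num_examples / total
            for i, v in enumerate(values):
                acc[i] += share * v
        averaged[name] = acc
    return averaged


if __name__ == "__main__":
    updates = [
        ClientUpdate("agency_a", 300, {"lora_A": [1.0, 2.0]}),
        ClientUpdate("agency_b", 100, {"lora_A": [5.0, 6.0]}),
    ]
    print(fedavg(updates))  # {'lora_A': [2.0, 3.0]}
```

Weighting by example count keeps small agencies from having outsized influence, but it also lets a dominant client pull the global model toward its own data distribution, which is one reason non-IID effects need watching.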
GuidaPA addresses a critical infrastructure challenge facing modern governments: how to deploy advanced AI systems on sensitive internal data without violating privacy regulations or consolidating information in vulnerable central repositories. The Italian Public Administration's need to train chatbots on restricted materials such as officer manuals, support tickets, and database extracts reflects a broader pattern: institutions often cannot use centralized machine learning pipelines because of the GDPR, other regulatory constraints, and organizational silos.
Despite its conceptual appeal, federated learning has seen limited deployment in enterprise and institutional settings. This implementation shows that parameter-efficient fine-tuning methods such as QLoRA (quantized low-rank adaptation) can deliver production-quality results across distributed clients. The reported metrics, a ROUGE-1 of 61.10 versus 41.45 for the baseline and a BLEU-4 of 45.02 versus 26.97, show that domain-specific adaptation through federated training produces substantially better conversational answers than a general-purpose baseline model.
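ROUGE-1 measures unigram (single-word) overlap between a generated answer and a reference answer. The sketch below computes its F1 variant with clipped counts. The lowercased whitespace tokenization is a simplifying assumption, and the source does not state which ROUGE implementation, tokenizer, or variant (precision, recall, or F1) produced the reported scores, so this will not reproduce them exactly. The example sentences are invented for illustration.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall.

    Overlap uses clipped counts: a word is credited at most as many
    times as it appears in both texts (Counter intersection takes the min).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = "presentare la domanda allo sportello unico"
    cand = "presentare la domanda online allo sportello"
    # 5 of 6 words overlap in each direction, so P = R = F1 = 5/6
    print(round(100 * rouge1_f1(cand, ref), 2))  # 83.33
```

Reported figures such as 61.10 are typically scores of this kind expressed on a 0 to 100 scale and aggregated over an evaluation set.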
The architecture's inclusion of role-based access control and explicit monitoring of non-IID effects (client data that is not independent and identically distributed) indicates maturity beyond a proof of concept. Non-IID data across decentralized clients typically degrades federated model performance, so detecting and accounting for it makes the approach more viable for real-world deployments, where data heterogeneity is inevitable.
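The source does not describe how GuidaPA monitors non-IID effects. One simple, common signal is how far each client's distribution over a coarse label (for example, a hypothetical support-ticket category) drifts from the pooled distribution, measured with Jensen-Shannon divergence. This sketch assumes clients share only per-category counts with the coordinator; even that is a small disclosure a real deployment would need to assess. `noniid_report` and the category names are hypothetical.

```python
import math
from collections import Counter


def _kl(p: dict[str, float], q: dict[str, float]) -> float:
    """KL divergence in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    p_full = {k: p.get(k, 0.0) for k in keys}
    q_full = {k: q.get(k, 0.0) for k in keys}
    return 0.5 * _kl(p_full, m) + 0.5 * _kl(q_full, m)


def normalize(counts: Counter) -> dict[str, float]:
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty label histogram")
    return {k: v / total for k, v in counts.items()}


def noniid_report(client_counts: dict[str, Counter]) -> dict[str, float]:
    """Per-client JS divergence from the pooled label distribution.

    0 means the client's label mix matches the federation as a whole;
    higher values mean its mix differs more from the pooled mix.
    """
    pooled: Counter = Counter()
    for counts in client_counts.values():
        pooled.update(counts)
    global_dist = normalize(pooled)
    return {
        cid: js_divergence(normalize(counts), global_dist)
        for cid, counts in client_counts.items()
    }


if __name__ == "__main__":
    clients = {
        "agency_a": Counter({"tax": 50, "permits": 50}),
        "agency_b": Counter({"tax": 50, "permits": 50}),
        "agency_c": Counter({"registry": 100}),
    }
    for cid, score in noniid_report(clients).items():
        print(f"{cid}: {score:.3f}")
```

In this toy federation, `agency_c` scores highest because its tickets cover a category no other client has; a monitor could flag clients above a chosen threshold for closer inspection.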
For government and enterprise AI adoption, this work lowers a significant implementation barrier: organizations can deploy capable language models on sensitive data without having to choose between capability and compliance. The methodology should transfer to other public administrations facing similar constraints, potentially accelerating institutional AI adoption in sectors where centralized data pooling remains legally or operationally infeasible.
- →Federated learning reaches a ROUGE-1 of 61.10, comparable to centralized fine-tuning, while keeping sensitive government data distributed across agency servers.
- →Domain-specific federated fine-tuning with parameter-efficient adaptation raises ROUGE-1 by about 47% (61.10 vs. 41.45) and BLEU-4 by about 67% (45.02 vs. 26.97) over the baseline model.
- →Role-based access control and non-IID monitoring enable production deployment for regulated institutional use cases.
- →This approach eases the privacy-capability tradeoff, enabling sophisticated AI deployment without centralized data consolidation.
- →The methodology should generalize to other public administrations and regulated organizations constrained by data residency and privacy regulations.
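The summary mentions role-based access control without describing its design. A minimal sketch, assuming a hypothetical policy that maps roles to permitted actions (chatting, viewing monitoring metrics, starting a federated training round, managing clients), is a deny-by-default check in front of each action. Every role name, permission, and the `ROLE_PERMISSIONS` table here is illustrative, not taken from GuidaPA.

```python
from enum import Enum, auto


class Permission(Enum):
    CHAT = auto()
    VIEW_METRICS = auto()
    START_TRAINING_ROUND = auto()
    MANAGE_CLIENTS = auto()


# Hypothetical role -> permission mapping; a real deployment would load this
# from configuration and bind roles to authenticated identities.
ROLE_PERMISSIONS: dict[str, frozenset[Permission]] = {
    "officer": frozenset({Permission.CHAT}),
    "analyst": frozenset({Permission.CHAT, Permission.VIEW_METRICS}),
    "admin": frozenset(Permission),  # every permission
}


class AccessDenied(Exception):
    pass


def check(roles: set[str], needed: Permission) -> None:
    """Return if any of the user's roles grants `needed`, else raise.

    Unknown roles grant nothing (deny by default).
    """
    for role in roles:
        if needed in ROLE_PERMISSIONS.get(role, frozenset()):
            return
    raise AccessDenied(f"roles {sorted(roles)} lack {needed.name}")


if __name__ == "__main__":
    check({"analyst"}, Permission.VIEW_METRICS)  # allowed: returns None
    try:
        check({"officer"}, Permission.START_TRAINING_ROUND)
    except AccessDenied as err:
        print(err)
```

Denying by default means a misspelled or unconfigured role fails closed, which is usually the safer choice for regulated deployments.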