🧠 AI · 🟢 Bullish · Importance 6/10
APPA: Adaptive Preference Pluralistic Alignment for Fair Federated RLHF of LLMs
🤖 AI Summary
Researchers propose APPA, a new framework for aligning large language models with diverse human preferences in federated learning environments. The method dynamically reweights group-level rewards based on each group's historical performance, improving alignment for the worst-performing groups by up to 28% while maintaining overall model performance.
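The summary names the aggregation schemes being compared but not their exact objectives. The formulas below are a hedged reconstruction of the standard family; the notation (\(r_g\) for group \(g\)'s alignment reward, \(w_g\) for its weight, \(G\) groups in total) is assumed rather than taken from the paper:

```latex
R_{\text{avg}}  = \tfrac{1}{G}\textstyle\sum_{g=1}^{G} r_g
   % average aggregation: every group counts equally
R_{\text{min}}  = \min_{g}\, r_g
   % min aggregation: optimize only the worst-performing group
R_{\text{APPA}} = \textstyle\sum_{g=1}^{G} w_g\, r_g,
   \qquad \textstyle\sum_{g} w_g = 1
   % adaptive aggregation: weights w_g updated from historical
   % per-group performance (APPA-style)
```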
Key Takeaways
- APPA addresses fairness issues in federated reinforcement learning from human feedback by dynamically reweighting group rewards based on historical performance (illustrated in the sketch after this list).
- The framework improves worst-performing-group alignment by up to 28% compared to average aggregation methods.
- Testing across three model families (Gemma 2 2B, Llama 3.2 3B, Qwen3 0.6B) demonstrates consistent improvements in fairness-alignment trade-offs.
- The approach operates without requiring access to raw preference data, making it suitable for privacy-preserving federated learning scenarios.
- APPA outperforms both average-based and min-based aggregation methods in balancing overall alignment with fairness across diverse user groups.
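Since APPA's actual update rule is not given in this summary, here is a minimal Python sketch of the three aggregation schemes under stated assumptions: the exponential reweighting by historical score, the `eta` temperature, and the toy numbers are all illustrative stand-ins, not the paper's algorithm.

```python
import numpy as np

def average_weights(scores):
    # Uniform weights: the average-aggregation baseline.
    return np.full(len(scores), 1.0 / len(scores))

def min_weights(scores):
    # All weight on the worst group: the min-aggregation baseline.
    w = np.zeros(len(scores))
    w[np.argmin(scores)] = 1.0
    return w

def adaptive_weights(history, eta=1.0):
    # Hypothetical adaptive rule: groups with lower historical alignment
    # receive exponentially larger weight (a multiplicative-weights
    # heuristic). This stands in for APPA's unspecified update rule.
    mean_scores = history.mean(axis=0)   # per-group historical average
    w = np.exp(-eta * mean_scores)       # lower score -> larger weight
    return w / w.sum()

def aggregate(group_rewards, weights):
    # Scalar reward signal the server optimizes. Only group-level scores
    # are needed here, never the clients' raw preference data.
    return float(np.dot(weights, group_rewards))

# Toy round with three preference groups of unequal historical alignment.
history = np.array([[0.80, 0.60, 0.30],   # rows = earlier rounds
                    [0.90, 0.50, 0.20]])
rewards = np.array([0.85, 0.55, 0.25])    # current-round group rewards

for name, w in [("average",  average_weights(rewards)),
                ("min",      min_weights(rewards)),
                ("adaptive", adaptive_weights(history))]:
    print(f"{name:8s} weights={np.round(w, 3)} "
          f"reward={aggregate(rewards, w):.3f}")
```

Note how the adaptive and min schemes shift reward weight toward the worst group using only scalar group-level scores, which is consistent with the takeaway that raw preference data never has to leave the clients.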
#llm-alignment #federated-learning #rlhf #fairness #pluralistic-alignment #ppo #model-training #ai-research
Read Original → via arXiv – CS AI