AI · Bullish · arXiv – CS AI · 5h ago · 6/10
RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation
RaguTeam won SemEval-2026 Task 8 with a seven-model LLM ensemble whose final response is chosen by a GPT-4o-mini judge, achieving a conditioned harmonic mean of 0.7827 and significantly outperforming the baseline. The authors report that model diversity across families, scales, and prompting strategies drives the improvement in multi-turn response generation.
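
The judge-selected ensemble described above can be illustrated with a minimal sketch. This is not the authors' implementation: `select_response`, `toy_judge`, and the three toy generators are hypothetical names, and simple Python callables stand in for the seven LLMs and the GPT-4o-mini judge. It only shows the general pattern of generating one candidate per model and letting a judge pick the final answer.

```python
from typing import Callable, Sequence

# Hypothetical interfaces: a generator maps a prompt to a candidate response,
# and a judge returns the index of the candidate it prefers.
Generator = Callable[[str], str]
Judge = Callable[[str, Sequence[str]], int]


def select_response(prompt: str, generators: Sequence[Generator], judge: Judge) -> str:
    """Collect one candidate per generator and return the one the judge picks."""
    if not generators:
        raise ValueError("at least one generator is required")
    candidates = [generate(prompt) for generate in generators]
    index = judge(prompt, candidates)
    # Fall back to the first candidate if the judge returns an invalid index.
    if not 0 <= index < len(candidates):
        index = 0
    return candidates[index]


def toy_judge(prompt: str, candidates: Sequence[str]) -> int:
    """Stand-in judge (assumption): prefer the candidate sharing the most words with the prompt."""
    prompt_words = set(prompt.lower().split())
    scores = [len(prompt_words & set(c.lower().split())) for c in candidates]
    return scores.index(max(scores))


# Toy generators standing in for different model families or prompting strategies.
toy_generators = [
    lambda p: "I do not know.",
    lambda p: "The ensemble uses seven models.",
    lambda p: "Paris.",
]

answer = select_response("How many models does the ensemble use?", toy_generators, toy_judge)
print(answer)
```

In a real system the generators would wrap calls to different LLMs and the judge would be another LLM prompted to compare candidates. The fallback to the first candidate is a design choice of this sketch, made so that a malformed judge output never leaves the turn without a response.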