lmfaoooo at SemEval-2026 Task 1: Humor Is an Audience. Preference Modeling for Constrained Humor Generation
A research team won first place in the SemEval-2026 Task 1 humor generation competition by developing a system that generates diverse joke candidates and selects the best ones using a preference model trained on human comparisons. The approach addresses the core challenge that humor is subjective and audience-dependent, rather than objectively measurable, achieving top rankings across English, Chinese, and Spanish subtasks.
This research tackles a fundamental challenge in AI-generated content: modeling subjective human preferences rather than optimizing for absolute quality metrics. The team's two-stage approach, generating diverse candidates and then selecting among them with a preference model, reflects a practical shift in how AI systems can handle tasks where ground truth is inherently ambiguous. The methodology is not specific to humor and could apply to other domains where quality depends on individual taste, cultural context, and audience expectations.
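The generate-then-select idea can be illustrated with a minimal Python sketch. This is not the team's implementation: `select_best` and `toy_score` are hypothetical names, the candidate strings are placeholders, and the toy scorer only stands in for a trained preference model.

```python
from typing import Callable, List


def select_best(candidates: List[str],
                score: Callable[[str], float],
                k: int = 1) -> List[str]:
    """Return the k candidates with the highest preference score."""
    # Drop exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(candidates))
    # sorted() is stable, so tied candidates keep their original order.
    return sorted(unique, key=score, reverse=True)[:k]


def toy_score(text: str) -> float:
    """Hypothetical scorer: prefers texts close to eight words long.

    A real system would call a preference model trained on human
    pairwise comparisons instead of this placeholder.
    """
    return -abs(len(text.split()) - 8)


# Stage 1 (generation) is assumed to have produced this pool already.
pool = [
    "short candidate",
    "a candidate that is exactly eight words long",
    "short candidate",
    "a much longer candidate that rambles on well past the preferred length",
]

# Stage 2: keep the two highest-scoring distinct candidates.
top_two = select_best(pool, toy_score, k=2)
```

The point of the split is that the generator only has to be diverse, while all judgment about what the audience prefers is concentrated in the scoring function.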
The release of 2.5K pairwise human judgments and the interpretable pipeline for converting comparisons into preference models represent valuable contributions to the AI research community. Rather than collecting traditional labels, the researchers harvested preference data through arena-style comparisons, which mirrors successful approaches in large language model alignment and reinforcement learning from human feedback (RLHF). This aligns with broader industry trends where preference modeling has become central to AI safety and output quality control.
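One common way to turn pairwise judgments into per-item scores is the Bradley-Terry model, in which the probability that item i beats item j is p_i / (p_i + p_j). The sketch below fits it with a simple iterative update. It is an assumption that this model is a reasonable stand-in: the summary does not state which method the team used, and `fit_bradley_terry`, the `smoothing` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def fit_bradley_terry(comparisons: Iterable[Tuple[Hashable, Hashable]],
                      iterations: int = 200,
                      smoothing: float = 0.1) -> Dict[Hashable, float]:
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Assumes winner != loser in every pair. Returns strengths that sum to 1.
    `smoothing` is added to each win count so that items with zero wins
    keep a positive strength; this makes the result a smoothed estimate
    rather than the exact maximum-likelihood solution.
    """
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # Sum n_ij / (p_i + p_j) over every opponent j that i has met.
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            updated[i] = (wins[i] + smoothing) / denom
        total = sum(updated.values())
        strength = {item: value / total for item, value in updated.items()}
    return strength


# Toy judgments: A usually beats B, B usually beats C, A usually beats C.
judgments = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
strengths = fit_bradley_terry(judgments)
ranking = sorted(strengths, key=strengths.get, reverse=True)
```

Because each fitted strength is a single number per item, the resulting ranking stays easy to inspect, which is one reason comparison-based pipelines are often described as interpretable.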
The cross-domain transfer demonstrated across the three languages suggests that preference models can generalize effectively, a finding with implications for multilingual AI systems. For developers building content generation systems, this work suggests that preference models trained on limited comparison data can outperform baselines and transfer across contexts. The competitive success (first place in two subtasks) demonstrates the practical viability of the approach. Organizations developing AI systems for subjective domains should consider preference-based ranking as an alternative to absolute scoring, particularly when human evaluation budgets are constrained.
- Preference modeling based on pairwise human comparisons outperforms absolute quality scoring for subjective tasks like humor generation.
- Generating diverse candidate pools and selecting via preference models achieved first-place rankings in a multilingual humor generation competition.
- The released 2.5K human pairwise judgments and interpretable preference pipeline enable reproducible research in audience-dependent content generation.
- Preference models demonstrate strong cross-domain transfer across the English, Chinese, and Spanish subtasks.
- Arena-style comparison-based feedback collection offers a scalable alternative to traditional annotation for subjective quality assessment.