DEFINED: A Data-Efficient Computational Framework for Fine-Grained Creativity Assessment in Debate Scenarios
Researchers introduce DEFINED, a computational framework for assessing creativity in debate using a hierarchical eight-dimensional metric system. The approach combines pre-trained language models with human expert annotations to overcome data scarcity challenges, achieving more accurate scoring than standard LLM evaluators.
DEFINED addresses a fundamental limitation in AI assessment: measuring creativity in complex, open-ended settings rather than in simple standardized tasks. Traditional creativity evaluation relies on costly human experts, which creates a bottleneck for scaling assessment across domains. The framework operationalizes debate creativity through multiple dimensions that cover both divergent and convergent thinking, reflecting how creativity manifests in real-world competitive contexts.
The research builds on growing recognition that LLMs excel at pattern matching but struggle with nuanced evaluation of subjective qualities. By leveraging authentic competition data from debate tournaments, the researchers created a more ecologically valid testing ground than synthetic benchmarks. The mixed-granularity training strategy and constrained data augmentation address practical constraints: limited fine-grained expert annotations and bias toward elite performers in original datasets.
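To make the idea of mixed-granularity training more concrete, the following is a minimal sketch of one way a loss could combine coarse and fine-grained supervision. It is not the authors' implementation: the function name `mixed_granularity_loss`, the `fine_weight` parameter, the use of squared error, and the choice to derive the overall score as the mean of the eight dimension scores are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# The summary mentions eight dimensions; their names are not given here.
NUM_DIMENSIONS = 8


def mixed_granularity_loss(
    predicted: Sequence[float],
    coarse_label: float,
    fine_labels: Optional[Sequence[float]] = None,
    fine_weight: float = 1.0,
) -> float:
    """Hypothetical loss mixing coarse and fine-grained supervision.

    predicted    -- model scores for each of the eight dimensions
    coarse_label -- a single overall score, assumed available for every sample
    fine_labels  -- per-dimension expert scores, or None when not annotated
    fine_weight  -- assumed weighting of the fine-grained term
    """
    if len(predicted) != NUM_DIMENSIONS:
        raise ValueError("expected one prediction per dimension")

    # Assumption: the overall score is the mean of the dimension scores.
    overall = sum(predicted) / NUM_DIMENSIONS
    loss = (overall - coarse_label) ** 2

    # The fine-grained term is added only for samples with expert annotations.
    if fine_labels is not None:
        if len(fine_labels) != NUM_DIMENSIONS:
            raise ValueError("expected one fine label per dimension")
        fine_mse = sum(
            (p - t) ** 2 for p, t in zip(predicted, fine_labels)
        ) / NUM_DIMENSIONS
        loss += fine_weight * fine_mse

    return loss
```

Under these assumptions, a sample with only an overall score still contributes a training signal, while the scarcer expert-annotated samples add per-dimension supervision on top of it.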
For the AI development community, DEFINED demonstrates that smaller, specialized models trained on high-quality expert data can outperform prompt-based approaches built on general-purpose LLMs. This challenges the assumption that larger models automatically provide better evaluation. The framework was also validated on debate-naive participants, which suggests the approach captures generalizable aspects of creativity rather than fitting a narrow benchmark.
The implications extend beyond debate evaluation. As organizations increasingly rely on automated assessment for hiring, education, and content moderation, frameworks that accurately measure higher-order thinking become strategically valuable. The methodology could transfer to other domains that require creativity assessment, such as innovation challenges, creative writing, and scientific hypothesis generation. This work signals growing maturity in building AI systems that assess nuanced human capabilities rather than just processing text.
- DEFINED uses hierarchical eight-dimensional metrics and constrained data augmentation to assess creativity in debate scenarios with limited expert data.
- The framework outperforms prompt-based LLM evaluators by training smaller, specialized models on high-quality expert annotations.
- The approach demonstrates ecological validity through empirical studies with non-expert participants, moving beyond synthetic benchmarks.
- Mixed-granularity training enables robust learning from limited fine-grained supervision while addressing elite bias in the original datasets.
- The methodology could transfer to other domains requiring nuanced assessment of creative and complex thinking abilities.