LinguIUTics at PsyDefDetect: Iterative Imbalance-Aware Fine-tuning of Qwen3-8B for Psychological Defense Mechanism Classification
The LinguIUTics team placed 4th in the PsyDefDetect 2026 shared task by fine-tuning Qwen3-8B to classify psychological defense mechanisms in clinical conversational text. The system reached a macro F1-score of 0.3917 and substantially improved performance on rare classes through techniques including minority-class augmentation and ensembling.
The PsyDefDetect shared task addresses a genuine gap in clinical NLP: automated detection of psychological defense mechanisms in patient-therapist conversations, a nine-class problem whose severe class imbalance makes it technically difficult. LinguIUTics' fourth-place finish represents meaningful progress, and the 7.7-point absolute improvement over the baseline shows the value of handling class imbalance explicitly rather than relying on standard transformer approaches.
The research reveals why generic solutions fail here. BERT-family encoders and zero-shot LLMs performed poorly on minority classes because of the severe data imbalance inherent to clinical psychology datasets. The team's iterative approach combined three elements: grouped stratified cross-validation to prevent data leakage, minority-class lexical augmentation for underrepresented categories, and logit bias tuning in post-processing. Together they show how domain-specific challenges require targeted solutions beyond standard fine-tuning practices. The improvement of the "Unclear" class from near-zero to 0.797 F1 illustrates the practical impact of these techniques.
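To make the leakage point concrete, here is a minimal sketch of a grouped stratified split. It assumes the grouping key is the dialogue an utterance came from, which this summary does not specify, and the function names `imbalance` and `grouped_stratified_folds` are hypothetical. The greedy balancing rule is an illustration only, not the team's implementation; scikit-learn's `StratifiedGroupKFold` offers a maintained alternative.

```python
from collections import Counter, defaultdict


def imbalance(fold_counts, totals, n_folds):
    """Sum of squared gaps between each fold's share of a class and the ideal 1/n_folds."""
    return sum(
        (counts[label] / n - 1.0 / n_folds) ** 2
        for counts in fold_counts
        for label, n in totals.items()
    )


def grouped_stratified_folds(labels, groups, n_folds=3):
    """Return one fold index per sample; all samples of a group share a fold."""
    per_group = defaultdict(Counter)
    for label, group in zip(labels, groups):
        per_group[group][label] += 1
    totals = Counter(labels)

    fold_counts = [Counter() for _ in range(n_folds)]
    fold_of = {}
    # Place the largest groups first, since they are the hardest to balance.
    ordered = sorted(per_group, key=lambda g: (-sum(per_group[g].values()), str(g)))
    for group in ordered:
        best_fold, best_cost = 0, None
        for fold in range(n_folds):
            fold_counts[fold].update(per_group[group])    # tentatively add the group
            cost = imbalance(fold_counts, totals, n_folds)
            fold_counts[fold].subtract(per_group[group])  # undo the tentative add
            if best_cost is None or cost < best_cost:
                best_fold, best_cost = fold, cost
        fold_counts[best_fold].update(per_group[group])
        fold_of[group] = best_fold
    return [fold_of[g] for g in groups]


# Hypothetical toy data: three dialogues, two utterances each.
labels = ["A", "B", "A", "B", "A", "B"]
groups = ["d1", "d1", "d2", "d2", "d3", "d3"]
folds = grouped_stratified_folds(labels, groups, n_folds=3)
```

Because a whole dialogue always lands in one fold, no utterance in a validation fold has siblings in the training folds. That property is what keeps validation scores from overstating leaderboard performance.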
This work has limited direct market implications, but it signals growing sophistication in applying LLMs to specialized domains where off-the-shelf solutions underperform. The methodology, particularly the minority-class augmentation strategy and the ensemble blending, offers reusable patterns for other imbalanced classification problems in healthcare, legal analysis, and content moderation. The reliance on QLoRA fine-tuning of open-source models like Qwen3-8B rather than proprietary APIs suggests that well-resourced research teams can gain a competitive edge on specialized NLP tasks.
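As a rough illustration of ensemble blending, the sketch below takes a weighted average of per-class probabilities from several models and predicts the top class. The summary does not say how the team blended its models, so the weighting scheme and the names `blend` and `predict` are assumptions.

```python
def blend(prob_sets, weights):
    """Weighted average of per-class probabilities from several models.

    prob_sets: one list per model, each a list of per-sample probability rows.
    weights:   one non-negative weight per model (hypothetical values).
    """
    total = sum(weights)
    n_classes = len(prob_sets[0][0])
    blended = []
    for rows in zip(*prob_sets):  # the rows for one sample, one per model
        blended.append([
            sum(w * row[c] for w, row in zip(weights, rows)) / total
            for c in range(n_classes)
        ])
    return blended


def predict(probs):
    """Index of the highest-probability class for each sample."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


# Hypothetical outputs of two models on two samples, two classes.
model_a = [[0.6, 0.4], [0.2, 0.8]]
model_b = [[0.3, 0.7], [0.4, 0.6]]
equal = predict(blend([model_a, model_b], [1, 1]))
favor_a = predict(blend([model_a, model_b], [3, 1]))
```

In practice the weights would be chosen on held-out folds, so the choice of blend does not itself leak into the reported score.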
- Class imbalance remains a critical bottleneck for LLM performance in real-world clinical NLP applications.
- Qwen3-8B with QLoRA fine-tuning outperformed larger zero-shot models and BERT encoders on this task.
- Minority-class round-robin augmentation and logit bias tuning substantially improved rare-class recall without sacrificing overall performance.
- Grouped stratified cross-validation prevented critical validation-to-leaderboard performance gaps common in imbalanced datasets.
- Open-source model fine-tuning proves cost-effective for specialized clinical NLP problems requiring domain-specific optimization.
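
The logit bias tuning mentioned in the takeaways can be sketched as a post-processing step: add a per-class offset to the model's logits and search for offsets that maximize macro F1 on validation data. The coordinate-wise grid search, the grid values, and the names `macro_f1`, `apply_bias`, and `tune_bias` are assumptions for illustration; the summary does not describe the team's exact search procedure.

```python
def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


def apply_bias(logits, bias):
    """Predict the class with the highest biased logit for each sample."""
    return [max(range(len(row)), key=lambda c: row[c] + bias[c]) for row in logits]


def tune_bias(logits, y_true, grid=(-1.0, -0.5, 0.0, 0.5, 1.0), sweeps=2):
    """Coordinate-wise grid search over per-class biases, keeping strict improvements."""
    n_classes = len(logits[0])
    bias = [0.0] * n_classes
    best = macro_f1(y_true, apply_bias(logits, bias), n_classes)
    for _ in range(sweeps):
        for c in range(n_classes):
            for b in grid:
                trial = bias[:c] + [b] + bias[c + 1:]
                score = macro_f1(y_true, apply_bias(logits, trial), n_classes)
                if score > best:
                    best, bias = score, trial
    return bias, best


# Hypothetical validation logits: class 1 is always slightly under-scored.
val_logits = [[2.0, 0.9], [2.0, 1.8], [2.0, 1.7], [2.0, 0.5]]
val_labels = [0, 1, 1, 0]
baseline = macro_f1(val_labels, apply_bias(val_logits, [0.0, 0.0]), 2)
bias, tuned = tune_bias(val_logits, val_labels)
```

On this toy data the unbiased model never predicts class 1, and a negative offset on class 0 recovers both rare examples. Offsets tuned this way should be fitted on validation folds only and then applied unchanged to the test set.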