TeamHerald@CHIPSAL 2026: Hate Speech Detection and Sentiment Analysis of Nepali Memes using Transformer-based Architectures and Ensemble Learning
Researchers presented a study on detecting hate speech and analyzing sentiment in Nepali-language memes using transformer-based machine learning models and ensemble learning techniques. The work addresses challenges specific to Nepali text analysis, including code-mixing and limited baseline datasets, and reports that soft voting ensembles outperform standalone models on the multi-class sentiment task, with a 15.8% relative improvement in Macro F1-score.
This research tackles a meaningful gap in NLP development by focusing on underrepresented languages and the specific challenge of meme analysis. Nepali, spoken by approximately 17 million people globally, lacks the robust computational resources available for major languages, making this work a valuable contribution to linguistic diversity in AI. The study's dual focus on hate speech detection and sentiment analysis reflects growing recognition that content moderation and social media analysis require culturally and linguistically tailored approaches.
The methodological contribution centers on demonstrating how ensemble learning strategies, particularly soft voting (averaging the class probabilities predicted by each model before choosing a label), outperform individual transformer models for multi-class classification problems. This finding has broader implications for NLP practitioners working with limited datasets or specialized domains. The inclusion of an OCR layer to extract text from memes addresses a practical challenge in internet culture analysis, where visual and textual elements intertwine.
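To make the soft voting idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `soft_vote`, the optional `weights` parameter, and the toy probabilities are all assumptions made for illustration. Each model contributes a probability distribution over the classes, the distributions are averaged, and the class with the highest averaged probability is selected.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return one predicted class index per sample.

    model_probs[m][i][c] is the probability that model m assigns
    to class c for sample i (hypothetical layout for this sketch).
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models  # unweighted average by default
    total_weight = sum(weights)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])

    predictions = []
    for i in range(n_samples):
        # Weighted mean of each class probability across the models.
        averaged = [
            sum(w * model_probs[m][i][c] for m, w in enumerate(weights))
            / total_weight
            for c in range(n_classes)
        ]
        # Pick the class with the highest averaged probability.
        predictions.append(max(range(n_classes), key=averaged.__getitem__))
    return predictions


# Toy probabilities for three hypothetical models, two samples, three classes.
model_a = [[0.90, 0.05, 0.05], [0.2, 0.5, 0.3]]
model_b = [[0.40, 0.50, 0.10], [0.1, 0.3, 0.6]]
model_c = [[0.40, 0.50, 0.10], [0.2, 0.2, 0.6]]

predictions = soft_vote([model_a, model_b, model_c])
print(predictions)
```

In the first toy sample, two of the three models narrowly prefer class 1, so a majority (hard) vote would pick class 1. Soft voting picks class 0 instead, because the one model that prefers class 0 is far more confident. This sensitivity to confidence is the property that distinguishes soft voting from hard voting.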
While primarily academic in scope, this work supports the development of more inclusive AI systems capable of moderating harmful content across linguistic boundaries. As social media platforms expand globally, the demand for hate speech detection and sentiment analysis in non-English contexts grows with them. Such research enables better content moderation policies and helps prevent the spread of harmful speech in underserved communities.
Future work should explore whether these ensemble strategies transfer effectively to other low-resource languages and investigate the integration of visual elements beyond text extraction to capture meme-specific context.
- Soft voting ensembles achieved 15.8% relative improvement in multi-class sentiment analysis compared to standalone transformer models.
- Decoder-only transformer architectures performed best for binary hate speech detection tasks in Nepali text.
- Code-mixing and limited baseline datasets present significant challenges for Nepali language NLP development.
- Ensemble learning strategies demonstrate task-dependent effectiveness, varying between binary and multi-class classification problems.
- OCR-based text extraction from memes enables computational analysis of internet culture in non-English languages.
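
For readers less familiar with the metric behind the headline figure, the sketch below shows how a Macro F1-score and a relative improvement are typically computed. The helper names `macro_f1` and `relative_improvement` and every number in the example are illustrative assumptions. None of them come from the study.

```python
from typing import Hashable, Sequence


def macro_f1(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    labels: Sequence[Hashable],
) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage gain of `improved` over `baseline`."""
    return (improved - baseline) / baseline * 100.0


# Made-up labels for a three-class task (not data from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred, labels=[0, 1, 2])
print(round(score, 4))

# Made-up scores: moving from 0.50 to 0.60 is a 20% relative gain.
print(round(relative_improvement(0.50, 0.60), 1))
```

Because Macro F1 averages the per-class scores without weighting by class frequency, minority classes count as much as majority ones, which matters for imbalanced sentiment and hate speech datasets.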