Towards Robust Arabic Speech Emotion Recognition with Deep Learning
Researchers propose a hybrid CNN-Transformer architecture for Arabic Speech Emotion Recognition (SER) that reaches 98.1% accuracy, outperforming CNN-LSTM and fine-tuned wav2vec 2.0 baselines. The study targets emotion recognition in Arabic speech, an underexplored problem, by pairing convolutional feature extraction with Transformer-based context modeling, and reports that the approach is effective in low-resource, dialectally diverse settings.
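The general pattern is convolutional layers that turn acoustic frames into feature maps, followed by self-attention that mixes information across the whole utterance before classification. The sketch below illustrates that pattern in dependency-free Python. It is not the authors' model, and every specific in it is a hypothetical placeholder. That includes the four-label emotion set, the 13-dimensional MFCC-like input frames, the use of a single convolution and a single attention head, the residual connection, mean pooling, and the random untrained weights.

```python
import math
import random

# Hypothetical placeholder label set; the study's actual classes are not given here.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def rand_matrix(rows, cols, rng):
    """Random weights in [-0.5, 0.5]; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [x / z for x in e]


def conv1d_relu(frames, kernels, width):
    """'Valid' 1-D convolution over time, then ReLU.
    frames: T x F feature vectors; kernels: C filters of shape width x F.
    Returns (T - width + 1) x C feature maps."""
    n_feat = len(frames[0])
    out = []
    for t in range(len(frames) - width + 1):
        window = frames[t:t + width]
        out.append([
            max(0.0, sum(k[i][f] * window[i][f] for i in range(width) for f in range(n_feat)))
            for k in kernels
        ])
    return out


def self_attention(seq, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the CNN outputs."""
    q = [matvec(wq, x) for x in seq]
    k = [matvec(wk, x) for x in seq]
    v = [matvec(wv, x) for x in seq]
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def init_params(n_features, n_channels, width, n_classes, seed=0):
    rng = random.Random(seed)
    return {
        "width": width,
        "kernels": [rand_matrix(width, n_features, rng) for _ in range(n_channels)],
        "wq": rand_matrix(n_channels, n_channels, rng),
        "wk": rand_matrix(n_channels, n_channels, rng),
        "wv": rand_matrix(n_channels, n_channels, rng),
        "w_out": rand_matrix(n_classes, n_channels, rng),
    }


def classify(frames, params):
    """CNN feature extraction -> self-attention context -> pooled class probabilities."""
    feats = conv1d_relu(frames, params["kernels"], params["width"])
    ctx = self_attention(feats, params["wq"], params["wk"], params["wv"])
    # Residual connection, then mean-pool over time into one utterance vector.
    n_ch = len(feats[0])
    pooled = [sum(x[c] + y[c] for x, y in zip(feats, ctx)) / len(feats) for c in range(n_ch)]
    return softmax(matvec(params["w_out"], pooled))


if __name__ == "__main__":
    rng = random.Random(1)
    # 20 frames of 13 synthetic MFCC-like coefficients (fake data, not real speech).
    frames = [[rng.gauss(0, 1) for _ in range(13)] for _ in range(20)]
    params = init_params(n_features=13, n_channels=8, width=3, n_classes=len(EMOTIONS))
    probs = classify(frames, params)
    print(dict(zip(EMOTIONS, (round(p, 3) for p in probs))))
```

Because the weights are random, the printed probabilities carry no meaning. The example only shows how the data flows. The convolution sees a few neighbouring frames at a time, while the attention step lets every time position weigh every other one. That split of work is the reason for pairing the two components.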