AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers propose privacy-preserving group emotion recognition (GER) systems that analyze multimodal audio and video at the scene level instead of collecting individual biometric data. Two novel architectures, a cross-attention fusion model and a Variational Encoder Multi-Decoder framework, show that competitive emotion inference is achievable at the group level without monitoring individual faces, voices, or gaze.
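As a rough illustration of cross-attention fusion, the sketch below lets scene-level video tokens act as queries over audio tokens and then mean-pools the result into one group-level vector, so no per-person features are tracked. It is a generic scaled dot-product cross-attention in plain Python, not the authors' architecture; the function names `cross_attention` and `fuse`, the toy 2-dimensional tokens, and the absence of learned projection matrices are all assumptions made for brevity.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Each query row (e.g. a video token) attends over the key/value rows (e.g. audio tokens)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Attention-weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def fuse(video_tokens: Matrix, audio_tokens: Matrix) -> List[float]:
    """Hypothetical fusion: video-to-audio cross-attention, then mean-pool into one group vector."""
    attended = cross_attention(video_tokens, audio_tokens, audio_tokens)
    n = len(attended)
    return [sum(row[j] for row in attended) / n for j in range(len(attended[0]))]

video = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(fuse(video, audio))
```

Pooling after attention is one simple way to keep the output at the collective level; a real system would add learned query/key/value projections and multiple heads.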
AI · Neutral · arXiv – CS AI · Jun 5 · 6/10
🧠Researchers introduce ProSarc, an audio-only machine learning framework that detects sarcasm by analyzing temporal mismatches between local prosodic patterns and overall emotional tone. The model achieves strong performance on multiple datasets (F1=75.3 on MUStARD++) and demonstrates cross-lingual generalization, advancing computational understanding of spoken sarcasm detection.
AI · Neutral · arXiv – CS AI · Jun 5 · 5/10
🧠Researchers propose an emotion-aware text-to-image (T2I) pipeline that uses large language models and fine-tuned Stable Diffusion to generate children's drawing-style images from Korean diary entries. The system combines sentiment recognition via Qwen3-8B with LoRA-fine-tuned image generation, addressing the difficulty T2I models have in capturing emotional context.
AI · Neutral · arXiv – CS AI · Jun 5 · 5/10
🧠Researchers propose EEGDancer, a machine learning framework that combines vector-quantized representation learning, masked temporal modeling, and reinforcement learning to predict continuous emotional states from EEG brain signals. The approach outperforms existing methods on standard emotion prediction datasets by modeling long-range temporal dependencies rather than treating emotion prediction as frame-by-frame regression.
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers demonstrate that large language models and human brain activity share a common valence (emotional) axis: LLM representations of emotion-evocative sentences align with EEG patterns recorded across 123 subjects. However, directly supervising neural networks to match this axis paradoxically degrades performance. The authors call this finding the 'saturation regularity' and argue that optimal brain decoding requires ensemble methods that exploit residual diversity rather than additional constraint-based training.
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers introduce UF-AMA, a unified framework for cross-domain emotion recognition using multimodal physiological signals like EEG and eye-tracking data. The model employs adaptive alignment mechanisms and multi-level domain adaptation to achieve state-of-the-art performance in cross-subject and cross-session emotion recognition tasks.
AI · Neutral · arXiv – CS AI · Jun 2 · 5/10
🧠A new study demonstrates that upper-face affective cues significantly enhance audiovisual speech recognition systems when audio quality degrades, particularly in noisy environments. Rather than encoding linguistic content directly, emotional facial expressions improve model calibration and robustness, suggesting that human communication relies on socially expressive signals beyond traditional mouth-region visual cues.
AI · Bullish · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers propose Morlet Spectral Transformer (MST), a novel neural network architecture for detecting emotions from EEG brain signals across different subjects. The method outperforms larger pretrained models by using specialized wavelet-based signal processing and frequency-specific spatial analysis, demonstrating that intelligent representation design can replace computationally expensive pretraining approaches.
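To make "wavelet-based signal processing" concrete, the sketch below convolves one EEG channel with a complex Morlet wavelet (a complex sinusoid under a Gaussian envelope) and reports mean power at a chosen frequency. It is the standard textbook Morlet transform in plain Python, not MST's actual front end; the 128 Hz sampling rate, 7 cycles per wavelet, truncation at ±3σ, and the synthetic 10 Hz test signal are assumptions, since the summary gives no parameters.

```python
import cmath
import math
from typing import List

def morlet_wavelet(freq: float, fs: float, n_cycles: float = 7.0) -> List[complex]:
    """Complex Morlet wavelet: a sinusoid at `freq` Hz under a Gaussian envelope."""
    sigma_t = n_cycles / (2 * math.pi * freq)  # temporal std of the Gaussian (s)
    half = int(math.ceil(3 * sigma_t * fs))     # truncate the envelope at +/- 3 sigma
    wavelet = []
    for k in range(-half, half + 1):
        t = k / fs
        envelope = math.exp(-t * t / (2 * sigma_t * sigma_t))
        wavelet.append(envelope * cmath.exp(2j * math.pi * freq * t))
    norm = sum(abs(w) for w in wavelet)  # L1-normalise so amplitudes compare across frequencies
    return [w / norm for w in wavelet]

def band_power(signal: List[float], freq: float, fs: float) -> float:
    """Mean squared magnitude of the signal convolved with a Morlet wavelet ('valid' mode)."""
    w = morlet_wavelet(freq, fs)
    n = len(w)
    powers = []
    for start in range(len(signal) - n + 1):
        acc = sum(signal[start + i] * w[n - 1 - i] for i in range(n))  # time-reversed kernel = convolution
        powers.append(abs(acc) ** 2)
    return sum(powers) / len(powers)

fs = 128.0
# Synthetic 2 s trace dominated by a 10 Hz (alpha-band) oscillation
signal = [math.sin(2 * math.pi * 10 * k / fs) for k in range(int(2 * fs))]
for f in (4.0, 10.0, 20.0):
    print(f, band_power(signal, f, fs))
```

Power peaks at the 10 Hz wavelet and stays small at 4 Hz and 20 Hz. Stacking such per-frequency maps across electrodes is one way to obtain the frequency-specific spatial inputs the summary describes.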
AI · Bullish · arXiv – CS AI · May 29 · 6/10
🧠Researchers introduce E3AD, an emotion-aware vision-language-action model that enhances autonomous driving systems by interpreting passenger emotional states alongside driving commands. The framework combines semantic understanding with emotion detection (Valence-Arousal-Dominance model) and dual-pathway spatial reasoning to improve both trajectory planning and human-vehicle comfort alignment.
AI · Neutral · arXiv – CS AI · May 28 · 6/10
🧠Researchers introduce Variance-Regularised Pruning (VR), a neural network pruning technique that reduces model size while maintaining robust performance across diverse users. The method balances computational efficiency with cross-participant stability in affective computing systems, achieving 80% sparsity without sacrificing reliability on the AGAIN emotion recognition dataset.
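One plausible reading of "variance-regularised" is an objective that adds the variance of per-participant losses to their mean, so a pruned model cannot gain efficiency by quietly failing some users. The sketch below pairs that objective with global magnitude pruning at 80% sparsity. The penalty weight `lam`, the function names, and the choice of magnitude pruning are assumptions for illustration only, not the paper's published algorithm.

```python
from statistics import mean, pvariance
from typing import Dict, List

def vr_objective(per_participant_loss: Dict[str, float], lam: float = 1.0) -> float:
    """Hypothetical objective: mean loss across participants plus lam * population variance."""
    losses = list(per_participant_loss.values())
    return mean(losses) + lam * pvariance(losses)

def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.15, -0.08, 0.04, 0.6]
sparse = magnitude_prune(weights, sparsity=0.8)
print(sparse)  # only the 2 largest-magnitude weights survive

# Two candidate pruned models with the same mean loss: the objective favours the uniform one.
uniform = {"p1": 0.40, "p2": 0.42, "p3": 0.38}
uneven = {"p1": 0.10, "p2": 0.50, "p3": 0.60}
print(vr_objective(uniform), vr_objective(uneven))
```

Both candidates average a loss of 0.40, so a mean-only criterion cannot separate them; the variance term is what penalises the model that serves one participant far better than the others.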
AI · Neutral · arXiv – CS AI · May 28 · 6/10
🧠Researchers introduce SMILE-Next, a comprehensive dataset and specialized large language model framework for understanding laughter in real-world contexts. The work combines laughter detection, classification, and reasoning tasks with novel training techniques including laughter-specific self-instruction and a mixture-of-experts architecture to improve multimodal language model performance on this underexplored domain.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduced HumanVBench, a comprehensive benchmark for evaluating how well multimodal AI models understand human-centric video content across 16 tasks including emotion recognition and speech-visual alignment. The study evaluated 30 leading MLLMs and found significant performance gaps, even among top proprietary models, while introducing automated synthesis pipelines to enable scalable benchmark creation with minimal human effort.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce A-MBER, a benchmark dataset designed to evaluate AI assistants' ability to recognize emotions from long-term interaction history rather than immediate context. The benchmark tests whether models can retrieve relevant past interactions, infer current emotional states, and provide grounded explanations. The results show that memory's value lies in selective, context-aware interpretation rather than sheer historical volume.
AI · Neutral · arXiv – CS AI · Mar 12 · 4/10
🧠Researchers propose AMB-DSGDN, a new AI system for multimodal emotion recognition that uses adaptive modality balancing and differential graph attention mechanisms. The system addresses limitations in existing approaches by filtering noise and preventing dominant modalities from overwhelming the fusion process in text, speech, and visual data.
AI · Neutral · arXiv – CS AI · Mar 11 · 4/10
🧠Researchers introduce VoxEmo, a comprehensive benchmark for evaluating Speech Large Language Models on emotion recognition tasks across 35 emotion corpora and 15 languages. The benchmark addresses evaluation challenges in open text generation and introduces novel protocols that better align with human subjective emotion perception.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠Researchers have created a new multi-task Chinese dialogue dataset that enables prediction of user satisfaction, emotion recognition, and emotional state transitions across multiple conversation turns. The dataset addresses limitations in existing Chinese resources and aims to improve understanding of how user emotions evolve during interactions to better predict satisfaction.