y0news
#nlp-evaluation
2 articles
AI · Neutral · arXiv – CS AI · 7h ago · 6/10

A-MBER: Affective Memory Benchmark for Emotion Recognition

Researchers introduce A-MBER, a benchmark dataset designed to evaluate AI assistants' ability to recognize emotions based on long-term interaction history rather than immediate context. The benchmark tests whether models can retrieve relevant past interactions, infer current emotional states, and provide grounded explanations, revealing that memory's value lies in selective, context-aware interpretation rather than simple historical volume.

AI · Neutral · arXiv – CS AI · 7h ago · 6/10

Evaluating In-Context Translation with Synchronous Context-Free Grammar Transduction

Researchers evaluated how well large language models can perform formal grammar-based translation tasks using in-context learning, finding that LLM translation accuracy degrades significantly with grammar complexity and sentence length. The study identifies specific failure modes including vocabulary hallucination and untranslated source words, revealing fundamental limitations in LLMs' ability to apply formal grammatical rules to translation tasks.
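
The two failure modes named above can be illustrated with a small check. This is only a sketch, not the study's evaluation code: the function name, the toy vocabularies, and the assumption that source and target vocabularies are disjoint are all hypothetical. It labels each output token as valid (in the target vocabulary), untranslated (copied from the source vocabulary), or hallucinated (in neither).

```python
def classify_output_tokens(output_tokens, source_vocab, target_vocab):
    """Group output tokens by failure mode.

    Hypothetical helper; assumes source_vocab and target_vocab are disjoint.
    """
    report = {"valid": [], "untranslated": [], "hallucinated": []}
    for token in output_tokens:
        if token in target_vocab:
            # Token belongs to the target-side grammar vocabulary.
            report["valid"].append(token)
        elif token in source_vocab:
            # Source word left as-is in the output.
            report["untranslated"].append(token)
        else:
            # Token appears in neither vocabulary.
            report["hallucinated"].append(token)
    return report


# Toy vocabularies invented for illustration only.
source_vocab = {"blik", "zor", "mep"}
target_vocab = {"tana", "ruvo", "sil"}

report = classify_output_tokens(["tana", "zor", "quix"], source_vocab, target_vocab)
print(report)
```

A token-level check like this only flags vocabulary errors; it says nothing about whether the output follows the grammar's word order or structure.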