y0news
#model-evaluation
3 articles
AI · Neutral · arXiv – CS AI · 6h ago · 3

Unlocking Cognitive Capabilities and Analyzing the Perception-Logic Trade-off

Researchers introduce MERaLiON2-Omni (Alpha), a 10B-parameter multilingual AI model designed for Southeast Asia that combines perception and reasoning capabilities. The study reports an efficiency-stability paradox in which reasoning improves performance on abstract tasks but destabilizes basic sensory processing such as audio timing and visual interpretation.

AI · Neutral · arXiv – CS AI · 6h ago · 8

What Makes a Reward Model a Good Teacher? An Optimization Perspective

Research reveals that reward model accuracy alone doesn't determine effectiveness in RLHF systems. The study proves that low reward variance can create flat optimization landscapes, making even perfectly accurate reward models inefficient teachers that underperform less accurate models with higher variance.
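
The intuition can be illustrated with a toy sketch. This is not the paper's setup: it assumes a softmax policy over four candidate outputs and two hypothetical reward models (`accurate_flat` and `less_accurate_spread`, names invented here). For a softmax policy, the gradient of the expected reward with respect to logit i is p_i * (r_i - E[r]), so when the rewards barely vary under the policy, the gradient is close to zero even if the ranking is perfect.

```python
import itertools
import math


def softmax(logits):
    """Convert logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_accuracy(reward, truth):
    """Fraction of output pairs that the reward model ranks like the ground truth."""
    pairs = list(itertools.combinations(range(len(truth)), 2))
    agree = sum((reward[i] - reward[j]) * (truth[i] - truth[j]) > 0 for i, j in pairs)
    return agree / len(pairs)


def reward_variance(reward, probs):
    """Variance of the reward under the policy's output distribution."""
    mean = sum(p * r for p, r in zip(probs, reward))
    return sum(p * (r - mean) ** 2 for p, r in zip(probs, reward))


def policy_gradient_norm(reward, probs):
    """Norm of d E[r] / d logits for a softmax policy: p_i * (r_i - E[r])."""
    mean = sum(p * r for p, r in zip(probs, reward))
    return math.sqrt(sum((p * (r - mean)) ** 2 for p, r in zip(probs, reward)))


# Hypothetical ground-truth quality of four candidate outputs.
truth = [0.0, 1.0, 2.0, 3.0]
# Uniform initial policy (assumption for illustration).
probs = softmax([0.0, 0.0, 0.0, 0.0])

# Perfect ranking, but rewards are almost identical.
accurate_flat = [0.500, 0.501, 0.502, 0.503]
# Swaps the two worst outputs, but separates rewards widely.
less_accurate_spread = [1.0, 0.0, 2.0, 3.0]

for name, reward in [("accurate_flat", accurate_flat),
                     ("less_accurate_spread", less_accurate_spread)]:
    print(name,
          "accuracy:", pairwise_accuracy(reward, truth),
          "variance:", reward_variance(reward, probs),
          "grad norm:", policy_gradient_norm(reward, probs))
```

In this toy case the perfectly accurate reward model yields a gradient roughly a thousand times smaller than the less accurate one, which is the kind of flat landscape the summary describes.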

AI · Neutral · arXiv – CS AI · 6h ago · 4

Biases in the Blind Spot: Detecting What LLMs Fail to Mention

Researchers have developed an automated pipeline to detect hidden biases in Large Language Models: biases that influence a model's decisions but are not mentioned in its reasoning explanations. Across seven LLMs evaluated on hiring, loan approval, and university admission tasks, the system discovered previously unknown biases related to attributes such as Spanish fluency and writing formality.