y0news

#human-evaluation News & Analysis

3 articles tagged with #human-evaluation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10

InFerActive: Interactive Tree-Based Exploration of LLM Sampling for Safety Evaluation

InFerActive is an interactive system that improves how AI safety evaluators assess large language models by visualizing sampling results as navigable trees rather than static spreadsheets. The tool uses breadth-first sampling to achieve equivalent harmful-response coverage with up to 5x fewer samples, significantly improving evaluation efficiency according to controlled user studies.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

A Reflective Storytelling Agent for Older Adults: Integrating Argumentation Schemes and Argument Mining in LLM-Based Personalised Narratives

Researchers developed a reflective storytelling agent that combines large language models with knowledge graphs and argumentation theory to generate personalized narratives for older adults. Testing with 55 participants showed the system successfully identified personally relevant purposes in two-thirds of narratives, with argument-based grounding and hallucination detection significantly improving perceived consistency and clarity.