y0news

#automated-assessment News & Analysis

5 articles tagged with #automated-assessment. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

Beyond Surface Judgments: Human-Grounded Risk Evaluation of LLM-Generated Disinformation

A new study challenges the validity of using LLM judges as proxies for human evaluation of AI-generated disinformation, finding that eight frontier LLM judges systematically diverge from human reader responses in their scoring, ranking, and reliance on textual signals. The research demonstrates that while LLMs agree strongly with each other, this internal coherence masks fundamental misalignment with actual human perception, raising critical questions about the reliability of automated content moderation at scale.

AI · Bearish · arXiv – CS AI · Apr 10 · 6/10

The Impact of Steering Large Language Models with Persona Vectors in Educational Applications

Researchers studied how persona vectors, AI steering techniques that inject personality traits into large language models, affect educational applications such as essay generation and automated grading. The study found that persona steering significantly degrades answer quality, with substantially larger negative effects on open-ended humanities tasks than on factual science questions, and that AI scorers exhibit predictable bias patterns based on their assigned personality traits.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

PASTA: A Scalable Framework for Multi-Policy AI Compliance Evaluation

Researchers have developed PASTA, a scalable AI compliance evaluation framework that can assess multiple policies simultaneously using LLM-powered analysis. The system evaluates five major AI policies in under two minutes for approximately $3, with expert validation showing strong alignment with human judgment.