y0news

#human-alignment News & Analysis

5 articles tagged with #human-alignment. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

Beyond Surface Judgments: Human-Grounded Risk Evaluation of LLM-Generated Disinformation

A new study challenges the validity of using LLM judges as proxies for human evaluation of AI-generated disinformation, finding that eight frontier LLM judges systematically diverge from human reader responses in their scoring, ranking, and reliance on textual signals. The research demonstrates that while LLMs agree strongly with each other, this internal coherence masks fundamental misalignment with actual human perception, raising critical questions about the reliability of automated content moderation at scale.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6

Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation

Researchers introduce Dual-Iterative Preference Optimization (Dual-IPO), a method that iteratively improves both the reward model and the video generation model to produce higher-quality AI-generated videos that are better aligned with human preferences. The approach enables 2B-parameter models to outperform 5B-parameter models without requiring manual preference annotations.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10

MERIT Feedback Elicits Better Bargaining in LLM Negotiators

Researchers introduce AgoraBench, a new framework for improving Large Language Models' bargaining and negotiation capabilities through utility-based feedback mechanisms. The study reveals that current LLMs struggle with strategic depth in negotiations and proposes human-aligned metrics and training methods to enhance their performance.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 9

Rich Insights from Cheap Signals: Efficient Evaluations via Tensor Factorization

Researchers propose a tensor factorization method that combines cheap automated evaluation data with limited human labels to enable fine-grained evaluation of AI generative models. The approach addresses the data bottleneck in model evaluation by using autorater scores to pretrain representations that are then aligned to human preferences with minimal calibration data.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing

Researchers developed EditReward, a human-aligned reward model for instruction-guided image editing trained on over 200K preference pairs. The model demonstrates superior performance on established benchmarks and can effectively filter high-quality training data, addressing a key bottleneck in open-source image editing models.