y0news

#human-preferences News & Analysis

4 articles tagged with #human-preferences. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

Listener-Rewarded Thinking in VLMs for Image Preferences

Researchers introduce a listener-augmented reinforcement learning framework for training vision-language models to better align with human visual preferences. By using an independent frozen model to evaluate and validate reasoning chains, the approach achieves 67.4% accuracy on ImageReward benchmarks and demonstrates significant improvements in out-of-distribution generalization.

🏢 Hugging Face
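The listener-augmented reward described above can be sketched as a blend of task correctness with a frozen listener's agreement score. This is a hypothetical minimal form: the `alpha` weighting and the `listener_prob` input (the listener's probability that the reasoning chain supports the chosen image) are assumptions for illustration, not the paper's actual interface.

```python
def listener_reward(correct: bool, listener_prob: float, alpha: float = 0.5) -> float:
    """Blend task correctness with a frozen listener's agreement score.

    correct: whether the model's preference matches the human label.
    listener_prob: the frozen listener's probability that the reasoning
        chain actually supports the chosen answer (assumed interface).
    alpha: how much weight the listener's judgment receives (assumed).
    """
    task_reward = 1.0 if correct else 0.0
    return (1 - alpha) * task_reward + alpha * listener_prob
```

A correct answer backed by unconvincing reasoning earns less than one the listener also endorses, which is the intuition behind validating reasoning chains with an independent model.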
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Relative Density Ratio Optimization for Stable and Statistically Consistent Model Alignment

Researchers propose a new method for aligning AI language models with human preferences that addresses stability issues in existing approaches. The technique uses relative density ratio optimization to achieve both statistical consistency and training stability, showing effectiveness with Qwen 2.5 and Llama 3 models.

🧠 Llama
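The stability benefit of a relative density ratio can be seen in a minimal numeric sketch. This uses the standard relative density ratio form p / (αp + (1−α)q), which stays bounded by 1/α even where the plain ratio p/q diverges; whether the paper uses exactly this parameterization is an assumption.

```python
def relative_density_ratio(p: float, q: float, alpha: float = 0.1) -> float:
    """Relative density ratio p / (alpha*p + (1-alpha)*q).

    Unlike the plain ratio p/q, this is bounded above by 1/alpha,
    which keeps optimization objectives built on it numerically stable
    even when q approaches zero.
    """
    return p / (alpha * p + (1 - alpha) * q)
```

When q = 0 the plain ratio p/q is infinite, but the relative ratio caps at 1/α, illustrating why the relative form can trade a small bias for much better training stability.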
AI · Neutral · arXiv – CS AI · 5h ago · 6/10

StoryAlign: Evaluating and Training Reward Models for Story Generation

Researchers introduce StoryRMB, the first benchmark for evaluating reward models on story generation preferences, and develop StoryReward, a specialized reward model achieving 66.3% accuracy where existing models struggle. The work addresses the challenge of modeling subjective human preferences in narrative generation, enabling better alignment between LLM-generated stories and human expectations.
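The accuracy figure above is typically computed as pairwise ranking accuracy: the fraction of (chosen, rejected) story pairs where the reward model scores the human-preferred story higher. A minimal sketch, assuming the benchmark supplies scored preference pairs (the exact evaluation protocol of StoryRMB is not described here):

```python
def pairwise_accuracy(score_pairs):
    """Fraction of (chosen, rejected) score pairs ranked correctly.

    score_pairs: iterable of (chosen_score, rejected_score) tuples,
    where `chosen` is the human-preferred story in each pair.
    """
    pairs = list(score_pairs)
    hits = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return hits / len(pairs)
```

Under this metric, a model that scores stories at random lands near 50%, which is why a 66.3% result on subjective narrative preferences can count as meaningful progress.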

AI · Neutral · OpenAI News · Sep 19 · 6/10

Fine-tuning GPT-2 from human preferences

OpenAI fine-tuned a 774M-parameter GPT-2 model using human feedback on tasks such as summarization and text continuation. The research surfaced cases where human labelers' preferences diverged from developers' intentions: summarization models learned to copy text wholesale rather than generate original summaries.
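Learning from human preferences of this kind typically trains a reward model with a Bradley-Terry style loss on labeled pairs: minimize the negative log-probability that the preferred completion outscores the rejected one. A minimal sketch of that loss (the scalar-score interface is an assumption for illustration):

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss on a preference pair: -log sigmoid(r_chosen - r_rejected).

    The loss shrinks as the reward model assigns a higher score to the
    completion human labelers preferred.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the two scores are equal the loss is log 2, and it falls toward zero as the preferred completion's margin grows, pushing the reward model toward the labelers' rankings — including, as the article notes, rankings that may not match what developers intended.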