y0news

#preference-learning News & Analysis

17 articles tagged with #preference-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠

ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning

Researchers introduce ActiveUltraFeedback, an active learning pipeline that reduces the cost of training Large Language Models by using uncertainty estimates to identify the most informative responses for annotation. The system achieves comparable performance using only one-sixth of the annotated data compared to static baselines, potentially making LLM training more accessible for low-resource domains.
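A minimal sketch of the uncertainty-driven selection idea behind this kind of pipeline (not the paper's actual method; the ensemble-disagreement proxy, pair names, and scores below are illustrative assumptions):

```python
import statistics

def select_for_annotation(candidates, budget):
    """Rank candidate response pairs by ensemble disagreement and keep
    only the most uncertain ones for human annotation.

    `candidates` maps a pair id to a list of per-model preference scores
    (higher = first response preferred)."""
    # Uncertainty proxy: standard deviation of the ensemble's scores.
    ranked = sorted(
        candidates.items(),
        key=lambda kv: statistics.pstdev(kv[1]),
        reverse=True,
    )
    return [pair_id for pair_id, _ in ranked[:budget]]

# Pairs the ensemble agrees on are cheap to skip; disagreement flags
# the informative ones worth spending annotation budget on.
pool = {
    "pair_a": [0.90, 0.88, 0.91],  # confident -> low annotation value
    "pair_b": [0.20, 0.85, 0.55],  # contested -> worth a human label
    "pair_c": [0.10, 0.12, 0.09],
}
print(select_for_annotation(pool, budget=1))  # -> ['pair_b']
```

Annotating only the contested fraction of a pool is how such a pipeline can reach comparable quality with a sixth of the labels.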

🏢 Hugging Face
AI · Neutral · arXiv – CS AI · Mar 9 · 7/10
🧠

Aligning Compound AI Systems via System-level DPO

Researchers introduce SysDPO, a framework that extends Direct Preference Optimization to align compound AI systems comprising multiple interacting components like LLMs, foundation models, and external tools. The approach addresses challenges in optimizing complex AI systems by modeling them as Directed Acyclic Graphs and enabling system-level alignment through two variants: SysDPO-Direct and SysDPO-Sampling.
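For context, the standard DPO objective that SysDPO generalizes to multi-component systems can be sketched as follows (a toy single-example version with made-up log-probabilities, not the SysDPO formulation itself):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair:
    push the policy's log-prob margin on the chosen response beyond
    the frozen reference model's margin."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    # -log sigmoid(beta * margin gap); shrinks as the policy separates
    # chosen from rejected more strongly than the reference does.
    gap = beta * (policy_margin - ref_margin)
    return -math.log(1.0 / (1.0 + math.exp(-gap)))

# Toy numbers standing in for summed token log-likelihoods.
loss = dpo_loss(-12.0, -15.0, -13.0, -13.5)
print(f"{loss:.4f}")
```

SysDPO's system-level twist is to apply this kind of preference objective over a DAG of interacting components rather than a single model's outputs.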

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠

Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals

Researchers introduce Density-Guided Response Optimization (DGRO), a new AI alignment method that learns community preferences from implicit acceptance signals rather than explicit feedback. The technique uses geometric patterns in how communities naturally engage with content to train language models without requiring costly annotation or preference labeling.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠

Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

Researchers introduce Skywork-Reward-V2, a suite of AI reward models trained on SynPref-40M, a massive 40-million preference pair dataset created through human-AI collaboration. The models achieve state-of-the-art performance across seven major benchmarks by combining human annotation quality with AI scalability for better preference learning.

AI · Bullish · OpenAI News · Jun 13 · 7/10
🧠

Learning from human preferences

OpenAI and DeepMind have collaborated to develop an algorithm that can learn human preferences by comparing two proposed behaviors, eliminating the need for humans to manually write goal functions. This approach aims to reduce dangerous AI behavior that can result from oversimplified or incorrect goal specifications.
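The core mechanism, learning a reward from pairwise comparisons, is commonly modeled with a Bradley-Terry formulation. A simplified sketch (toy gradient ascent over discrete behaviors; the real work learns a neural reward over trajectories):

```python
import math

def fit_rewards(comparisons, n_items, lr=0.5, epochs=200):
    """Fit a scalar reward per behavior from pairwise human choices
    under a Bradley-Terry model: P(i beats j) = sigmoid(r_i - r_j)."""
    r = [0.0] * n_items
    for _ in range(epochs):
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(-(r[winner] - r[loser])))
            # Gradient ascent on the log-likelihood of the observed choice.
            r[winner] += lr * (1.0 - p_win)
            r[loser] -= lr * (1.0 - p_win)
    return r

# Humans preferred behavior 0 over 1, and 1 over 2; the fitted
# rewards recover that ordering without any hand-written goal.
rewards = fit_rewards([(0, 1), (1, 2)], n_items=3)
assert rewards[0] > rewards[1] > rewards[2]
```

Because the human only ever answers "which of these two is better?", the approach sidesteps the mis-specified goal functions the summary warns about.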

AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠

Principles Do Not Apply Themselves: A Hermeneutic Perspective on AI Alignment

A new arXiv paper argues that AI alignment cannot rely solely on stated principles because their real-world application requires contextual judgment and interpretation. The research shows that a significant portion of preference-labeling data involves principle conflicts or indifference, meaning principles alone cannot determine decisions, and these interpretive choices often emerge only during model deployment rather than in training data.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠

Relational Preference Encoding in Looped Transformer Internal States

Researchers demonstrate that looped transformers like Ouro-2.6B encode human preferences relationally rather than independently, with pairwise evaluators achieving 95.2% accuracy compared to 21.75% for independent classification. The study reveals that preference encoding is fundamentally relational, functioning as an internal consistency probe rather than a direct predictor of human annotations.

🏢 Anthropic
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠

Explaining Neural Networks in Preference Learning: a Post-hoc Inductive Logic Programming Approach

Researchers propose using Inductive Learning of Answer Set Programs (ILASP) to create interpretable approximations of neural networks trained on preference learning tasks. The approach combines dimensionality reduction through Principal Component Analysis with logic-based explanations, addressing the challenge of explaining black-box AI models while maintaining computational efficiency.

AI · Neutral · arXiv – CS AI · Apr 6 · 6/10
🧠

Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs

Research from arXiv shows that Active Preference Learning (APL) provides minimal improvements over random sampling in training modern LLMs through Direct Preference Optimization. The study found that random sampling performs nearly as well as sophisticated active selection methods while being computationally cheaper and avoiding capability degradation.

AIBullisharXiv – CS AI Β· Mar 266/10
🧠

Safe Reinforcement Learning with Preference-based Constraint Inference

Researchers propose Preference-based Constrained Reinforcement Learning (PbCRL), a new approach for safe AI decision-making that learns safety constraints from human preferences rather than requiring extensive expert demonstrations. The method addresses limitations in existing Bradley-Terry models by introducing a dead zone mechanism and Signal-to-Noise Ratio loss to better capture asymmetric safety costs and improve constraint alignment.

AI · Bullish · arXiv – CS AI · Mar 6 · 6/10
🧠

What Is Missing: Interpretable Ratings for Large Language Model Outputs

Researchers introduce the What Is Missing (WIM) rating system for Large Language Models that uses natural-language feedback instead of numerical ratings to improve preference learning. WIM computes ratings by analyzing cosine similarity between model outputs and judge feedback embeddings, producing more interpretable and effective training signals with fewer ties than traditional rating methods.
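The similarity-based scoring step can be sketched in a few lines (toy hand-made vectors; a real system would embed outputs and judge critiques with a text encoder, and the function name is an assumption):

```python
import math

def wim_style_rating(output_emb, feedback_emb):
    """Turn a judge's natural-language critique into a scalar training
    signal via cosine similarity between the output embedding and the
    feedback embedding."""
    dot = sum(a * b for a, b in zip(output_emb, feedback_emb))
    norm = (math.sqrt(sum(a * a for a in output_emb))
            * math.sqrt(sum(b * b for b in feedback_emb)))
    return dot / norm

# A continuous score in [-1, 1] yields fewer ties than coarse
# 1-to-5 numerical ratings.
aligned = wim_style_rating([0.9, 0.1, 0.2], [0.8, 0.2, 0.1])
divergent = wim_style_rating([0.9, 0.1, 0.2], [0.1, 0.9, 0.3])
assert aligned > divergent
```

The continuous signal is what makes the ratings both more interpretable (the feedback text explains the score) and less tie-prone than integer scales.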

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10
🧠

Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework

Researchers have developed a new preference learning framework that addresses bias in AI alignment by ensuring policies reflect true population distributions rather than just majority opinions. The approach uses social choice theory principles and has been validated on both recommendation tasks and large language model alignment.

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10
🧠

AWARE-US: Preference-Aware Infeasibility Resolution in Tool-Calling Agents

Researchers developed AWARE-US, a system to improve AI agents' ability to handle failed database queries by intelligently relaxing the least important user constraints rather than simply returning 'no results'. The system uses three LLM-based methods to infer constraint importance from dialogue, achieving up to 56% accuracy in correct constraint relaxation.

AI · Neutral · arXiv – CS AI · Feb 27 · 5/10
🧠

Same Words, Different Judgments: Modality Effects on Preference Alignment

Researchers conducted a cross-modal study comparing human preference annotations between text and audio formats for AI alignment. The study found that while audio preferences are as reliable as text, different modalities lead to different judgment patterns, with synthetic ratings showing promise as replacements for human annotations.

AI · Neutral · arXiv – CS AI · Mar 27 · 4/10
🧠

Gaze patterns predict preference and confidence in pairwise AI image evaluation

Researchers used eye-tracking to analyze how humans make preference judgments when evaluating AI-generated images, finding that gaze patterns can predict both user choices and confidence levels. The study revealed that participants' eyes shift toward chosen images about one second before making decisions, and gaze features achieved 68% accuracy in predicting binary choices.

AI · Bullish · arXiv – CS AI · Mar 11 · 5/10
🧠

Improving through Interaction: Searching Behavioral Representation Spaces with CMA-ES-IG

Researchers developed CMA-ES-IG, a new algorithm that helps robots learn user preferences more effectively by incorporating user experience considerations. The algorithm suggests perceptually distinct and informative robot behaviors for users to rank, showing improved scalability, computational efficiency, and user satisfaction compared to existing methods.