#human-values News & Analysis

3 articles tagged with #human-values. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

EigenBench: A Comparative Behavioral Measure of Value Alignment

Researchers have developed EigenBench, a new black-box method for measuring how well AI language models align with human values. The system uses an ensemble of models to judge each other's outputs against a given constitution, producing alignment scores that closely match human evaluator judgments.

AI · Neutral · arXiv – CS AI · 15h ago · 6/10

PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization

Researchers introduce PICACO, an in-context alignment method that optimizes meta-instructions so that large language models can better understand and balance multiple, often conflicting human values without fine-tuning. The approach uses total correlation optimization to improve alignment across up to 8 distinct values while reducing noise. It targets a key limitation: LLMs struggle to reconcile competing preferences within a single prompt.

AI · Neutral · OpenAI News · Aug 27 · 6/10

Collective alignment: public input on our Model Spec

OpenAI surveyed over 1,000 people worldwide to gather public input on AI behavior standards and compared the responses to its Model Spec guidelines. The initiative is part of OpenAI's effort toward collective alignment, which aims to incorporate diverse human values and perspectives into AI system defaults.