y0news

#value-alignment News & Analysis

6 articles tagged with #value-alignment. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 6 · 6/10

VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment

Researchers propose VISA (Value Injection via Shielded Adaptation), a new framework for aligning Large Language Models with human values while avoiding the 'alignment tax' that causes knowledge drift and hallucinations. The system uses a closed-loop architecture with value detection, translation, and rewriting components, demonstrating superior performance over standard fine-tuning methods and GPT-4o in maintaining factual consistency.
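The closed loop described above can be sketched as follows. This is an illustration only: the component names (`detect_value_conflict`, `translate_value`, `rewrite`), the target value, and the toy string-based detector are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a VISA-style closed loop: detect a value conflict,
# translate the abstract value into a concrete rewriting instruction,
# rewrite, then re-check before emitting. All components are stand-ins.

TARGET_VALUE = "honesty"  # assumed example value

def detect_value_conflict(text: str) -> bool:
    # Placeholder detector: flags hedging-free absolute claims.
    return "definitely" in text.lower()

def translate_value(value: str) -> str:
    # Map an abstract value to a concrete rewriting instruction.
    return f"Qualify unsupported absolute claims to respect {value}."

def rewrite(text: str, instruction: str) -> str:
    # Stand-in for an LLM rewrite call; here a trivial string edit.
    return text.replace("definitely", "likely")

def visa_loop(text: str, max_rounds: int = 3) -> str:
    # Closed loop: keep rewriting until the detector no longer fires.
    for _ in range(max_rounds):
        if not detect_value_conflict(text):
            break
        text = rewrite(text, translate_value(TARGET_VALUE))
    return text

print(visa_loop("This stock will definitely double."))
# -> "This stock will likely double."
```

The loop structure, rather than any single component, is the point: re-checking after each rewrite is what lets a system of this shape avoid over-editing text that already satisfies the value.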

🧠 GPT-4
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

EigenBench: A Comparative Behavioral Measure of Value Alignment

Researchers have developed EigenBench, a new black-box method for measuring how well AI language models align with human values. The system uses an ensemble of models to judge each other's outputs against a given constitution, producing alignment scores that closely match human evaluator judgments.
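One way to read the "Eigen" in the name, purely as an illustration: if every model rates every other model's outputs, the ratings form a matrix, and each judge can be weighted by its own estimated alignment via the principal eigenvector. The toy judgment matrix and this aggregation rule are assumptions, not the paper's method.

```python
# Illustrative sketch (not the paper's code): J[i][j] is judge i's rating
# of model j against a constitution. The principal eigenvector of J,
# found by power iteration, scores each model while weighting each judge
# by its own score.

def principal_eigenvector(J, iters=100):
    n = len(J)
    v = [1.0 / n] * n  # start from a uniform score vector
    for _ in range(iters):
        # w = J^T v: each model's score is a judge-weighted sum of ratings.
        w = [sum(J[i][j] * v[i] for i in range(n)) for j in range(n)]
        s = sum(w)
        v = [x / s for x in w]  # normalize so scores sum to 1
    return v

# Toy judgment matrix for three models (rows = judges, cols = judged).
J = [[0.9, 0.6, 0.3],
     [0.8, 0.7, 0.2],
     [0.7, 0.5, 0.4]]

scores = principal_eigenvector(J)
print(scores)  # model 0 ranks highest under these toy judgments
```

Because all entries are positive, power iteration converges to the Perron eigenvector, so the ranking is well defined regardless of the starting vector.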

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook

Researchers introduce DOVE, a distributional evaluation framework that measures how well large language models align with cultural values through open-ended text generation rather than multiple-choice tests. The framework uses rate-distortion optimization to create a value codebook and unbalanced optimal transport to assess alignment, demonstrating 31.56% correlation with downstream tasks across 12 LLMs while requiring only 500 samples per culture.
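The comparison step can be illustrated with a much simpler stand-in: plain 1-D Wasserstein distance between a model's value distribution and a reference cultural distribution over an ordered codebook. The codebook entries and distributions below are invented, and the paper's rate-distortion codebook construction and unbalanced optimal transport are considerably more involved.

```python
# Hedged illustration of distributional alignment scoring. Each open-ended
# generation is assumed to have been mapped to one codebook value; the
# resulting frequency distributions are then compared.

CODEBOOK = ["tradition", "security", "benevolence", "achievement"]  # assumed

def wasserstein_1d(p, q):
    # W1 between distributions on the same ordered support equals the
    # sum of absolute differences of the two CDFs.
    cp = cq = dist = 0.0
    for pi, qi in zip(p, q):
        cp += pi
        cq += qi
        dist += abs(cp - cq)
    return dist

model_dist = [0.1, 0.2, 0.4, 0.3]    # toy: fraction of generations per value
culture_dist = [0.3, 0.3, 0.2, 0.2]  # toy reference distribution

print(round(wasserstein_1d(model_dist, culture_dist), 2))
```

Unbalanced optimal transport generalizes this by allowing the two sides to carry unequal mass, which matters when a model simply fails to express some values at all.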

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10

Contesting Artificial Moral Agents

A research paper proposes a 5E framework (ethical, epistemological, explainable, empirical, evaluative) for contesting Artificial Moral Agents (AMAs), AI systems with inherent moral reasoning capabilities. The framework maps spheres of ethical influence at the individual, local, societal, and global levels, along with a timeline for developers to anticipate or self-contest their AMA technologies.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

Cognitive models can reveal interpretable value trade-offs in language models

Researchers developed a framework using cognitive models from psychology to analyze value trade-offs in language models, revealing how AI systems balance competing priorities like politeness and directness. The study shows LLMs' behavioral profiles shift predictably when prompted to prioritize certain goals and are influenced by reasoning budgets and training dynamics.
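A minimal sketch of the kind of cognitive model involved, assuming a speaker utility that weighs informativeness against politeness with a single mixing weight (the utterances and their scores are invented for illustration, not taken from the study):

```python
# Toy trade-off model: each candidate utterance has an informativeness
# score and a politeness score; phi controls how much the speaker cares
# about being direct versus being kind.

UTTERANCES = {
    # utterance: (informativeness, politeness), toy scores in [0, 1]
    "It was terrible":   (1.0, 0.1),
    "It wasn't amazing": (0.6, 0.6),
    "It was fine":       (0.2, 0.9),
}

def best_utterance(phi: float) -> str:
    # phi = 1.0 -> purely informative; phi = 0.0 -> purely polite.
    return max(UTTERANCES,
               key=lambda u: phi * UTTERANCES[u][0]
                             + (1 - phi) * UTTERANCES[u][1])

print(best_utterance(0.9))  # a direct speaker
print(best_utterance(0.1))  # a polite speaker
```

Fitting a parameter like `phi` to an LLM's actual choices is what yields the interpretable behavioral profile the summary describes, and shifts in the fitted value track changes in prompting, reasoning budget, or training.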