AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
Principles Do Not Apply Themselves: A Hermeneutic Perspective on AI Alignment
A new arXiv paper argues that AI alignment cannot rely solely on stated principles, because applying them in the real world requires contextual judgment and interpretation. The research shows that a significant portion of preference-labeling data involves principle conflicts or indifference, so principles alone cannot determine the decision in those cases. These interpretive choices often emerge only during model deployment rather than in training data.