AI · Bullish · arXiv – CS AI · 4d ago · 7/10
EigenBench: A Comparative Behavioral Measure of Value Alignment
Researchers have developed EigenBench, a new black-box method for measuring how well AI language models align with human values. The system uses an ensemble of models to judge each other's outputs against a given constitution, producing alignment scores that closely match human evaluator judgments.