EigenBench: A Comparative Behavioral Measure of Value Alignment
Researchers have developed EigenBench, a new black-box method for measuring how well AI language models align with human values. The system uses an ensemble of models to judge each other's outputs against a given constitution, producing alignment scores that closely match human evaluator judgments.