AI · Bearish · arXiv – CS AI · 3h ago · 7/10
Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation
A position paper argues that the AI/ML community should retire the term "positive backdoor" and instead treat trigger-activated hidden behaviors as "Secret Alignment," subject to strict and systematic evaluation. The authors report that existing implementations are brittle on core security properties, particularly confidentiality, integrity, and availability, which indicates that their protective claims have not been tested against a standardized evaluation framework.