Disposition Distillation at Small Scale: A Three-Arc Negative Result
Researchers attempted to distill behavioral dispositions into small language models but found that their initial positive results were artifacts of measurement error. After more rigorous validation across five small models, they found no reliable way to instill self-verification and uncertainty acknowledgment without either degrading performance or producing superficial stylistic mimicry.