AI · Neutral · arXiv – CS AI · 3h ago · 6/10
Multi-Adapter Representation Interventions via Energy Calibration
Researchers propose MARI, a method for aligning large language models through adaptive representation interventions that adjust the correction strength for each input rather than applying one uniform fix. The approach combines multiple adapter experts with energy-based gating to maintain general model capabilities while improving alignment on safety and truthfulness benchmarks.
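
To make the idea of per-input correction strength concrete, here is a minimal NumPy sketch of a gated multi-adapter intervention on a single hidden vector. It is not MARI's actual formulation, which the summary does not spell out. Everything in it is an assumption made for illustration: the class name `GatedAdapterIntervention`, the low-rank adapter shape, the linear energy function, the convention that lower energy means more correction, and the `temperature` and `threshold` parameters. The weights are random rather than trained.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GatedAdapterIntervention:
    """Hypothetical sketch: several low-rank adapters mixed by an energy gate."""

    def __init__(self, dim, num_adapters=3, rank=2,
                 temperature=1.0, threshold=0.0, seed=0):
        rng = np.random.default_rng(seed)
        # One low-rank adapter per expert: delta_k = up_k @ (down_k @ h).
        self.down = rng.normal(scale=0.1, size=(num_adapters, rank, dim))
        self.up = rng.normal(scale=0.1, size=(num_adapters, dim, rank))
        # Assumed energy function: one linear score per adapter.
        self.energy_w = rng.normal(scale=0.1, size=(num_adapters, dim))
        self.temperature = temperature
        self.threshold = threshold

    def energies(self, h):
        # Shape (num_adapters,). Lower energy = adapter is more relevant (assumed).
        return self.energy_w @ h

    def __call__(self, h):
        e = self.energies(h)
        # Mixing weights over adapters: lower energy gets more weight.
        weights = softmax(-e / self.temperature)
        # Scalar strength in (0, 1): near 0 when every energy is far above threshold.
        strength = sigmoid((self.threshold - e.min()) / self.temperature)
        # Per-adapter corrections, shape (num_adapters, dim).
        low = np.einsum("krd,d->kr", self.down, h)
        deltas = np.einsum("kdr,kr->kd", self.up, low)
        correction = weights @ deltas
        return h + strength * correction, strength, weights


layer = GatedAdapterIntervention(dim=8)
hidden = np.ones(8)
steered, strength, weights = layer(hidden)
print("strength:", strength)
print("adapter weights:", weights)
```

In this sketch the gate does two jobs: the softmax decides which adapter dominates the correction, and the scalar strength decides how much of that correction is applied at all. Inputs whose energies sit well above the threshold pass through almost unchanged, which is one way a per-input scheme could leave general capabilities largely intact.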