AINeutralarXiv – CS AI · 6h ago6/10
🧠
Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs
Researchers introduce Latent Reward Steering (LRS), an inference-time framework that improves reasoning in large language models by optimizing sparse-autoencoder latent states through reward gradients. The method adaptively corrects fragile reasoning states without relying on predefined cognitive behaviors, demonstrating consistent performance improvements across multiple benchmarks.