#steering News & Analysis

6 articles tagged with #steering. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Tool Calling is Linearly Readable and Steerable in Language Models

Researchers discovered that language models encode tool-selection decisions in interpretable linear patterns within their internal activations, enabling both prediction of errors before execution and steering of tool choices at 77-100% accuracy. This finding has implications for making AI agents more reliable and controllable, particularly in high-stakes scenarios where wrong tool selection causes irreversible failures.

🧠 Llama

AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 2

How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

Researchers introduce SteerEval, a new benchmark for evaluating how controllable Large Language Models are across language features, sentiment, and personality domains. The study reveals that current steering methods often fail at finer-grained control levels, highlighting significant risks when deploying LLMs in socially sensitive applications.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Riemannian-Manifold Steering: Geometry-Aware Generative Autoencoders for Label-Free Steering

Researchers introduce a Riemannian-manifold framework for steering language models that eliminates the need for labeled data or predefined topologies. The method approximates output-space geometry using a learned encoder trained on concept tokens, enabling more natural intervention trajectories across diverse tasks without per-prompt labeling.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Closed-Loop Neural Activation Control in Vision-Language-Action Models

Researchers introduce CTRL-STEER, a closed-loop control framework that enables Vision-Language-Action models to dynamically adjust steering interventions at test time based on real-time feedback rather than using fixed coefficients. The method uses adaptive control signals to regulate internal model directions, demonstrating improved task success and stability on robotic control benchmarks without modifying the base model.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

On the Collapse of Generative Paths: A Criterion and Correction for Diffusion Steering

Researchers identify Marginal Path Collapse, a failure mode in diffusion model steering where intermediate densities become non-normalizable despite valid endpoints. They propose Adaptive Path Correction with Exponents (ACE), a framework using time-varying exponents to stabilize compositional sampling in drug design and image generation tasks.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics

Researchers present a unified framework for understanding how different methods control large language models (including fine-tuning, LoRA, and activation interventions), revealing a fundamental trade-off between steering strength and output quality. The analysis explains this trade-off through an activation manifold perspective and introduces SPLIT, a new steering method that improves control while better preserving model coherence.