9 articles tagged with #model-steering. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AIBullisharXiv โ CS AI ยท Apr 107/10
๐ง Researchers have developed a scalable system for interpreting and controlling large language models distributed across multiple GPUs, achieving up to 7x memory reduction and 41x throughput improvements. The method enables real-time behavioral steering of frontier LLMs like LLaMA and Qwen without fine-tuning, with results released as open-source tooling.
AIBullisharXiv โ CS AI ยท Apr 77/10
๐ง Researchers introduce a geometric framework for understanding LLM hallucinations, showing they arise from basin structures in latent space that vary by task complexity. The study demonstrates that factual tasks have clearer separation while summarization tasks show unstable, overlapping patterns, and proposes geometry-aware steering to reduce hallucinations without retraining.
AINeutralarXiv โ CS AI ยท Mar 277/10
๐ง Research reveals that sparse autoencoder (SAE) features in vision-language models often fail to compose modularly for reasoning tasks. The study finds that combining task-selective feature sets frequently causes output drift and accuracy degradation, challenging assumptions used in AI model steering methods.
AINeutralarXiv โ CS AI ยท Mar 277/10
๐ง Researchers have identified a fundamental issue in large language models where verbalized confidence scores don't align with actual accuracy due to orthogonal encoding of these signals. They discovered a 'Reasoning Contamination Effect' where simultaneous reasoning disrupts confidence calibration, and developed a two-stage adaptive steering pipeline to improve alignment.
AIBullisharXiv โ CS AI ยท Mar 37/102
๐ง Researchers introduce Sparse Shift Autoencoders (SSAEs), a new method for improving large language model interpretability by learning sparse representations of differences between embeddings rather than the embeddings themselves. This approach addresses the identifiability problem in current sparse autoencoder techniques, potentially enabling more precise control over specific AI behaviors without unintended side effects.
AIBullisharXiv โ CS AI ยท Apr 66/10
๐ง Researchers developed a method to identify valence-arousal subspaces in large language models, enabling controlled emotional steering of AI outputs. The technique demonstrates cross-architecture effectiveness on multiple models and reveals that emotional control can bidirectionally influence AI behaviors like refusal and sycophancy.
๐ง Llama
AINeutralarXiv โ CS AI ยท Mar 176/10
๐ง Researchers introduce Gradient Atoms, an unsupervised method that decomposes AI model training gradients to discover interpretable behaviors without requiring predefined queries. The technique can identify model behaviors like refusal patterns and arithmetic capabilities, while also serving as effective steering vectors to control model outputs.
AIBullisharXiv โ CS AI ยท Mar 176/10
๐ง Researchers developed training-free model steering techniques to improve reasoning in large audio-language models (LALMs) through chain-of-thought prompting. The approach achieved up to 4.4% accuracy gains and demonstrated cross-modal transfer where text-derived steering vectors can effectively guide speech-based reasoning.
AIBullisharXiv โ CS AI ยท Mar 36/104
๐ง Researchers have developed EasySteer, a unified framework for controlling large language model behavior at inference time that achieves 10.8-22.3x speedup over existing frameworks. The system offers modular architecture with pre-computed steering vectors for eight application domains and transforms steering from a research technique into production-ready capability.