y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#model-steering News & Analysis

9 articles tagged with #model-steering. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

9 articles
AIBullisharXiv โ€“ CS AI ยท Apr 107/10
๐Ÿง 

Distributed Interpretability and Control for Large Language Models

Researchers have developed a scalable system for interpreting and controlling large language models distributed across multiple GPUs, achieving up to 7x memory reduction and 41x throughput improvements. The method enables real-time behavioral steering of frontier LLMs like LLaMA and Qwen without fine-tuning, with results released as open-source tooling.

AIBullisharXiv โ€“ CS AI ยท Apr 77/10
๐Ÿง 

Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations

Researchers introduce a geometric framework for understanding LLM hallucinations, showing they arise from basin structures in latent space that vary by task complexity. The study demonstrates that factual tasks have clearer separation while summarization tasks show unstable, overlapping patterns, and proposes geometry-aware steering to reduce hallucinations without retraining.

AINeutralarXiv โ€“ CS AI ยท Mar 277/10
๐Ÿง 

Sparse Visual Thought Circuits in Vision-Language Models

Research reveals that sparse autoencoder (SAE) features in vision-language models often fail to compose modularly for reasoning tasks. The study finds that combining task-selective feature sets frequently causes output drift and accuracy degradation, challenging assumptions used in AI model steering methods.

AINeutralarXiv โ€“ CS AI ยท Mar 277/10
๐Ÿง 

Closing the Confidence-Faithfulness Gap in Large Language Models

Researchers have identified a fundamental issue in large language models where verbalized confidence scores don't align with actual accuracy due to orthogonal encoding of these signals. They discovered a 'Reasoning Contamination Effect' where simultaneous reasoning disrupts confidence calibration, and developed a two-stage adaptive steering pipeline to improve alignment.

AIBullisharXiv โ€“ CS AI ยท Mar 37/102
๐Ÿง 

Sparse Shift Autoencoders for Identifying Concepts from Large Language Model Activations

Researchers introduce Sparse Shift Autoencoders (SSAEs), a new method for improving large language model interpretability by learning sparse representations of differences between embeddings rather than the embeddings themselves. This approach addresses the identifiability problem in current sparse autoencoder techniques, potentially enabling more precise control over specific AI behaviors without unintended side effects.

AIBullisharXiv โ€“ CS AI ยท Apr 66/10
๐Ÿง 

Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control

Researchers developed a method to identify valence-arousal subspaces in large language models, enabling controlled emotional steering of AI outputs. The technique demonstrates cross-architecture effectiveness on multiple models and reveals that emotional control can bidirectionally influence AI behaviors like refusal and sycophancy.

๐Ÿง  Llama
AINeutralarXiv โ€“ CS AI ยท Mar 176/10
๐Ÿง 

Gradient Atoms: Unsupervised Discovery, Attribution and Steering of Model Behaviors via Sparse Decomposition of Training Gradients

Researchers introduce Gradient Atoms, an unsupervised method that decomposes AI model training gradients to discover interpretable behaviors without requiring predefined queries. The technique can identify model behaviors like refusal patterns and arithmetic capabilities, while also serving as effective steering vectors to control model outputs.

AIBullisharXiv โ€“ CS AI ยท Mar 36/104
๐Ÿง 

EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering

Researchers have developed EasySteer, a unified framework for controlling large language model behavior at inference time that achieves 10.8-22.3x speedup over existing frameworks. The system offers modular architecture with pre-computed steering vectors for eight application domains and transforms steering from a research technique into production-ready capability.