🤖 AI Summary
Researchers introduce STAR, an autoregressive pretraining method for Vision Mamba that concatenates multiple images into a single sequence, inserting an identical separator token before each image. This quadruples the input sequence length without altering the images' dimensions. The pretrained STAR-B model reaches 83.5% accuracy on ImageNet-1k, suggesting that better use of long-range dependencies improves performance on computer vision tasks.
Key Takeaways
- Vision Mamba's causal mechanism makes it well-suited for autoregressive pretraining, but current methods are limited to short sequences.
- STAR inserts identical separators before each image to demarcate the images in a concatenated sequence, extending sequence length by 4x (see the sketch after this list).
- The method preserves the original dataset's image dimensions while significantly increasing input sequence length.
- STAR-B achieves 83.5% accuracy on ImageNet-1k, competitive among Vision Mamba models.
- The approach shows potential for improving vision models through better long-range dependency modeling.
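To make the separator mechanism concrete, here is a minimal PyTorch sketch of how such a sequence could be assembled: an identical learnable separator token is placed before each of four images, quadrupling the sequence length while each image keeps its original resolution. The class name, the Conv2d patch embedding, and the `sep_token` parameter are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class StarSequenceBuilder(nn.Module):
    """Sketch of STAR-style input construction: [SEP] img1 [SEP] img2 ..."""

    def __init__(self, patch_size=16, embed_dim=768):
        super().__init__()
        # Standard patch embedding: non-overlapping patches -> tokens.
        self.patch_embed = nn.Conv2d(3, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        # One learnable separator token, reused (identical) before every image.
        self.sep_token = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, images):
        # images: (B, N, 3, H, W) -- N images at their original resolution,
        # to be joined into one long token sequence (N = 4 in the paper's setup).
        B, N, _, _, _ = images.shape
        sep = self.sep_token.expand(B, 1, -1)
        pieces = []
        for i in range(N):
            patches = self.patch_embed(images[:, i])      # (B, D, H/p, W/p)
            patches = patches.flatten(2).transpose(1, 2)  # (B, L, D)
            pieces.append(sep)       # identical separator demarcates each image
            pieces.append(patches)
        # With N = 4, the result is ~4x the single-image sequence length,
        # while no image is resized.
        return torch.cat(pieces, dim=1)
```

A causal Mamba backbone would then be pretrained autoregressively over this concatenated sequence, so next-token prediction must model dependencies spanning image boundaries.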
#vision-mamba #autoregressive #pretraining #computer-vision #sequence-modeling #imagenet #deep-learning #state-space-models
Read Original → via arXiv – CS AI