AI · Bullish · arXiv – CS AI · 6h ago · 7/10
ViTok-v2: Scaling Native Resolution Auto-Encoders to 5 Billion Parameters
Researchers introduce ViTok-v2, a 5-billion-parameter Vision Transformer autoencoder that supports native-resolution inputs and scales stably without adversarial losses. The work advances image tokenization for generative AI by improving reconstruction quality across multiple resolutions while maintaining generation capabilities.