y0news

#vision-transformers News & Analysis

13 articles tagged with #vision-transformers. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Zero-Shot Quantization via Weight-Space Arithmetic

Researchers have developed a zero-shot quantization method that transfers robustness between AI models through weight-space arithmetic, improving post-training quantization performance by up to 60% without requiring additional training. This breakthrough enables low-cost deployment of extremely low-bit models by extracting 'quantization vectors' from donor models to patch receiver models.
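The mechanics reduce to simple tensor arithmetic. Below is a minimal PyTorch sketch under one plausible reading of the summary, where the quantization vector is the weight-space offset between a quantization-robust donor checkpoint and its full-precision counterpart; the `quantize` helper, the 4-bit setting, and the patching rule are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Uniform symmetric round-to-nearest quantization (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def quantization_vector(donor_fp: torch.Tensor, donor_robust: torch.Tensor) -> torch.Tensor:
    # Hypothetical: the weight-space offset that made the donor robust.
    return donor_robust - donor_fp

def patch_receiver(receiver_fp: torch.Tensor, qvec: torch.Tensor) -> torch.Tensor:
    # Transfer the donor's robustness offset, then quantize as usual;
    # no gradient steps are involved, hence "zero-shot".
    return quantize(receiver_fp + qvec, bits=4)
```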

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10

SiNGER: A Clearer Voice Distills Vision Transformers Further

Researchers introduce SiNGER, a new knowledge distillation framework for Vision Transformers that suppresses harmful high-norm artifacts while preserving informative signals. The technique uses nullspace-guided perturbation and LoRA-based adapters to achieve state-of-the-art performance in downstream tasks.
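To make "nullspace-guided" concrete: one way to edit features without touching their informative content is to confine the perturbation to the orthogonal complement of the top principal directions. The sketch below is an assumption-laden illustration of that idea, not SiNGER's actual objective.

```python
import torch

def nullspace_perturb(feats: torch.Tensor, delta: torch.Tensor, k: int = 32) -> torch.Tensor:
    """Apply `delta` only in the nullspace of the top-k feature directions.

    feats, delta: (num_tokens, dim). The top-k right singular vectors are
    treated as the informative subspace and left untouched, so only the
    residual (artifact-prone) directions are edited.
    """
    _, _, Vh = torch.linalg.svd(feats, full_matrices=False)
    V_top = Vh[:k].T                                       # (dim, k)
    P_null = torch.eye(feats.shape[1]) - V_top @ V_top.T   # complement projector
    return feats + delta @ P_null
```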

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10

ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

Researchers developed ViT-Linearizer, a distillation framework that transfers Vision Transformer knowledge into linear-time models, addressing quadratic complexity issues for high-resolution inputs. The method achieves 84.3% ImageNet accuracy while providing significant speedups, bridging the gap between efficient RNN-based architectures and transformer performance.
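The training recipe is presumably a standard distillation loop: a frozen quadratic-attention teacher supervises a linear-time student. A generic Hinton-style sketch in PyTorch, with the temperature as a placeholder choice rather than ViT-Linearizer's reported setting:

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, images, tau: float = 2.0) -> torch.Tensor:
    """One logit-matching distillation step from a frozen ViT teacher to a
    linear-time student; both are any modules mapping images to logits."""
    with torch.no_grad():
        t_logits = teacher(images)
    s_logits = student(images)
    # Soft-label KL divergence at temperature tau (standard distillation).
    return F.kl_div(F.log_softmax(s_logits / tau, dim=-1),
                    F.softmax(t_logits / tau, dim=-1),
                    reduction="batchmean") * tau ** 2
```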

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

Automated Attention Pattern Discovery at Scale in Large Language Models

Researchers developed AP-MAE, a vision transformer model that analyzes attention patterns in large language models at scale to improve interpretability. The system can predict code generation accuracy with 55-70% precision and enable targeted interventions that increase model accuracy by 13.6%.
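The core move is to treat an attention matrix as an image and learn to score it. AP-MAE itself is described as a masked-autoencoder vision transformer; the toy probe below substitutes a small conv net just to keep the sketch short, and the "correctness logit" readout is a hypothetical stand-in.

```python
import torch
import torch.nn as nn

class AttentionPatternProbe(nn.Module):
    """Score a single-head attention map, viewed as a 1-channel image,
    for a downstream property such as likely code-generation correctness."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=7, stride=4), nn.GELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1),
        )

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        # attn: (batch, seq_len, seq_len) attention weights from one head
        return self.net(attn.unsqueeze(1))   # (batch, 1) logit
```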

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers

AdapterTune introduces a method for efficiently fine-tuning Vision Transformers using zero-initialized low-rank adapters, so that training starts exactly at the pretrained function and avoids early optimization instability. The technique achieves a 14.9-point accuracy improvement over head-only transfer while using only 0.92% of the parameters needed for full fine-tuning.
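The zero-initialization trick is easy to see in code: initialize the low-rank "up" projection to zero so the adapter contributes nothing at step zero and the model begins at the exact pretrained function. A minimal PyTorch sketch (the rank and init scale are illustrative):

```python
import torch
import torch.nn as nn

class ZeroInitLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank adapter whose output is zero at
    initialization, so training starts at the exact pretrained function."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # keep backbone frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.A.T @ self.B.T    # adapter starts at 0
```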

AI · Bearish · arXiv – CS AI · Mar 17 · 6/10

On the Adversarial Transferability of Generalized "Skip Connections"

Researchers discovered that skip connections in deep neural networks make adversarial attacks more transferable across different AI models. They developed the Skip Gradient Method (SGM) which exploits this vulnerability in ResNets, Vision Transformers, and even Large Language Models to create more effective adversarial examples.
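The SGM trick can be reproduced in a few lines: keep the forward pass of a residual block unchanged while down-weighting the gradient that flows through the residual branch, so attack gradients ride the skip connections. A sketch of that rescaling (gamma is the method's decay hyperparameter; 0.5 here is arbitrary):

```python
import torch
import torch.nn as nn

def sgm_residual(x: torch.Tensor, block: nn.Module, gamma: float = 0.5) -> torch.Tensor:
    """Residual connection whose backward pass scales the gradient through
    the residual branch by gamma while leaving the forward value unchanged."""
    out = block(x)
    # Forward value is exactly x + out; only gamma * out carries gradient.
    return x + gamma * out + (1.0 - gamma) * out.detach()
```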

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

DART: Input-Difficulty-AwaRe Adaptive Threshold for Early-Exit DNNs

Researchers introduce DART, a new framework for early-exit deep neural networks that achieves up to 3.3x speedup and 5.1x lower energy consumption while maintaining accuracy. The system uses input difficulty estimation and adaptive thresholds to optimize AI inference for resource-constrained edge devices.
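The control flow behind any early-exit network is a cascade of heads with a confidence test; DART's contribution is making the threshold input-difficulty-aware. A batch-size-1 sketch in which `difficulty_fn` and the threshold rule are hypothetical placeholders for the paper's estimator:

```python
import torch
import torch.nn.functional as F

def early_exit_predict(stages, heads, x, difficulty_fn, base_tau: float = 0.9):
    """Run stages in order and exit at the first head whose softmax
    confidence clears a difficulty-scaled threshold (batch size 1;
    difficulty_fn returns a scalar >= 1 for hard inputs)."""
    tau = base_tau * difficulty_fn(x)      # harder inputs demand more confidence
    h = x
    for stage, head in zip(stages, heads):
        h = stage(h)
        conf, pred = F.softmax(head(h), dim=-1).max(dim=-1)
        if conf.item() >= tau:
            return pred                    # early exit: skip remaining stages
    return pred                            # fell through to the final head
```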

AI · Neutral · arXiv – CS AI · Mar 12 · 6/10

Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model?

Researchers propose Contract And Conquer (CAC), a new method for provably generating adversarial examples against black-box neural networks using knowledge distillation and search-space contraction. The approach provides theoretical guarantees for finding adversarial examples within a fixed number of iterations and outperforms existing methods on ImageNet models, including vision transformers.
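Only the distill-then-transfer skeleton of CAC is easy to sketch; the search-space contraction and its iteration guarantees are the paper's actual contribution and are omitted here. Assuming a surrogate already distilled from black-box queries, one FGSM-style candidate looks like:

```python
import torch
import torch.nn.functional as F

def candidate_adversarial(surrogate, x, y, eps: float = 8 / 255) -> torch.Tensor:
    """Craft a candidate adversarial example on the distilled surrogate,
    to be verified against the black-box target afterwards."""
    x_adv = x.detach().clone().requires_grad_(True)
    F.cross_entropy(surrogate(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()
```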

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

What Helps – and What Hurts: Bidirectional Explanations for Vision Transformers

Researchers propose BiCAM, a new method for interpreting Vision Transformer (ViT) decisions that captures both positive and negative contributions to predictions. The approach improves explanation quality and enables adversarial example detection across multiple ViT variants without requiring model retraining.
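The "bidirectional" idea can be illustrated by not discarding sign information the way ReLU-based CAMs do. Below is a simplified gradient-times-activation readout that keeps helpful and harmful token evidence separate; BiCAM's actual propagation rule is more involved.

```python
import torch

def signed_token_relevance(feats: torch.Tensor, grads: torch.Tensor):
    """feats, grads: (num_tokens, dim) ViT-layer features and the gradient
    of the target logit w.r.t. them. Returns per-token positive and
    negative evidence maps instead of a single rectified heatmap."""
    relevance = (feats * grads).sum(dim=-1)    # signed score per token
    helps = relevance.clamp(min=0)             # evidence for the prediction
    hurts = (-relevance).clamp(min=0)          # evidence against it
    return helps, hurts
```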

AI · Bullish · arXiv – CS AI · Mar 17 · 5/10

Human-like Object Grouping in Self-supervised Vision Transformers

Researchers developed a behavioral benchmark showing that self-supervised vision transformers, particularly those trained with DINO objectives, align closely with human object perception and segmentation behavior. The study found that models with stronger object-centric representations better predict human visual judgments, with Gram matrix structure playing a key role in perceptual alignment.
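The Gram-matrix finding suggests a very simple readout: pairwise patch similarity as a prediction of whether humans group two regions into one object. A sketch of that readout (the benchmark's exact comparison to human judgments is not reproduced here):

```python
import torch
import torch.nn.functional as F

def patch_grouping_scores(patch_feats: torch.Tensor) -> torch.Tensor:
    """patch_feats: (num_patches, dim) features from a self-supervised ViT.
    Returns the cosine Gram matrix, whose (i, j) entry serves as a
    same-object score for patches i and j."""
    z = F.normalize(patch_feats, dim=-1)   # unit-norm rows
    return z @ z.T                         # (num_patches, num_patches)
```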

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10

A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys

Researchers developed a semi-supervised machine learning pipeline using vision transformers and k-Nearest Neighbor classifiers to automatically detect poor-quality exposures in astronomical imaging surveys. The method was successfully applied to the DECam Legacy Survey, identifying 780 problematic exposures that were verified through visual inspection.
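The semi-supervised pattern here is frozen-ViT embeddings plus a classical kNN. A sketch with scikit-learn, assuming the embedding extraction over exposure cutouts happens upstream; `k` and the class layout are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fit_exposure_classifier(embeddings: np.ndarray, labels: np.ndarray,
                            k: int = 10) -> KNeighborsClassifier:
    """Fit a kNN on ViT embeddings of a small labeled set of exposures
    (good vs. bad); the fitted model then scores the unlabeled bulk."""
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(embeddings, labels)
    return clf

# Usage: flag exposures whose bad-class probability is high, then verify
# the flagged subset visually, as the survey pipeline did.
# probs = clf.predict_proba(unlabeled_embeddings)
```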

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10

Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry

Researchers analyzed DINOv2 vision transformer using Sparse Autoencoders to understand how it processes visual information, discovering that the model uses specialized concept dictionaries for different tasks like classification and segmentation. They propose the Minkowski Representation Hypothesis as a new framework for understanding how vision transformers combine conceptual archetypes to form representations.
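A sparse autoencoder of the kind used to mine such concept dictionaries is compact enough to sketch: an overcomplete ReLU code trained with reconstruction plus an L1 sparsity penalty on DINOv2 patch features. Sizes and the penalty weight below are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete SAE whose ReLU code entries act as candidate concepts."""
    def __init__(self, dim: int = 768, n_concepts: int = 8192):
        super().__init__()
        self.enc = nn.Linear(dim, n_concepts)
        self.dec = nn.Linear(n_concepts, dim)

    def forward(self, x: torch.Tensor):
        code = torch.relu(self.enc(x))        # sparse concept activations
        recon = self.dec(code)
        loss = ((recon - x) ** 2).mean() + 1e-3 * code.abs().mean()
        return code, recon, loss
```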

AI · Neutral · Hugging Face Blog · Aug 18 · 1/10

Deep Dive: Vision Transformers On Hugging Face Optimum Graphcore

The article covers Vision Transformers on Hugging Face's Optimum Graphcore platform, but its body was empty or unavailable when this summary was generated, so no technical details or implications could be extracted.