AIBullisharXiv – CS AI · 3h ago6/10
🧠
Entropy-aware Masking for Masked Language Modeling
Researchers propose entropy-aware masking for masked language modeling, which selectively masks tokens based on prediction uncertainty rather than random selection. The approach achieves 5% improvement in GLUE scores and performs best when combined with knowledge distillation, offering a more efficient pretraining strategy for encoder-based language models.