AIBullisharXiv – CS AI · 7h ago6/10
🧠
EuroBERT: Scaling Multilingual Encoders for European Languages
Researchers introduce EuroBERT, a family of multilingual encoder models that apply recent advances from generative AI to improve vector representations across European and global languages. The models outperform existing alternatives on retrieval, classification, and coding tasks while supporting sequences up to 8,192 tokens, with code and checkpoints publicly released.