The Transformer Family Version 2.0
This article presents an updated and expanded version of a comprehensive guide to Transformer architecture improvements, building upon a 2020 post. The new version is twice the length and includes recent developments in Transformer models, providing detailed technical notations and covering both encoder-decoder and simplified architectures like BERT and GPT.