AI · Neutral · Google DeepMind Blog · Oct 25 · 7/10 · 6
🧠Google introduces T5Gemma, a new collection of encoder-decoder large language models (LLMs) based on the Gemma architecture. This represents an expansion of Google's Gemma model family to include encoder-decoder capabilities alongside the existing decoder-only models.
AI · Neutral · arXiv – CS AI · 23h ago · 6/10
🧠Researchers propose block-based double decoders, a transformer architecture that combines the training efficiency of decoder-only models with the inference speed advantages of encoder-decoder models. The design uses doubly-causal block-based attention masks to enable full loss supervision and static sequence packing, and is reported to cut KV-cache memory and per-token compute at inference time by two-thirds.
AI · Neutral · arXiv – CS AI · 5d ago · 6/10
🧠Researchers propose DRLHQ, a deep reinforcement learning approach with heterogeneous query attention mechanisms for solving capacitated location-routing problems (CLRPs) and their open variants. The authors describe it as the first end-to-end learning framework for CLRPs and report that it outperforms traditional and DRL-based baselines on benchmark datasets.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose Constraint-Aware Residual Modulation (CARM), a neural module that improves how AI solvers handle complex vehicle routing problems by maintaining global observation during constraint-aware decision-making. The authors report performance improvements across multiple routing problem variants and at larger problem scales.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers present a novel logical framework for understanding encoder-decoder transformers using temporal logic extended with counting and past modalities. The work provides theoretical foundations for how these architectures process information across attention mechanisms, with implications for LLM interpretability and design.
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠Researchers develop a theoretical framework connecting Information Bottleneck principles to encoder-decoder learning through rate-distortion analysis, showing optimal representations form soft clusters on probability manifolds. The work introduces Sketched Isotropic Gaussian Regularization (SIGReg) as a principled regularizer for self-supervised, semi-supervised, and supervised learning without requiring variational bounds.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8
🧠Researchers introduce Mamba-CAD, a state space model built on the Mamba architecture for generating complex 3D CAD models from parametric sequences. The model addresses limitations in handling longer, fine-grained industrial CAD sequences through an encoder-decoder framework paired with GANs, and is trained on a new dataset of 77,078 CAD models.
AI · Neutral · Hugging Face Blog · Nov 9 · 1/10 · 7
🧠The article title indicates a post on leveraging pre-trained language model checkpoints for encoder-decoder models. No article body was available, so no further details are summarized.
AI · Neutral · Hugging Face Blog · Oct 10 · 1/10 · 6
🧠The article title refers to Transformer-based encoder-decoder models, a fundamental architecture in natural language processing and machine learning. No article body was available, so specific details, applications, and implications are not summarized.