#encoder-decoder News & Analysis

14 articles tagged with #encoder-decoder. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

End-to-End Context Compression at Scale

Researchers introduce Latent Context Language Models (LCLMs), a new encoder-decoder compression approach that addresses memory bottlenecks in long-context language model inference. By compressing KV caches at ratios of 1:4 to 1:16 while maintaining model quality, LCLMs enable faster processing of extended contexts and support adaptive expansion for long-horizon agent applications.
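To make the 1:4 to 1:16 ratios concrete, here is a rough back-of-the-envelope sketch of KV-cache memory. It is not taken from the paper: the model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values, 131,072-token context) is a hypothetical example, and the sketch assumes a compression ratio simply divides the cache size by that factor.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV-cache size in bytes for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


def compressed_kv_cache_bytes(full_bytes, ratio):
    """Cache size after 1:ratio compression (assumption: size scales linearly)."""
    return full_bytes // ratio


GIB = 1024 ** 3

# Hypothetical model shape, chosen only for illustration.
full = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=131072)

# Cache size in GiB at the two ends of the reported ratio range.
savings = {ratio: compressed_kv_cache_bytes(full, ratio) / GIB for ratio in (4, 16)}

print(f"uncompressed: {full / GIB:.1f} GiB")
for ratio, size in savings.items():
    print(f"1:{ratio} compression: {size:.1f} GiB")
```

Under these assumed dimensions the uncompressed cache is 16 GiB per sequence, which drops to 4 GiB at 1:4 and 1 GiB at 1:16.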

AI · Neutral · Google DeepMind Blog · Oct 25 · 7/10 · 6

T5Gemma: A new collection of encoder-decoder Gemma models

Google introduces T5Gemma, a new collection of encoder-decoder large language models (LLMs) based on the Gemma architecture. This represents an expansion of Google's Gemma model family to include encoder-decoder capabilities alongside the existing decoder-only models.

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

Streaming T5-based Text-to-Speech Synthesis with Limited Lookahead

Researchers introduce S5-TTS, a streaming variant of T5-based text-to-speech that generates speech word-by-word with minimal latency by processing limited lookahead context. The system uses novel masking mechanisms and distillation techniques to maintain speech quality and speaker similarity while enabling real-time conversational AI applications.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Cross-Attention is Half Explanation in Speech-to-Text Models

Researchers find that cross-attention in speech-to-text models explains only about 50% of how the decoder attends to the input, contradicting the widespread assumption that attention scores reliably indicate which parts of the audio are most relevant. The study, which spans multiple model scales, shows that attention gives an incomplete view of the factors driving predictions.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

Richer Representations for Neural Algorithmic Reasoning via Auxiliary Reconstruction

Researchers propose an auxiliary reconstruction module to improve encoder representations in neural algorithmic reasoning systems. By forcing encoders to reconstruct input states and capture feature dependencies, the method enhances the performance of existing neural architectures on algorithmic reasoning benchmarks.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Introduction to Graph Neural Networks for Machine Learning Engineers

A comprehensive survey introduces graph neural networks (GNNs) through an encoder-decoder framework, demonstrating their effectiveness across various graph analytics tasks. The paper emphasizes critical challenges like oversmoothing and oversquashing in GNN training, providing experimental insights on how network performance scales with training data and graph complexity.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Block-Based Double Decoders

Researchers propose block-based double decoders, a transformer architecture that combines the training efficiency of decoder-only models with the inference speed advantages of encoder-decoder models. The design uses doubly-causal block-based attention masks to enable full loss supervision and static sequence packing, achieving a two-thirds reduction in KV-cache memory and per-token compute at inference time.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems

Researchers propose DRLHQ, a deep reinforcement learning approach with heterogeneous query attention mechanisms to solve capacitated location-routing problems (CLRPs) and their open variants. This marks the first end-to-end learning framework for CLRPs, demonstrating superior performance over traditional and DRL-based baselines on benchmark datasets.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Rethinking Constraint Awareness for Efficient State Embedding of Neural Routing Solver

Researchers propose Constraint-Aware Residual Modulation (CARM), a neural module that improves how AI solvers handle complex vehicle routing problems by maintaining global observation during constraint-aware decision-making. The module delivers significant performance improvements across multiple routing problem variants and scales.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Cross-Attention and Encoder-Decoder Transformers: A Logical Characterization

Researchers present a novel logical framework for understanding encoder-decoder transformers using temporal logic extended with counting and past modalities. The work provides theoretical foundations for how these architectures process information across attention mechanisms, with implications for LLM interpretability and design.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Why Self-Supervised Encoders Want to Be Normal

Researchers develop a theoretical framework connecting Information Bottleneck principles to encoder-decoder learning through rate-distortion analysis, showing optimal representations form soft clusters on probability manifolds. The work introduces Sketched Isotropic Gaussian Regularization (SIGReg) as a principled regularizer for self-supervised, semi-supervised, and supervised learning without requiring variational bounds.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

Mamba-CAD: State Space Model For 3D Computer-Aided Design Generative Modeling

Researchers introduce Mamba-CAD, a state space model built on the Mamba architecture for generating complex 3D CAD models from parametric sequences. The model addresses limitations in handling longer, fine-grained industrial CAD sequences through an encoder-decoder framework paired with GANs, and is trained on a new dataset of 77,078 CAD models.

AI · Neutral · Hugging Face Blog · Nov 9 · 1/10 · 7

Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models

The article title suggests content about leveraging pre-trained language model checkpoints for encoder-decoder models, but no article body was provided for analysis.

AI · Neutral · Hugging Face Blog · Oct 10 · 1/10 · 6

Transformer-based Encoder-Decoder Models

The article title references Transformer-based Encoder-Decoder Models, a fundamental AI architecture used in natural language processing and machine learning. However, no article body content was provided to analyze specific details, applications, or implications.