#transformer-architecture News & Analysis

68 articles tagged with #transformer-architecture. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 13h ago · 7/10

Mitigating Hallucinations in Large Language Models Via Decoder Layer Skipping

Researchers introduce DeLask, a novel decoding framework that reduces hallucinations in Large Language Models by dynamically skipping decoder layers prone to generating false information. The method uses gradient-based analysis to identify problematic layers and partially aggregates their hidden states, demonstrating consistent improvements across diverse LLMs without requiring model retraining.

AI · Bullish · arXiv – CS AI · 13h ago · 7/10

Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization

Researchers propose a render-free framework for 3D-aware video diffusion models that uses compressed mesh tokens instead of 2D rendered guidance to control human motion in generated videos. By processing 3D geometric information directly alongside video tokens, the approach demonstrates improved performance on motion control tasks while reducing artifacts associated with traditional 2D guidance methods.

AI · Bullish · arXiv – CS AI · 13h ago · 7/10

FastSLM: Hierarchical Temporal Abstraction for Efficient Long-Form Speech Adaptation

FastSLM introduces a Hierarchical Temporal Abstractor (HTA) that compresses long-form speech into just 1.67 tokens per second—a 97% reduction—while maintaining competitive performance on speech understanding benchmarks. This architecture solves a critical scaling bottleneck for multimodal AI models by preserving acoustic detail despite extreme compression, enabling efficient deployment of speech-capable language models.
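The two figures quoted above imply a baseline token rate that the summary does not state. A back-of-the-envelope sketch, assuming the 97% reduction is measured against a fixed-rate speech encoder (an assumption; the function names are illustrative and not from the paper):

```python
def implied_baseline_rate(compressed_rate: float, reduction: float) -> float:
    """Token rate before compression, given the rate after it and the fractional reduction."""
    return compressed_rate / (1.0 - reduction)


def tokens_for_duration(rate_per_second: float, seconds: float) -> float:
    """Total tokens produced for a clip of the given length."""
    return rate_per_second * seconds


compressed = 1.67  # tokens per second, from the summary
reduction = 0.97   # fractional reduction, from the summary

baseline = implied_baseline_rate(compressed, reduction)
hour_compressed = tokens_for_duration(compressed, 3600)
hour_baseline = tokens_for_duration(baseline, 3600)

print(f"implied baseline: {baseline:.1f} tokens/s")
print(f"one hour, compressed: {hour_compressed:.0f} tokens")
print(f"one hour, baseline: {hour_baseline:.0f} tokens")
```

Under that assumption the uncompressed rate works out to roughly 56 tokens per second, so an hour of speech drops from about 200,000 tokens to about 6,000.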

AI · Bullish · arXiv – CS AI · 13h ago · 7/10

Diffusion Image Generation with Explicit Modeling of Data Manifold Geometry

Researchers introduce MIND (Data Manifold-aware Image diffusioN moDel), a novel diffusion-based image generation framework that combines discrete patch tokenization with continuous diffusion modeling. The approach achieves significant performance improvements, reducing FID scores to 2.06 on ImageNet-256×256 with guidance using only 130M parameters, substantially outperforming larger baseline models.

AI · Bullish · arXiv – CS AI · 13h ago · 7/10

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

Researchers introduce SubFit, a post-training compression method for Large Language Models that operates at the submodule level rather than full-layer granularity, achieving superior perplexity-accuracy trade-offs. The approach selects non-contiguous Attention and FeedForward submodules with individual fitted residual bypasses, delivering 84.6% downstream accuracy retention at 25% sparsity compared to 81.6% for existing methods.

AI · Bullish · arXiv – CS AI · 13h ago · 7/10

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

LayerRoute is a lightweight adapter that enables language models to dynamically skip transformer blocks based on input type, achieving 12.91% computational efficiency gains with minimal training overhead. By combining per-layer routers with LoRA fine-tuning, the system learns to skip 15.25% of computations for tool calls while maintaining full capacity for complex reasoning tasks, demonstrating significant potential for optimizing agentic AI systems.

AI · Bullish · arXiv – CS AI · 13h ago · 7/10

Breaking the Reversal Curse in Autoregressive Language Models via Identity Bridge

Researchers demonstrate that the 'reversal curse' — an autoregressive language model's inability to deduce inverse relationships from forward training data — can be mitigated through a simple data regularization technique called Identity Bridge. By adding self-referential training examples (e.g., 'Alice's name is Alice'), a 1B parameter model achieves 50% success on reversal tasks compared to near-zero baseline performance, suggesting LLMs can learn higher-level logical rules rather than merely memorizing facts.
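The data-side idea can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the `(subject, relation, object)` schema, the sentence template for forward facts, and the choice to emit an identity sentence for every entity are all assumptions. Only the "X's name is X" pattern comes from the summary.

```python
from typing import List, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object); hypothetical schema


def identity_examples(entities: List[str]) -> List[str]:
    """One self-referential sentence per entity, following the summary's example."""
    return [f"{name}'s name is {name}" for name in entities]


def augment(facts: List[Fact]) -> List[str]:
    """Forward-direction fact sentences plus identity sentences for every entity seen."""
    sentences = [f"{subj}'s {rel} is {obj}" for subj, rel, obj in facts]
    # Collect each entity once, sorted so the output order is deterministic.
    entities = sorted({e for subj, _, obj in facts for e in (subj, obj)})
    return sentences + identity_examples(entities)


corpus = augment([("Alice", "husband", "Bob")])
for line in corpus:
    print(line)
```

The forward fact is left untouched; the augmentation only appends sentences, so it can be applied to an existing training set without rewriting it.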

AI · Bullish · arXiv – CS AI · 13h ago · 7/10

Prototype Transformer: Towards Language Model Architectures Interpretable by Design

Researchers introduce Prototype Transformer (ProtoT), a new language model architecture that replaces standard self-attention with a linear-cost prototype-based module to improve interpretability. The approach enables models to automatically learn and represent named concepts, addressing long-standing concerns about opacity in large language models while maintaining competitive performance on standard benchmarks.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

RayDer introduces a unified transformer architecture that consolidates camera estimation, scene reconstruction, and rendering into a single model for self-supervised novel view synthesis from real-world video. The system achieves clean power-law scaling with data and compute while maintaining competitive performance with supervised approaches, addressing a key scalability challenge in 3D vision.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference

Researchers propose OBCache, a novel KV cache pruning framework that optimizes memory efficiency for long-context LLM inference by measuring token importance based on actual impact to attention outputs rather than heuristic attention weights. The method, grounded in Optimal Brain Damage theory, demonstrates consistent accuracy improvements over existing eviction strategies on LLaMA and Qwen models.
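To make the distinction between attention-weight heuristics and output-aware scoring concrete, here is a toy sketch for a single query. It is not OBCache's scoring rule, which the summary does not spell out. It assumes that evicting token j renormalizes the remaining attention weights, in which case the output changes by a_j / (1 - a_j) * (o - v_j), where o is the original output and v_j the token's value vector. All names are illustrative.

```python
import math
from typing import List


def attention_output(weights: List[float], values: List[List[float]]) -> List[float]:
    """Weighted sum of value vectors for a single query."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def output_impact(weights: List[float], values: List[List[float]]) -> List[float]:
    """Norm of the output change if each token is evicted and the rest renormalized.

    Assumes every weight is strictly below 1 (at least two tokens carry weight).
    """
    out = attention_output(weights, values)
    scores = []
    for a, v in zip(weights, values):
        dist = math.sqrt(sum((o - x) ** 2 for o, x in zip(out, v)))
        scores.append(a / (1.0 - a) * dist)
    return scores


weights = [0.6, 0.2, 0.2]                      # attention weights, sum to 1
values = [[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]  # toy value vectors

impact = output_impact(weights, values)
evict_by_weight = min(range(len(weights)), key=lambda j: weights[j])
evict_by_impact = min(range(len(weights)), key=lambda j: impact[j])
print("evict by weight:", evict_by_weight)
print("evict by impact:", evict_by_impact)
```

In this toy case the token with the largest attention weight has a value vector equal to the output, so removing it changes nothing. A weight-based heuristic would keep it and evict a lower-weight token, while the impact-based score picks it as the cheapest to drop.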

AI · Bullish · arXiv – CS AI · 4d ago · 7/10

Pushing the Limits of Block Rotations in Post-Training Quantization

Researchers present PeRQ, a post-training quantization method that uses permutations to optimize block rotations for neural network compression. The approach recovers up to 90% of full-vector rotation performance when quantizing large language models to INT4, significantly outperforming existing block rotation methods.

AI · Bullish · arXiv – CS AI · 4d ago · 7/10

Towards Foundation Models for Zero-Shot Time Series Anomaly Detection: Leveraging Synthetic Data and Relative Context Discrepancy

Researchers introduce TimeRCD, a foundation model for time series anomaly detection that uses a novel Relative Context Discrepancy approach instead of traditional reconstruction methods. The model achieves superior zero-shot performance by detecting discrepancies between adjacent time windows, addressing a fundamental limitation of existing anomaly detection systems: high false-positive and false-negative rates.

AI · Bullish · arXiv – CS AI · 4d ago · 7/10

Tiny Brains, Giant Impact: Uncovering the Keystone Neurons of LLM with Just a Few Prompts

Researchers have identified "keystone neurons" in large language models—a tiny subset of neurons that remain highly activated across diverse tasks and are critical for model performance. By fine-tuning only these neurons rather than updating all parameters, they achieved comparable or better task performance while preserving other capabilities, offering a more efficient approach to model adaptation.

AI · Bullish · arXiv – CS AI · 5d ago · 7/10

Advancing Direct Training for Spiking Neural Networks with Circulate-Firing Neurons and Learnable Gradients

Researchers propose a novel direct training algorithm for Spiking Neural Networks that addresses performance gaps with traditional ANNs through circulate-firing neurons, learnable surrogate gradients, and balanced loss functions. The method demonstrates competitive results across datasets and extends effectively to Transformer architectures, potentially advancing energy-efficient neural network applications.

AI · Bearish · arXiv – CS AI · 5d ago · 7/10

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

Researchers demonstrate that large language model refusal behavior can be detected and exploited through intermediate layer activations before final output generation. A new attack method called Mechanistic AutoDAN leverages this discovery to achieve competitive jailbreak success rates while reducing computational time by up to 72%, raising concerns about LLM safety mechanisms.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models

LoopVLA introduces a recurrent Vision-Language-Action model architecture that learns when to stop refining representations for robotic control tasks, achieving 45% parameter reduction and 1.7x faster inference while maintaining or improving task performance. The model uses self-supervised learning to estimate representation sufficiency rather than relying on predefined layer depths or heuristic rules.

AI · Bearish · arXiv – CS AI · May 12 · 7/10

Why Do Aligned LLMs Remain Jailbreakable: Refusal-Escape Directions, Operator-Level Sources, and Safety-Utility Trade-off

Researchers identify Refusal-Escape Directions (RED) as mathematical perturbation vectors that explain why aligned LLMs remain vulnerable to jailbreaks. The study reveals structural vulnerabilities arise from fundamental trade-offs between safety mechanisms and model utility, with normalization and residual connections as key exploitable components.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Reformulating KV Cache Eviction Problem for Long-Context LLM Inference

Researchers introduce LaProx, a novel KV cache eviction strategy for long-context LLM inference that reformulates the problem from head-wise weight averaging to output-aware layer-wise matrix multiplication. The method reduces accuracy loss by 2× under extreme compression and maintains performance with just 5% of the original KV cache.

AI · Bearish · arXiv – CS AI · May 9 · 7/10

Large Vision-Language Models Get Lost in Attention

Researchers have identified a critical architectural flaw in large vision-language models: attention mechanisms are largely redundant and misallocate computational resources, with random attention weights performing comparably to learned ones. This finding challenges fundamental assumptions about Transformer design and suggests current LVLMs inefficiently process visual information despite their scale.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Leviathan: Decoupling Input and Output Representations in Language Models

Researchers introduce Leviathan, a Transformer architecture that decouples input embeddings from output projections using learned embedding vectorization (LEV), achieving 9% perplexity reduction at 1.2B parameters with minimal overhead. The approach concentrates improvements on rare tokens while requiring 2.1x fewer training tokens to match baseline performance.

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

Optimal Expert-Attention Allocation in Mixture-of-Experts: A Scalable Law for Dynamic Model Design

Researchers have developed a new scaling law for Mixture-of-Experts (MoE) models that optimizes compute allocation between expert and attention layers. The study extends the Chinchilla scaling law by introducing an optimal ratio formula that follows a power-law relationship with total compute and model sparsity.

AI · Neutral · arXiv – CS AI · Mar 11 · 7/10

Quantifying the Necessity of Chain of Thought through Opaque Serial Depth

Researchers introduce 'opaque serial depth' as a metric to measure how much reasoning large language models can perform without externalizing it through chain of thought processes. The study provides computational bounds for Gemma 3 models and releases open-source tools to calculate these bounds for any neural network architecture.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL Problems

Researchers developed ELMUR, a new AI architecture that uses external memory to help robots make better decisions over extremely long time periods. The system achieved 100% success on tasks requiring memory of up to one million steps and nearly doubled performance on robotic manipulation tasks compared to existing methods.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2

SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

Researchers propose SUN (Shared Use of Next-token Prediction), a novel approach for multi-LLM serving that enables cross-model sharing of decode execution by decomposing transformers into separate prefill and decode modules. The system achieves up to 2.0x throughput improvement per GPU while maintaining accuracy comparable to full fine-tuning, with a quantized version (QSUN) providing an additional 45% speedup.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

Advancing Universal Deep Learning for Electronic-Structure Hamiltonian Prediction of Materials

Researchers developed NextHAM, a deep learning method for predicting electronic-structure Hamiltonians of materials, offering significant computational efficiency advantages over traditional DFT methods. The work introduces a neural E(3)-symmetry architecture and a new dataset, Materials-HAM-SOC, with 17,000 material structures spanning 68 elements.
