y0news

#language-modeling News & Analysis

7 articles tagged with #language-modeling. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠

Quantum-Inspired Self-Attention in a Large Language Model

Researchers developed a quantum-inspired self-attention (QISA) mechanism and integrated it into GPT-1's language modeling pipeline, reportedly the first such integration in an autoregressive language model. QISA outperformed standard self-attention, achieving a 15.5x lower character error rate and 13x lower cross-entropy loss at the cost of only 2.6x longer inference time.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠

Long-Context Generalization with Sparse Attention

Researchers introduce ASEntmax, a sparse attention mechanism with learnable temperature parameters for transformer models. It significantly outperforms standard softmax attention, achieving up to 1000x length extrapolation on synthetic tasks and stronger long-context performance in language modeling.
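The entmax family generalizes softmax toward sparse distributions; with α = 2 it reduces to sparsemax, which can assign exact zeros to low-scoring positions. As a minimal sketch (not the paper's ASEntmax, whose adaptive parameterization is not described here), the following combines sparsemax with a temperature scale: lower temperature yields sparser attention weights.

```python
import numpy as np

def sparsemax(z):
    """Project logits onto the probability simplex (Martins & Astudillo, 2016).

    Unlike softmax, the result can contain exact zeros, so low-scoring
    positions receive no attention mass at all.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]             # descending
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum     # positions kept in the support
    k_max = support.sum()
    tau = (cumsum[k_max - 1] - 1.0) / k_max # threshold subtracted from logits
    return np.maximum(z - tau, 0.0)

def sparse_attention(scores, temperature):
    """Temperature-scaled sparsemax: lower temperature -> sparser weights."""
    return sparsemax(np.asarray(scores) / temperature)

weights = sparse_attention([3.0, 1.0, 0.1], temperature=1.0)
# Weights sum to 1; here only the top-scoring position survives.
```

Making the temperature a learned, position-dependent quantity (as the summary suggests) would let the model control how concentrated attention becomes as context length grows.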

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠

Training Transformers in Cosine Coefficient Space

Researchers developed a new method to train transformer neural networks using discrete cosine transform (DCT) coefficients, achieving the same performance while using only 52% of the parameters. The technique requires no architectural changes and simply replaces standard linear layers with spectral layers that store DCT coefficients instead of full weight matrices.
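The idea of storing DCT coefficients in place of full weight matrices can be sketched in a few lines. This toy `SpectralLinear` class (a hypothetical name, not the paper's code) builds an orthonormal DCT-II basis, keeps only the first `keep` spectral rows as its parameters, and reconstructs an approximate weight matrix on the fly; keeping all rows recovers the original layer exactly, and truncating is where the parameter saving comes from.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    i = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i[None, :] + 0.5) * i[:, None] / n)
    C[0] /= np.sqrt(2.0)   # make the first row unit-norm too
    return C

class SpectralLinear:
    """Linear layer storing truncated DCT coefficients of its weights."""
    def __init__(self, weight, keep):
        n_out = weight.shape[0]
        self.C = dct_matrix(n_out)
        # The stored parameters: only the first `keep` spectral rows.
        self.coeffs = (self.C @ weight)[:keep]
        self.keep = keep

    def __call__(self, x):
        # Reconstruct an approximate weight matrix from the coefficients.
        W_hat = self.C[:self.keep].T @ self.coeffs
        return W_hat @ x
```

Because the DCT basis is orthonormal, `keep == n_out` gives an exact reconstruction; training would then optimize `coeffs` directly, with no change to the surrounding architecture.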

๐Ÿข Perplexity
AI · Neutral · arXiv – CS AI · Mar 2 · 6/10
🧠

Memory Caching: RNNs with Growing Memory

Researchers introduce Memory Caching (MC), a technique that enhances recurrent neural networks by allowing their memory capacity to grow with sequence length, bridging the gap between fixed-memory RNNs and growing-memory Transformers. The approach offers four variants and shows competitive performance with Transformers on language modeling and long-context tasks while maintaining better computational efficiency.

AI · Bullish · arXiv – CS AI · Feb 27 · 5/10
🧠

Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies

Researchers developed a learned scheduler for masked diffusion models (MDMs) in language modeling that outperforms traditional rule-based approaches. The new method uses a KL-regularized Markov decision process framework and demonstrated significant improvements, including 20.1% gains over random scheduling and 11.2% over max-confidence approaches on benchmark tests.
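For context on the baselines being beaten, a max-confidence unmasking policy can be sketched in a few lines: reveal masked positions one at a time, always choosing the position the model is currently most surest about. The function below is a toy stand-in (the confidence scores are given rather than produced by a real masked diffusion model, which would re-score the remaining masks after each reveal).

```python
import numpy as np

def unmask_max_confidence(confidence):
    """Reveal masked positions greedily by descending model confidence.

    `confidence` stands in for per-position model probabilities; a real
    MDM would recompute them after every unmasking step.
    """
    confidence = np.asarray(confidence, dtype=float)
    masked = set(range(confidence.size))
    order = []
    while masked:
        best = max(masked, key=lambda i: confidence[i])
        order.append(best)
        masked.remove(best)
    return order

order = unmask_max_confidence([0.2, 0.9, 0.5, 0.7])
# -> positions revealed most-confident first: [1, 3, 2, 0]
```

The paper's learned scheduler replaces this fixed rule with a policy optimized inside a KL-regularized Markov decision process, which is where the reported 11.2% gain over max-confidence comes from.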

AI · Neutral · arXiv – CS AI · Apr 6 · 4/10
🧠

Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures

Researchers investigated lower bounds for language modeling using semantic structures, finding that binary vector representations of semantic structure can be dramatically reduced in dimensionality while maintaining effectiveness. The study establishes that prediction quality bounds require analysis of signal-noise distributions rather than single scores alone.

AI · Neutral · Hugging Face Blog · Jul 3 · 1/10
🧠

The Reformer - Pushing the limits of language modeling

The article title references 'The Reformer' and the limits of language modeling, but no article body was available, so no summary or analysis could be generated.