y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#long-context News & Analysis

20 articles tagged with #long-context. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

20 articles
AIBullisharXiv – CS AI · Mar 267/10
🧠

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

Researchers present Memory Sparse Attention (MSA), a new AI framework that enables language models to process up to 100 million tokens with linear complexity and less than 9% performance degradation. The technology addresses current limitations in long-term memory processing and can run 100M-token inference on just 2 GPUs, potentially revolutionizing applications like large-corpus analysis and long-history reasoning.

AIBullisharXiv – CS AI · 2d ago7/10
🧠

IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs

IceCache is a new memory management technique for large language models that reduces KV cache memory consumption by 75% while maintaining 99% accuracy on long-sequence tasks. The method combines semantic token clustering with PagedAttention to intelligently offload cache data between GPU and CPU, addressing a critical bottleneck in LLM inference on resource-constrained hardware.

AIBullisharXiv – CS AI · 3d ago7/10
🧠

CSAttention: Centroid-Scoring Attention for Accelerating LLM Inference

Researchers introduce CSAttention, a training-free sparse attention method that accelerates LLM inference by 4.6x for long-context applications. The technique optimizes the offline-prefill/online-decode workflow by precomputing query-centric lookup tables, enabling faster token generation without sacrificing accuracy even at 95% sparsity levels.

AIBullisharXiv – CS AI · Mar 277/10
🧠

SWAA: Sliding Window Attention Adaptation for Efficient and Quality Preserving Long Context Processing

Researchers propose SWAA (Sliding Window Attention Adaptation), a toolkit that enables efficient long-context processing in large language models by adapting full attention models to sliding window attention without expensive retraining. The solution achieves 30-100% speedups for long context inference while maintaining acceptable performance quality through four core strategies that address training-inference mismatches.

AIBullisharXiv – CS AI · Mar 117/10
🧠

ARKV: Adaptive and Resource-Efficient KV Cache Management under Limited Memory Budget for Long-Context Inference in LLMs

Researchers propose ARKV, a new framework for managing memory in large language models that reduces KV cache memory usage by 4x while preserving 97% of baseline accuracy. The adaptive system dynamically allocates precision levels to cached tokens based on attention patterns, enabling more efficient long-context inference without requiring model retraining.

AIBullisharXiv – CS AI · Mar 37/105
🧠

Long-Context Generalization with Sparse Attention

Researchers introduce ASEntmax, a new attention mechanism for transformer models that uses sparse attention with learnable temperature parameters. This approach significantly outperforms traditional softmax attention, achieving up to 1000x length extrapolation on synthetic tasks and better long-context performance in language modeling.

AIBullisharXiv – CS AI · Mar 37/102
🧠

RMAAT: Astrocyte-Inspired Memory Compression and Replay for Efficient Long-Context Transformers

Researchers introduce RMAAT (Recurrent Memory Augmented Astromorphic Transformer), a new architecture inspired by brain astrocyte cells that addresses the quadratic complexity problem in Transformer models for long sequences. The system uses recurrent memory tokens and adaptive compression to achieve linear complexity while maintaining competitive accuracy on benchmark tests.

AIBullisharXiv – CS AI · Mar 37/102
🧠

MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

MiniCPM-SALA introduces a 9B-parameter hybrid language model architecture that combines sparse and linear attention mechanisms to handle ultra-long contexts up to 1M tokens. The model achieves 3.5x faster inference than full-attention models while reducing training costs by 75% through a continual training framework that transforms existing Transformer models.

AIBullisharXiv – CS AI · Feb 277/102
🧠

S2O: Early Stopping for Sparse Attention via Online Permutation

Researchers introduce S2O, a new sparse attention method that uses online permutation and early stopping to dramatically improve AI model efficiency. The technique achieves 3.81x end-to-end speedup on Llama-3.1-8B with 128K context while maintaining accuracy.

AIBullishOpenAI News · Dec 117/104
🧠

Introducing GPT-5.2

OpenAI has announced GPT-5.2, their most advanced frontier AI model designed for professional applications. The model features enhanced reasoning capabilities, long-context understanding, coding abilities, and vision functionality, available through ChatGPT and OpenAI API for improved agentic workflows.

AIBullisharXiv – CS AI · 3d ago6/10
🧠

RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval

Researchers introduce RecaLLM, a post-trained language model that addresses the 'lost-in-thought' phenomenon where retrieval performance degrades during extended reasoning chains. The model interleaves explicit in-context retrieval with reasoning steps and achieves strong performance on long-context benchmarks using training data significantly shorter than existing approaches.

AIBullisharXiv – CS AI · Mar 126/10
🧠

LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation

Researchers have developed LookaheadKV, a new framework that significantly improves memory efficiency in large language models by intelligently evicting less important cached data. The method achieves superior accuracy while reducing computational costs by up to 14.5x compared to existing approaches, making long-context AI tasks more practical.

AIBullisharXiv – CS AI · Mar 36/106
🧠

Stateful Token Reduction for Long-Video Hybrid VLMs

Researchers developed a new token reduction method for hybrid vision-language models that process long videos, achieving 3.8-4.2x speedup while retaining only 25% of visual tokens. The approach uses progressive reduction and unified scoring for both attention and Mamba blocks, maintaining near-baseline accuracy on long-context video benchmarks.

$NEAR
AIBearisharXiv – CS AI · Mar 36/104
🧠

Who Gets Cited Most? Benchmarking Long-Context Numerical Reasoning on Scientific Articles

Researchers introduced SciTrek, a new benchmark for testing large language models' ability to perform numerical reasoning across long scientific documents. The benchmark reveals significant challenges for current LLMs, with the best model achieving only 46.5% accuracy at 128K tokens, and performance declining as context length increases.

$COMP
AIBullisharXiv – CS AI · Mar 36/103
🧠

Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents

Researchers introduce ReMemR1, a new approach to improve large language models' ability to handle long-context question answering by integrating memory retrieval into the memory update process. The system enables non-linear reasoning through selective callback of historical memories and uses multi-level reward design to strengthen training.

AIBullisharXiv – CS AI · Mar 26/1012
🧠

Toward General Semantic Chunking: A Discriminative Framework for Ultra-Long Documents

Researchers developed a new discriminative AI model based on Qwen3-0.6B that can efficiently segment ultra-long documents up to 13k tokens for better information retrieval. The model achieves superior performance compared to generative alternatives while delivering two orders of magnitude faster inference on the Wikipedia WIKI-727K dataset.

AIBullishHugging Face Blog · Jul 86/105
🧠

SmolLM3: smol, multilingual, long-context reasoner

SmolLM3 represents a new compact language model that combines multilingual capabilities with long-context reasoning abilities. The model appears to be designed for efficiency while maintaining strong performance across multiple languages and complex reasoning tasks.

AINeutralHugging Face Blog · Apr 166/108
🧠

Introducing HELMET: Holistically Evaluating Long-context Language Models

HELMET is a new holistic evaluation framework for assessing long-context language models across multiple dimensions and use cases. The framework aims to provide comprehensive benchmarking capabilities for AI models that can process extended text sequences.

AINeutralHugging Face Blog · Jan 234/105
🧠

Mastering Long Contexts in LLMs with KVPress

The article title suggests coverage of KVPress, a technique for managing long contexts in Large Language Models. However, the article body appears to be empty or unavailable, preventing detailed analysis of the content.