#llm-scaling News & Analysis

2 articles tagged with #llm-scaling. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

Researchers present Memory Sparse Attention (MSA), a framework that lets language models process up to 100 million tokens with linear complexity and less than 9% performance degradation. It targets current limitations in long-term memory processing and can run 100M-token inference on just 2 GPUs, which could benefit applications such as large-corpus analysis and long-history reasoning.

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Research shows that large language models' performance on short tasks may understate their capabilities, because small improvements in single-step accuracy compound into exponential gains in the length of task a model can complete. The study finds that larger models are better at executing many steps in sequence, though they suffer from 'self-conditioning', where earlier errors raise the likelihood of later mistakes; 'thinking' mechanisms can mitigate this.
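
The compounding effect described in this summary can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes every step succeeds independently with the same probability, so a task of H steps succeeds with probability p^H, and the longest task completed at a target success rate s is H = ln(s) / ln(p). The function name `horizon_length` and the 50% default target are illustrative choices.

```python
import math


def horizon_length(step_accuracy: float, success_rate: float = 0.5) -> float:
    """Task length (in steps) at which overall success drops to `success_rate`.

    Assumes independent steps with constant per-step accuracy, which is a
    simplification: it ignores the self-conditioning effect, where earlier
    errors make later ones more likely.
    """
    if not (0.0 < step_accuracy < 1.0):
        raise ValueError("step_accuracy must be strictly between 0 and 1")
    if not (0.0 < success_rate < 1.0):
        raise ValueError("success_rate must be strictly between 0 and 1")
    # p^H = s  =>  H = ln(s) / ln(p)
    return math.log(success_rate) / math.log(step_accuracy)


# Each tenfold reduction in per-step error rate lengthens the horizon about tenfold.
for p in (0.90, 0.99, 0.999):
    print(f"step accuracy {p:.3f} -> horizon of about {horizon_length(p):.1f} steps")
```

Under these assumptions, raising single-step accuracy from 90% to 99% to 99.9% extends the 50%-success horizon from roughly 7 steps to roughly 69 and then roughly 693, which is why gains that look marginal on short benchmarks can matter a great deal on long tasks.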