AIBullish · arXiv — CS AI · 1d ago · 6/10
Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries
Researchers propose Future Summary Prediction (FSP), a new pretraining method for large language models that trains the model to predict compact representations summarizing long-range future text, rather than only the next token. In experiments on 3B- and 8B-parameter models, FSP outperforms both standard next-token prediction and multi-token prediction on math, reasoning, and coding benchmarks.
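The core idea can be illustrated with a minimal sketch: alongside the usual next-token loss, each position gets an auxiliary regression target that summarizes a window of upcoming tokens. Everything here is an illustrative assumption — the mean-pooled embedding as the "summary", the MSE loss, the `horizon` parameter, and the function names are not taken from the paper, which may use a different summarizer and objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: random token embeddings and a short token sequence.
vocab, d, T = 50, 16, 12
emb = rng.normal(size=(vocab, d))
tokens = rng.integers(0, vocab, size=T)

def future_summary(tokens, t, horizon, emb):
    """Compact target for position t: mean embedding of the next
    `horizon` tokens. One plausible choice of 'future summary';
    the paper's actual summarizer may differ."""
    window = tokens[t + 1 : t + 1 + horizon]
    return emb[window].mean(axis=0)

# Stand-in for the model's hidden states at each position.
h = rng.normal(size=(T, d))
horizon = 4

# Auxiliary loss: regress each hidden state onto its future summary.
# In real pretraining this would be added to next-token cross-entropy.
losses = []
for t in range(T - horizon):
    target = future_summary(tokens, t, horizon, emb)
    losses.append(float(np.mean((h[t] - target) ** 2)))

aux_loss = sum(losses) / len(losses)
print(aux_loss)
```

The point of the compact target is that it gives the model a training signal about where the text is heading without requiring it to predict every future token exactly, which is what distinguishes this from multi-token prediction.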