EASE-TTT: Evidence-Aligned Selective Test-Time Training for Long-Context Question Answering
Researchers present EASE-TTT, a framework that combines within-context retrieval with test-time adaptation to improve long-context question answering in smaller language models. The method identifies evidence chunks and converts them into soft attention supervision targets, so the model learns to focus on relevant information while still processing the full context. The authors report that it outperforms existing retrieval-only and generic adaptation baselines.
The research addresses a persistent limitation in language model performance: smaller models struggle with long-context question answering even when relevant evidence exists in the input. Traditional approaches either expose evidence chunks at the input level without adapting model behavior, or apply generic self-supervised training objectives that fail to distinguish which context positions actually support the correct answer. EASE-TTT bridges this gap by creating a hybrid framework that leverages evidence localization to guide the model's attention mechanisms during inference.
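To make the idea of evidence localization concrete, the following is a minimal sketch of picking evidence chunks from a long context. It is an illustration under stated assumptions, not the authors' retriever: the summary does not say how EASE-TTT scores chunks, so a simple lexical-overlap score stands in here, and the names `split_into_chunks`, `overlap_score`, and `locate_evidence` are hypothetical.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of token positions


def split_into_chunks(num_tokens: int, chunk_size: int) -> List[Span]:
    """Cut positions 0..num_tokens into consecutive fixed-size spans."""
    return [
        (start, min(start + chunk_size, num_tokens))
        for start in range(0, num_tokens, chunk_size)
    ]


def overlap_score(question: Sequence[str], chunk: Sequence[str]) -> float:
    """Fraction of distinct question tokens that also appear in the chunk.

    ASSUMPTION: lexical overlap is only a stand-in for whatever scoring
    the real method uses.
    """
    question_vocab = set(question)
    if not question_vocab:
        return 0.0
    return len(question_vocab & set(chunk)) / len(question_vocab)


def locate_evidence(
    question: Sequence[str],
    context: Sequence[str],
    chunk_size: int = 4,
    top_k: int = 1,
) -> List[Span]:
    """Return the top_k highest-scoring chunk spans, in document order."""
    spans = split_into_chunks(len(context), chunk_size)
    scored = [
        (overlap_score(question, context[start:end]), (start, end))
        for start, end in spans
    ]
    # Highest score first; ties go to the earlier chunk.
    scored.sort(key=lambda item: (-item[0], item[1][0]))
    return sorted(span for _, span in scored[:top_k])


context = "the sky is blue . paris is the capital of france . cats sleep a lot".split()
question = "what is the capital of france".split()
evidence = locate_evidence(question, context, chunk_size=4, top_k=1)
```

The key point is that the output is a set of positions inside the original context, not a shortened replacement for it, which is what lets a later adaptation step refer back to the full input.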
This work builds on two parallel research trends: the push toward more efficient inference-time adaptation through test-time training, and the recognition that retrieval-augmented methods improve performance on knowledge-intensive tasks. Previous work on query-only test-time training (qTTT) demonstrated efficiency gains but lacked semantic grounding; EASE-TTT grounds adaptation in actual evidence positions rather than generic span-level objectives.
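Grounding adaptation in evidence positions can be sketched as follows. This is a hedged illustration only: the summary says evidence chunks become "soft attention supervision targets" but does not give the target's form or the loss, so the smoothed target and the KL-divergence loss below are assumptions, and `soft_evidence_target` and `alignment_loss` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def soft_evidence_target(
    num_positions: int,
    evidence_spans: Sequence[Tuple[int, int]],
    smoothing: float = 0.1,
) -> List[float]:
    """Build a probability distribution over context positions.

    ASSUMPTION: evidence positions share (1 - smoothing) of the mass and
    all other positions share the remaining `smoothing`.
    """
    in_evidence = [False] * num_positions
    for start, end in evidence_spans:  # half-open [start, end)
        for i in range(max(0, start), min(end, num_positions)):
            in_evidence[i] = True
    n_evidence = sum(in_evidence)
    if n_evidence == 0 or n_evidence == num_positions:
        # No usable signal: fall back to a uniform target.
        return [1.0 / num_positions] * num_positions
    n_background = num_positions - n_evidence
    return [
        (1.0 - smoothing) / n_evidence if flag else smoothing / n_background
        for flag in in_evidence
    ]


def alignment_loss(
    attention: Sequence[float], target: Sequence[float], eps: float = 1e-12
) -> float:
    """KL(target || attention): zero when attention matches the target."""
    return sum(
        t * math.log(t / max(a, eps))
        for t, a in zip(target, attention)
        if t > 0.0
    )


target = soft_evidence_target(4, [(1, 3)], smoothing=0.2)
uniform_attention = [0.25, 0.25, 0.25, 0.25]
loss_uniform = alignment_loss(uniform_attention, target)
loss_aligned = alignment_loss(target, target)
```

In an actual test-time training loop, a loss of this kind would be computed on the model's attention weights and backpropagated into a small set of parameters before answering; the sketch only computes the scalar. A generic self-supervised objective, by contrast, has no term that distinguishes evidence positions from the rest of the context.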
The practical implications are clearest for developers deploying smaller language models in resource-constrained environments, where EASE-TTT offers a computationally efficient alternative to pipelines that retrieve chunks and truncate the context to them. The method generates answers from the full original context rather than replacing it, preserving contextual nuance while improving accuracy. Across six LongBench QA benchmarks with three different decoder-only models, EASE-TTT demonstrates consistent improvements, suggesting broad applicability.
Future development should explore whether evidence-aligned adaptation transfers to other long-context tasks beyond question answering, and whether the framework scales efficiently with model size. The approach may also inspire similar hybrid methods in other domains requiring selective attention over noisy or extensive information.
- EASE-TTT combines evidence retrieval with test-time training to improve accuracy in long-context QA for smaller language models
- The framework converts evidence chunks into soft attention supervision targets that guide model adaptation during inference
- Results across six LongBench tasks show stronger performance than full-context inference, retrieval-only baselines, and generic test-time training methods
- The approach processes the full original context rather than truncated retrieved chunks, preserving contextual information
- Evidence-aligned adaptation addresses the semantic grounding gap in previous generic test-time training methods