Researchers introduce AdaMEM, a test-time adaptive memory framework that lets language agents adjust their behavior during inference without updating model parameters. The system combines a persistent memory of offline trajectories with strategy memory generated on the fly, and reports 11-13% relative performance improvements on complex reasoning and web interaction tasks.
AdaMEM addresses a fundamental limitation in current language agent systems: they cannot adapt meaningfully as a task progresses. Traditional agentic architectures retrieve contextual information only at the start of an episode, so in long-horizon scenarios the agent works from increasingly stale guidance. This research proposes a hybrid memory approach built from two complementary systems: a long-term repository of raw offline experiences and a dynamic short-term strategy generator that continuously refines decision-making during inference. The framework adapts behavior without costly online model updates and supports variable computational budgets at inference time.
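The two-memory design described above can be illustrated with a minimal Python sketch. This is not AdaMEM's actual implementation: the class names, the word-overlap retrieval score, the `refresh_every` budget knob, and the string-joining stand-in for strategy generation are all assumptions made for illustration. A real system would presumably use an LLM to synthesize the strategy and a learned retriever to find relevant trajectories.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One stored offline episode: the task text and the actions taken."""
    task: str
    actions: list[str]


def overlap(a: str, b: str) -> int:
    """Toy relevance score (an assumption): number of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))


@dataclass
class HybridMemory:
    trajectories: list[Trajectory]  # long-term memory, fixed during inference
    strategy: str = ""              # short-term memory, rewritten on the fly
    refresh_every: int = 2          # hypothetical knob: larger means less compute
    refresh_count: int = 0          # how many times the strategy was regenerated

    def retrieve(self, query: str, k: int = 1) -> list[Trajectory]:
        """Return the k stored trajectories most similar to the query."""
        ranked = sorted(
            self.trajectories,
            key=lambda t: overlap(t.task, query),
            reverse=True,
        )
        return ranked[:k]

    def maybe_refresh(self, step: int, task: str, observation: str) -> bool:
        """Regenerate the strategy every `refresh_every` steps, not only at step 0."""
        if step % self.refresh_every != 0:
            return False
        hits = self.retrieve(f"{task} {observation}")
        # Stand-in for an LLM call that would synthesize a strategy from the hits.
        self.strategy = "; ".join(
            f"recall '{t.task}': {' -> '.join(t.actions)}" for t in hits
        )
        self.refresh_count += 1
        return True


# Hypothetical episode: no model weights change, only the strategy text does.
memory = HybridMemory(trajectories=[
    Trajectory("put a clean mug in the cabinet",
               ["find mug", "clean mug", "open cabinet", "place mug"]),
    Trajectory("buy red running shoes",
               ["search shoes", "filter red", "add to cart"]),
])
task = "put a clean mug in the cabinet"
observations = ["you are in the kitchen", "you see a mug",
                "the mug is dirty", "you see a cabinet"]
for step, obs in enumerate(observations):
    memory.maybe_refresh(step, task, obs)
```

Setting `refresh_every` to the episode length would reduce this to the static baseline that retrieves once at episode start, while setting it to 1 would refresh on every step at the highest token cost.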
The technical innovation centers on balancing token efficiency with adaptability, a critical concern for deployed language agents, where computational cost grows with the number of tokens processed at inference. Testing across diverse domains, namely ALFWorld (household task simulation), WebShop (e-commerce navigation), and HotpotQA (multi-hop reasoning), shows consistent 11-13% relative performance gains over static baselines. The accompanying STEP-MFT technique further improves adaptation by training policies to synthesize higher-quality strategies from retrieved experiences, supporting continued improvement after deployment.
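Because the reported gains are relative rather than absolute, it may help to see how such a figure is computed. The sketch below uses made-up success rates chosen purely for illustration. They are not results from the AdaMEM evaluation, and the function name is hypothetical.

```python
def relative_gain(baseline: float, adapted: float) -> float:
    """Relative improvement of `adapted` over `baseline`, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (adapted - baseline) / baseline


# Hypothetical numbers, NOT taken from the paper:
baseline_rate = 0.50  # success rate of a static-memory agent
adapted_rate = 0.56   # success rate with test-time adaptation

gain = relative_gain(baseline_rate, adapted_rate)
print(f"absolute gain: {adapted_rate - baseline_rate:.2f}")
print(f"relative gain: {gain:.0%}")
```

In this illustration, a 6-point absolute improvement on a 50% baseline corresponds to a 12% relative gain, so the same relative figure implies a smaller absolute change on a weaker baseline.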
For the AI systems industry, AdaMEM points to a scaling dimension beyond traditional parameter or data scaling: adaptive memory at test time. The results suggest that adaptive memory mechanisms can help in real-world deployments, where agents encounter distribution shifts and novel scenarios after launch. The approach supports self-evolution without retraining, reducing operational friction for deployed agent systems. The open-source codebase should make community adoption and independent empirical validation easier.
Future research should examine memory scalability to longer horizons, integration with newer model architectures, and applicability to multi-agent collaborative scenarios. The framework's performance-to-compute tradeoff characteristics suggest practical deployment advantages in resource-constrained environments.
- AdaMEM enables test-time agent adaptation through hybrid long-term trajectory memory and dynamic short-term strategy memory, without parameter updates
- Achieves 11-13% relative performance gains on complex reasoning and web interaction benchmarks compared to static memory baselines
- Framework supports tunable inference-time compute budgets, balancing adaptation quality against token cost
- STEP-MFT fine-tuning technique improves strategy synthesis from retrieved experiences, enabling continuous post-deployment agent improvement
- Open-source implementation enables community research into scaling dimensions beyond parameter and data scaling for agentic systems