M2A: Synergizing Mathematical and Agentic Reasoning in Large Language Models
Researchers introduce M2A, a novel model merging paradigm that combines mathematical and agentic reasoning in large language models without retraining. The approach improves a Qwen3-8B model's score on the SWE-Bench Verified software engineering benchmark from 44.0% to 51.2% by strategically injecting mathematical reasoning capabilities along directions that preserve agent behavior.
M2A addresses a fundamental technical challenge in large language model development: the incompatibility between mathematical reasoning, which solves closed-world problems in single responses, and agentic reasoning, which requires iterative interaction with external environments. Traditional multi-task learning struggles to balance these competing demands, resulting in unstable behavior and modest performance gains. The proposed solution operates entirely in parameter space: it identifies the feature subspaces critical to agent behavior and merges mathematical task vectors only along null-space directions, that is, directions orthogonal to agent capabilities. This approach avoids the computational overhead of retraining while exposing a single coefficient to control reasoning depth, offering practical modularity for practitioners.
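The core idea can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the function name `null_space_merge`, the use of per-layer weight columns, the choice of SVD over sampled agent activations to estimate the agent-critical subspace, and the parameters `alpha` and `rank` are all assumptions for the sake of the example. The sketch shows the two ingredients the text describes: a projector onto the orthogonal complement of the agent subspace, and a single merging coefficient scaling the injected math task vector.

```python
import numpy as np

def null_space_merge(theta_agent, theta_math, theta_base,
                     agent_features, alpha=0.5, rank=4):
    """Inject a mathematical task vector into an agent model only along
    directions orthogonal to an (assumed) agent-critical feature subspace.

    agent_features : (d, n) matrix of sampled agent activations, a stand-in
                     for however M2A actually identifies the subspace.
    alpha          : the single merging coefficient controlling how much
                     mathematical reasoning is injected.
    """
    # Task vector: what math fine-tuning added relative to the base model.
    tau = theta_math - theta_base
    # Top-`rank` left singular vectors span the subspace we treat as
    # critical for agent behavior (an illustrative assumption).
    U, _, _ = np.linalg.svd(agent_features, full_matrices=False)
    U_r = U[:, :rank]
    # Projector onto the null space (orthogonal complement) of that subspace.
    P_null = np.eye(U_r.shape[0]) - U_r @ U_r.T
    # Merge the projected task vector, scaled by the coefficient.
    return theta_agent + alpha * (P_null @ tau)

# Toy usage: a 16-dim weight column and 8 sampled agent feature vectors.
rng = np.random.default_rng(0)
d = 16
theta_base = rng.standard_normal((d, 1))
theta_math = theta_base + 0.1 * rng.standard_normal((d, 1))
theta_agent = theta_base + 0.1 * rng.standard_normal((d, 1))
feats = rng.standard_normal((d, 8))
merged = null_space_merge(theta_agent, theta_math, theta_base, feats)
```

By construction, the update `merged - theta_agent` has zero component inside the agent-critical subspace, so (under this toy model) agent behavior along those directions is untouched, while `alpha` alone tunes how strongly mathematical reasoning is injected.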
The work represents a meaningful advance in efficient model adaptation techniques. Rather than relying on supervised fine-tuning or reinforcement learning, both computationally expensive approaches, M2A demonstrates that strategic parameter-space manipulation can achieve significant performance improvements. In real-world testing using a Qwen3-8B model on SWE-Bench Verified, a challenging software engineering benchmark, the method achieved a 7.2 percentage point improvement, a substantial gain in a domain where incremental progress typically requires intensive retraining.
For developers and organizations building AI systems, M2A's efficiency gains carry practical implications. The ability to enhance model capabilities without full retraining reduces computational costs and iteration time, democratizing access to capability improvements across different model scales. As LLMs increasingly power autonomous agents in complex domains like software development, healthcare, and financial analysis, techniques that reliably combine mathematical rigor with environmental interaction become strategically valuable. The open-source release enables broader community exploration and refinement of the approach.
- M2A synergizes mathematical and agentic reasoning through parameter-space model merging without gradient updates or retraining
- Applied to Qwen3-8B, the method improved SWE-Bench Verified performance by 7.2 percentage points on software engineering tasks
- The approach uses null-space injection to add mathematical reasoning capabilities while preserving existing agent behavior
- A single merging coefficient enables practitioners to control reasoning depth without model retraining
- The technique addresses a fundamental misalignment between mathematical and agentic reasoning patterns in current LLMs