Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning
Researchers introduce TaskPGM, a framework that optimizes how training data is distributed across multiple tasks when fine-tuning large language models by modeling task relationships through an energy-based probabilistic approach. The method balances task coverage against redundancy, demonstrating improvements over conventional uniform or size-proportional sampling strategies across multiple model families and evaluation benchmarks.
TaskPGM addresses a fundamental challenge in modern machine learning: efficiently allocating a finite training budget across a heterogeneous set of tasks. Common practice relies on simple heuristics, such as uniform or size-proportional sampling, that ignore how tasks interact, so they can duplicate effort on redundant tasks or miss complementary learning opportunities. The framework models tasks as nodes in a probabilistic graphical model, quantifying both individual task utility and inter-task relationships through behavioral divergences derived from single-task fine-tuned models. This approach enables the discovery of task mixtures that maximize learning efficiency without manual intervention.
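To make the idea of trading utility against redundancy concrete, here is a minimal sketch and not the paper's actual formulation. It assumes that inter-task similarity is derived from the Jensen-Shannon divergence between predictive distributions, and that the mixture minimizes a simple quadratic energy over the probability simplex. The function names, the energy form, the `redundancy` weight, and the exponentiated-gradient solver are all illustrative assumptions.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def similarity_matrix(preds):
    """Assumed similarity: S[i, j] = 1 - JSD / ln 2, so identical behavior gives 1."""
    n = len(preds)
    S = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = 1.0 - js_divergence(preds[i], preds[j]) / np.log(2)
    return S

def mixture_weights(utility, S, redundancy=1.0, steps=500, lr=0.1):
    """Minimize the assumed energy E(p) = -utility.p + (redundancy / 2) * p^T S p
    over the simplex with exponentiated-gradient updates."""
    utility = np.asarray(utility, dtype=float)
    p = np.full(len(utility), 1.0 / len(utility))
    for _ in range(steps):
        grad = -utility + redundancy * (S @ p)
        p = p * np.exp(-lr * grad)  # multiplicative update keeps p positive
        p /= p.sum()                # renormalize back onto the simplex
    return p

# Toy predictive distributions standing in for single-task fine-tuned models:
# tasks 0 and 1 behave identically, task 2 behaves differently.
preds = [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05], [0.05, 0.05, 0.9]]
weights = mixture_weights(utility=[1.0, 1.0, 1.0], S=similarity_matrix(preds))
print(weights)
```

In this toy setup the two redundant tasks end up splitting a share of the budget while the distinct task receives a larger individual weight, which is the qualitative behavior that uniform sampling cannot produce.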
The research builds on established principles in information theory and submodular optimization, applying these mathematical foundations to a practical problem facing LLM practitioners. As models grow larger and training becomes increasingly expensive, the ability to strategically sample from diverse datasets becomes economically significant. The weak submodularity property provides theoretical approximation guarantees, distinguishing this work from purely empirical approaches.
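Approximation guarantees for weakly submodular objectives are typically stated for greedy selection under a cardinality constraint. The sketch below illustrates that procedure on a hypothetical set function (summed task utility minus a pairwise similarity penalty). The objective, the `penalty` parameter, and the function names are assumptions chosen for illustration, not the set function defined in the paper.

```python
from itertools import combinations

def set_score(selected, utility, similarity, penalty=1.0):
    """Hypothetical objective: total utility minus pairwise redundancy."""
    gain = sum(utility[i] for i in selected)
    overlap = sum(similarity[i][j] for i, j in combinations(sorted(selected), 2))
    return gain - penalty * overlap

def greedy_select(utility, similarity, k, penalty=1.0):
    """Pick up to k tasks, each time adding the task with the largest marginal gain."""
    selected = []
    remaining = set(range(len(utility)))
    for _ in range(min(k, len(utility))):
        base = set_score(selected, utility, similarity, penalty)
        best = max(
            sorted(remaining),
            key=lambda t: set_score(selected + [t], utility, similarity, penalty) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Tasks 0 and 1 are nearly redundant; task 2 is less useful alone but complementary.
utility = [1.0, 0.9, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
chosen = greedy_select(utility, similarity, k=2)
print(chosen)
```

Here the second pick skips the higher-utility task 1 because it overlaps heavily with task 0, and takes the complementary task 2 instead.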
For AI development teams, TaskPGM offers practical value by reducing wasted computation during fine-tuning. The interpretable task interaction structures the framework produces also provide insight into which capabilities transfer between domains. Across tested models such as LLaMA-7B and Qwen2-7B, the approach consistently outperforms baseline sampling strategies. Organizations conducting supervised fine-tuning on multiple task distributions could benefit from adopting similar probabilistic selection mechanisms. The work signals a broader trend toward more principled, mathematically grounded approaches to hyperparameter and data mixture optimization in large-scale AI training.
- TaskPGM uses energy-based probabilistic modeling to optimize task mixture selection during LLM fine-tuning rather than relying on fixed heuristics.
- The framework models inter-task relationships using behavioral divergences from predictive distributions, capturing complementarity and redundancy.
- The resulting set function exhibits weak submodularity, enabling theoretical approximation guarantees for practical discrete selection variants.
- Testing across multiple model families shows consistent improvements over uniform and size-proportional sampling baselines.
- Task interaction structure from TaskPGM provides interpretability about which domains transfer knowledge effectively.