AI · Bullish · arXiv – CS AI · 18h ago · 6/10
Minibatch Selection via Partition Matroid Constrained Gradient Matching
Researchers introduce PartitionSel, a minibatch selection algorithm for training large language models on multi-domain datasets that balances convergence speed against domain coverage. The method combines a partition-matroid constraint with a gradient-matching utility to reduce redundancy across domains while keeping selection computationally efficient, and it reports improvements over existing approaches in experiments with Qwen2.5 and Llama-3 models.
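To make the idea concrete, here is a minimal sketch of greedy selection under a partition-matroid constraint with a gradient-matching objective. It is not the PartitionSel algorithm itself, whose details the summary does not give. The function name `greedy_partition_select`, the per-domain caps `caps`, and the choice of objective (make the selected gradients sum to `k` times the mean gradient of the candidate pool) are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def greedy_partition_select(grads, domains, caps, k):
    """Illustrative sketch (hypothetical, not the paper's method).

    Picks up to k rows of `grads` so that their mean tracks the mean
    gradient of the whole pool, taking at most caps[d] examples from
    each domain d. The per-domain caps form a partition matroid.
    """
    grads = np.asarray(grads, dtype=float)
    target = grads.mean(axis=0)
    # What the selected gradients should still sum to.
    residual = k * target
    sq_norms = np.einsum("ij,ij->i", grads, grads)
    used = Counter()
    available = np.ones(len(grads), dtype=bool)
    selected = []
    for _ in range(k):
        # An example is feasible if it is unused and its domain has room.
        has_room = np.array([used[d] < caps.get(d, 0) for d in domains])
        feasible = available & has_room
        if not feasible.any():
            break
        # Drop in squared residual norm if example i is added:
        # ||r||^2 - ||r - g_i||^2 = 2 * r.g_i - ||g_i||^2
        gain = 2.0 * (grads @ residual) - sq_norms
        gain[~feasible] = -np.inf
        i = int(np.argmax(gain))
        selected.append(i)
        available[i] = False
        used[domains[i]] += 1
        residual = residual - grads[i]
    return selected


# Toy usage: two domains, at most one example from each.
toy_grads = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 0.9]]
toy_domains = ["a", "a", "b", "b"]
batch = greedy_partition_select(toy_grads, toy_domains, {"a": 1, "b": 1}, k=2)
```

The caps are what force domain coverage in this sketch: once a domain is full, its remaining examples are skipped even if they match the target gradient best. The greedy step keeps filling the budget even when the best gain is negative, which a real implementation might handle differently.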