AI · Bullish · arXiv – CS AI · 3d ago · 7/10
Reinforcing Numerical Reasoning in LLMs for Tabular Prediction via Structural Priors
Researchers propose PRPO (Permutation Relative Policy Optimization), a reinforcement learning framework that strengthens the numerical reasoning of large language models for tabular data prediction. The method performs comparably to supervised baselines and excels in zero-shot scenarios, with an 8B-parameter model outperforming much larger models by up to 53.17%.