AIBullisharXiv โ CS AI ยท 5h ago
๐ง
Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning
Researchers developed a new AI training method using knowledge graphs as reward models to improve compositional reasoning in specialized domains. The approach enables smaller 14B parameter models to outperform much larger frontier systems like GPT-5.2 and Gemini 3 Pro on complex multi-hop reasoning tasks in medicine.
๐ง Gemini