AI · Bullish · arXiv – CS AI · 4h ago · 6/10
Cycle-Consistent Search: Question Reconstructability as a Proxy Reward for Search Agent Training
Researchers propose Cycle-Consistent Search (CCS), a framework for training search agents with reinforcement learning without requiring gold-standard labeled data. The method uses question reconstructability as the reward signal and applies information bottlenecks so that the reward reflects genuine search quality rather than surface-level linguistic patterns.
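
To make the idea of a reconstructability reward concrete, here is a minimal toy sketch, not the authors' implementation. It assumes a search trajectory is a list of steps with hypothetical `query` and `evidence` fields, that the bottleneck simply discards the agent's own queries so the question's wording cannot leak through, and that a token-overlap F1 stands in for whatever similarity measure the paper uses. The `reconstruct` callable is a placeholder for a model that guesses the question from the compressed trajectory.

```python
from collections import Counter
from typing import Callable, Dict, List


def token_f1(reference: str, candidate: str) -> float:
    """Token-overlap F1 between two strings (an assumed stand-in similarity)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    overlap = sum((Counter(ref) & Counter(cand)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def bottleneck(trajectory: List[Dict[str, str]]) -> List[str]:
    """Assumed bottleneck: keep retrieved evidence only, drop the agent's queries."""
    return [step["evidence"] for step in trajectory]


def cycle_reward(
    question: str,
    trajectory: List[Dict[str, str]],
    reconstruct: Callable[[List[str]], str],
) -> float:
    """Reward = how well the question can be rebuilt from the compressed trajectory."""
    compressed = bottleneck(trajectory)
    return token_f1(question, reconstruct(compressed))
```

In this sketch no gold answer appears anywhere: the only supervision is the original question, which the training loop already has. A trajectory whose evidence lets the reconstructor recover the question scores close to 1, while an empty or off-topic trajectory scores close to 0.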