AINeutralarXiv โ CS AI ยท 19h ago7/10
๐ง
Aligning Compound AI Systems via System-level DPO
Researchers introduce SysDPO, a framework that extends Direct Preference Optimization to align compound AI systems comprising multiple interacting components like LLMs, foundation models, and external tools. The approach addresses challenges in optimizing complex AI systems by modeling them as Directed Acyclic Graphs and enabling system-level alignment through two variants: SysDPO-Direct and SysDPO-Sampling.