AI · Bullish · arXiv – CS AI · 5h ago
Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification
Researchers developed GLEAN, an AI verification framework that improves the reliability of LLM-powered agents in high-stakes decisions such as clinical diagnosis. The system uses expert guidelines and Bayesian logistic regression to verify agent decisions, and in medical diagnosis tests it showed a 12% improvement in accuracy and 50% better calibration.