AI · Neutral · arXiv – CS AI · 3h ago · 6/10
CiteCheck: Retrieval-Grounded Detection of LLM Citation Hallucinations in Scientific Text
Researchers introduce CiteCheck, a hybrid framework that detects when large language models fabricate or corrupt scientific citations by combining scholarly database retrieval with structured LLM verification. On a new 982-citation physics benchmark, the system achieves 88.7% macro-F1, outperforming GPT, Claude, and Gemini. The work addresses a critical reliability problem as LLMs become integrated into scientific research workflows.
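For readers unfamiliar with the headline metric, macro-F1 is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a common one. The sketch below shows the calculation on a toy example. The label names (`valid`, `corrupted`, `fabricated`) and the six sample predictions are assumptions made up for illustration; they are not the benchmark's actual label set or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not taken from the CiteCheck benchmark.
y_true = ["valid", "valid", "valid", "valid", "corrupted", "fabricated"]
y_pred = ["valid", "valid", "valid", "valid", "valid", "fabricated"]

score = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

In this toy case five of six predictions are correct, yet the single missed `corrupted` citation drives that class's F1 to zero and pulls the macro average well below the accuracy. This is why macro-F1 is a common choice when the error classes of interest are rare.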