y0news
AnalyticsDigestsSourcesRSSAICrypto
#detection-methods1 article
1 articles
AINeutralarXiv โ€“ CS AI ยท 4d ago6/103
๐Ÿง 

FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning

Researchers introduce FaithCoT-Bench, the first comprehensive benchmark for detecting unfaithful Chain-of-Thought reasoning in large language models. The benchmark includes over 1,000 expert-annotated trajectories across four domains and evaluates eleven detection methods, revealing significant challenges in identifying unreliable AI reasoning processes.