arXiv · CS AI
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs
Researchers have developed ConStory-Bench, a new benchmark for evaluating consistency errors in long-form story generation by large language models (LLMs). The study finds that LLMs frequently contradict facts and character traits they established earlier in a lengthy narrative. These errors are most common in the factual and temporal dimensions and tend to appear around the middle of a story.