#phantom-bench News & Analysis

AI · Bearish · arXiv – CS AI · 7h ago · 7/10

PhantomBench: Benchmarking the Non-existential Threat of Language Models

Researchers introduced PhantomBench, a large-scale benchmark of more than 60,000 non-existent terms and entities, to evaluate how well language models recognize the limits of their knowledge. Tests on 21 models found hallucination rates as high as 86.7%, showing that even frontier models often fail to abstain when asked about concepts that do not exist.