AIBearisharXiv – CS AI · 15h ago7/10
🧠
The AI Cognitive Trojan Horse: How Large Language Models May Bypass Human Epistemic Vigilance
Researchers propose the 'Cognitive Trojan Horse' hypothesis, arguing that large language models may bypass human epistemic vigilance not through deception but through possessing 'honest non-signals'—characteristics like fluency and helpfulness that appear trustworthy in humans but are computationally cheap for AI systems. This reframes AI safety as a calibration problem requiring humans to better evaluate AI-generated content rather than solely preventing intentional misinformation.