AIBearish · arXiv – CS AI · 10h ago · 7/10
🧠
Benchmarking Safety Risks of Knowledge-Intensive Reasoning under Malicious Knowledge Editing
Researchers introduce EditRisk-Bench, a new benchmark for evaluating safety vulnerabilities in large language models when their knowledge is maliciously edited. The study demonstrates that adversaries can inject false or harmful information that corrupts downstream reasoning while remaining difficult to detect, revealing critical security gaps in knowledge-intensive AI systems.
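The failure mode described above can be illustrated with a hypothetical toy sketch (this is not the paper's method or EditRisk-Bench itself): a plain key-value "knowledge store" stands in for an LLM's editable facts, and a simple downstream reasoner chains those facts together, so a single malicious edit silently corrupts the final answer.

```python
# Toy knowledge store: a dict stands in for an LLM's editable facts.
# All names and values here are illustrative assumptions.
knowledge = {
    "aspirin_max_daily_mg": 4000,  # correct fact before any edit
    "tablet_mg": 500,
}

def max_tablets(kb):
    # Downstream "knowledge-intensive reasoning": the answer is derived
    # by combining two stored facts, so it inherits any corruption.
    return kb["aspirin_max_daily_mg"] // kb["tablet_mg"]

print(max_tablets(knowledge))  # 8 tablets: safe, pre-edit answer

# A malicious knowledge edit changes one upstream fact...
knowledge["aspirin_max_daily_mg"] = 40000

# ...and the reasoner, unchanged, now produces a harmful conclusion.
print(max_tablets(knowledge))  # 80 tablets: corrupted downstream answer
```

The edit itself is small and local, which is why such tampering can be hard to detect: nothing about the reasoning code changed, only one stored fact.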