AI · Bearish · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce EditRisk-Bench, a new benchmark for evaluating safety vulnerabilities in large language models when their knowledge is maliciously edited. The study demonstrates that adversaries can inject false or harmful information that corrupts downstream reasoning while remaining difficult to detect, revealing critical security gaps in knowledge-intensive AI systems.
AI · Neutral · arXiv – CS AI · May 7 · 7/10
🧠Researchers present an automated pipeline for auditing behavioral changes in large language models when interventions are applied. The method generates human-readable hypotheses about how the models differ and validates them statistically, identifying both intended effects and unexpected side effects of real-world interventions such as knowledge editing and unlearning.
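The summary says hypotheses are "validated statistically" but does not name the test. As a minimal sketch, assuming each hypothesis is checked by labeling sampled responses from the base and intervened models as exhibiting the hypothesized behavior (1) or not (0), a two-sided permutation test on the difference in rates could look like this. The function name `permutation_test` and its inputs are hypothetical illustrations, not the paper's pipeline.

```python
import random


def permutation_test(base_flags, edited_flags, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in behavior rates.

    base_flags / edited_flags: non-empty lists of 0/1 labels, where 1 means a
    sampled response exhibits the hypothesized behavior (hypothetical inputs).
    Returns (observed_difference, p_value), with the difference computed as
    edited rate minus base rate.
    """
    if not base_flags or not edited_flags:
        raise ValueError("both samples must be non-empty")
    rng = random.Random(seed)
    n_base = len(base_flags)
    n_edited = len(edited_flags)
    observed = sum(edited_flags) / n_edited - sum(base_flags) / n_base

    # Under the null hypothesis the labels are exchangeable between groups,
    # so shuffle the pooled labels and re-split them into two groups.
    pooled = list(base_flags) + list(edited_flags)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[n_base:]) / n_edited - sum(pooled[:n_base]) / n_base
        # Small tolerance so float rounding does not hide ties with the observed value.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up labels: 5/100 base responses vs 30/100 edited responses show the behavior.
    base = [1] * 5 + [0] * 95
    edited = [1] * 30 + [0] * 70
    print(permutation_test(base, edited, n_permutations=2000))
```

A permutation test is used here only because it makes no distributional assumptions about the labels; a two-proportion z-test would be an equally plausible choice.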
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers introduce SCAN, a new framework for editing Large Language Models that prevents catastrophic forgetting during sequential knowledge updates. Instead of making dense parameter changes, the method manipulates sparse circuits, and it maintains model performance even after 3,000 sequential edits on models including Gemma2, Qwen3, and Llama3.1.
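The contrast between sparse and dense parameter changes can be illustrated with a toy masked update: only the k largest-magnitude entries of a proposed update are applied, and every other weight stays fixed. This is a generic sparsification sketch using a hypothetical `sparse_update` helper on flat lists; SCAN's actual circuit-identification procedure is not described in the summary and is not reproduced here.

```python
def sparse_update(weights, delta, k):
    """Apply only the k largest-magnitude entries of `delta` to `weights`.

    weights, delta: flat lists of floats of equal length (a toy stand-in for a
    parameter tensor). Entries outside the top-k mask are left untouched, so at
    most k parameters change per edit.
    """
    if len(weights) != len(delta):
        raise ValueError("weights and delta must have the same length")
    if not 0 <= k <= len(delta):
        raise ValueError("k must be between 0 and len(delta)")

    # Indices of the k entries of delta with the largest absolute value.
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    mask = set(ranked[:k])

    # Dense editing would add every entry of delta; here only masked entries are added.
    return [w + d if i in mask else w for i, (w, d) in enumerate(zip(weights, delta))]


if __name__ == "__main__":
    weights = [0.0] * 6
    delta = [0.1, -0.9, 0.05, 0.7, -0.02, 0.3]  # made-up update
    print(sparse_update(weights, delta, k=2))
```

Restricting each edit to a small set of parameters is one intuition for why sequential edits interfere less with each other, although the summary does not say how SCAN chooses its circuits.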
AI · Neutral · arXiv – CS AI · Jun 10 · 6/10
🧠Researchers introduce a new benchmark for evaluating knowledge editing in Large Language Models that tests the logical consequences of edits, not just direct fact insertion. Current methods such as ROME and fine-tuning (FT) show performance gaps of up to 24% between edited facts and their logical implications, revealing a critical weakness in how LLMs maintain knowledge consistency.
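The gap reported above is a difference between two accuracies: one on the edited facts themselves and one on questions about their logical implications. Assuming each test item is simply scored correct or incorrect, it can be computed as follows. The `consistency_gap` helper and the sample scores are hypothetical, and the summary does not state the benchmark's exact scoring rule or whether the 24% figure is in absolute percentage points.

```python
def accuracy(results):
    """Percentage of True entries in a non-empty list of booleans."""
    if not results:
        raise ValueError("results must be non-empty")
    return 100 * sum(results) / len(results)


def consistency_gap(direct_correct, implied_correct):
    """Return (direct accuracy %, implication accuracy %, gap in percentage points).

    direct_correct: per-item correctness on the edited facts themselves.
    implied_correct: per-item correctness on questions whose answers follow
    logically from those edits.
    """
    direct = accuracy(direct_correct)
    implied = accuracy(implied_correct)
    return direct, implied, direct - implied


if __name__ == "__main__":
    # Made-up scores: 9/10 edited facts recalled, 6/10 implications answered correctly.
    print(consistency_gap([True] * 9 + [False], [True] * 6 + [False] * 4))
```

With these made-up scores the edit looks successful when only the edited fact is probed, while the implication questions expose the inconsistency.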
AI · Neutral · arXiv – CS AI · Jun 4 · 6/10
🧠Researchers introduce ZeroUnlearn, a novel machine unlearning framework that efficiently removes sensitive information from large language models through knowledge re-mapping and representational orthogonality, rather than expensive retraining. The method preserves overall model utility while selectively unlearning harmful data in few-shot settings, addressing critical privacy and safety concerns in LLMs.
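In general, "representational orthogonality" means making one representation carry no component along another. The simplest linear-algebra version is vector rejection: subtract from a hidden state its projection onto a direction that should be suppressed. The sketch below shows only that step with a hypothetical `remove_direction` helper; it is not ZeroUnlearn's training objective, which the summary does not spell out.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def remove_direction(h, u):
    """Return h minus its projection onto u, so the result is orthogonal to u.

    h: a hidden representation (list of floats).
    u: a nonzero direction to suppress, e.g. one associated with forgotten data
       (purely illustrative here).
    """
    if len(h) != len(u):
        raise ValueError("h and u must have the same length")
    uu = dot(u, u)
    if uu == 0:
        raise ValueError("u must be a nonzero vector")
    scale = dot(h, u) / uu
    return [x - scale * y for x, y in zip(h, u)]


if __name__ == "__main__":
    h = [3.0, 4.0, 0.0]
    u = [1.0, 1.0, 0.0]
    r = remove_direction(h, u)
    print(r, dot(r, u))
```

After rejection the result has zero inner product with `u`, while the components of `h` orthogonal to `u` are left unchanged.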
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers propose Joint Neighborhood Optimization (JNO), a new framework for knowledge editing in large language models that simultaneously manages desired information propagation and prevents unintended disruption to related facts. The method uses Pressure-Aware Coordination to jointly optimize coupled constraints and achieves a 7% improvement in both propagation and preservation metrics across different model architectures.
AI · Neutral · arXiv – CS AI · May 29 · 6/10
🧠Researchers propose LDKE, a new framework for editing knowledge in Multimodal Large Language Models that addresses two critical failure modes: causal misalignment (edits confined to specific samples) and feature entanglement (unintended alterations to related information). The method uses localized layer identification and input disentanglement to enable precise, generalized edits while preserving unrelated knowledge.
AI · Neutral · arXiv – CS AI · May 28 · 6/10
🧠Researchers present CODE, a novel approach to knowledge editing in large language models that replaces fact overwriting with causal reasoning. By embedding causal narratives into model parameters and training with on-policy distillation, CODE reduces self-refutation rates from 95.6% to 1.8%, enabling LLMs to evolve their knowledge coherently rather than storing isolated facts.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers propose MetaKE, a new framework for knowledge editing in Large Language Models that addresses the 'Semantic-Execution Disconnect' through bi-level optimization. The method treats edit targets as learnable parameters and uses a Structural Gradient Proxy to align edits with the model's feasible manifold, showing significant improvements over existing approaches.
AI · Neutral · arXiv – CS AI · Mar 17 · 5/10
🧠Researchers introduce SAKE, the first benchmark for evaluating how well auditory attribute knowledge can be edited in large audio-language models without full retraining. The study reveals significant limitations in current editing methods, particularly in auditory generalization and sequential editing, and finds that fine-tuning the modality connectors performs better than directly editing the LLM backbone.