AI · Neutral · arXiv – CS AI · 6h ago · 6/10
TrustLDM: Benchmarking Trustworthiness in Language Diffusion Models
Researchers introduce TrustLDM, a comprehensive benchmark for evaluating the trustworthiness of Language Diffusion Models across safety, privacy, and fairness dimensions. The study reveals that while LDMs perform well with standard prompts, their alignment degrades significantly when malicious post-contexts are attached to masked responses, exposing vulnerabilities across multiple model architectures.