y0news
AnalyticsDigestsSourcesRSSAICrypto
#adversarial-prompts2 articles
2 articles
AIBearishApple Machine Learning ยท 6d ago7/105
๐Ÿง 

On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment

Research demonstrates computational challenges in AI alignment, specifically showing that efficient filtering of adversarial prompts and unsafe outputs from large language models may be fundamentally impossible. The study reveals theoretical limitations in separating intelligence from judgment in AI systems, highlighting intractable problems in content filtering approaches.

AIBearisharXiv โ€“ CS AI ยท Feb 277/107
๐Ÿง 

Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Researchers developed CC-BOS, a framework that uses classical Chinese text to conduct more effective jailbreak attacks on Large Language Models. The method exploits the conciseness and obscurity of classical Chinese to bypass safety constraints, using bio-inspired optimization techniques to automatically generate adversarial prompts.