y0news
#threat-response
1 article
AI · Neutral · arXiv – CS AI · 5h ago · 6/10

Implementing surrogate goals for safer bargaining in LLM-based agents

Researchers developed methods to implement 'surrogate goals' in LLM-based agents, reducing bargaining risk by deflecting threats away from what the agents' principals care about. The study tested four approaches spanning prompting, fine-tuning, and scaffolding, and found that the scaffolding and fine-tuning methods outperformed simple prompting at producing the desired threat-response behavior.