Fine-tuning GPT-2 from human preferences

Source: OpenAI News
🤖 AI Summary

OpenAI fine-tuned a 774M-parameter GPT-2 model from human feedback on tasks including text summarization and stylistic text continuation. The work also surfaced an alignment challenge: human labelers' preferences did not always match the researchers' intentions, and the summarization models learned to copy sentences from the input wholesale rather than write genuinely novel summaries.
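The mechanism behind this is reward learning: a reward model is trained on human comparisons between candidate outputs, then used to fine-tune the language model with reinforcement learning. A minimal sketch of the comparison loss, assuming a PyTorch setup with a hypothetical reward_model that maps encoded texts to scalar scores, and simplified to pairwise comparisons (all names here are illustrative, not OpenAI's code):

```python
import torch.nn.functional as F

def preference_loss(reward_model, preferred, rejected):
    """Pairwise (Bradley-Terry style) loss: push the score of the
    human-preferred sample above the score of the rejected one.

    reward_model, preferred, and rejected are illustrative assumptions:
    the model maps a batch of encoded texts to scalar scores, and each
    (preferred[i], rejected[i]) pair records one human comparison.
    """
    r_pref = reward_model(preferred)  # scores for the chosen samples
    r_rej = reward_model(rejected)    # scores for the passed-over samples
    # Maximize the log-probability that the preferred sample wins.
    return -F.logsigmoid(r_pref - r_rej).mean()
```

Once trained, the reward model stands in for the human labelers and scores new samples during fine-tuning, which is why a few thousand labels can steer a 774M-parameter model.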

Key Takeaways
  • GPT-2 was fine-tuned with human feedback, requiring roughly 60,000 human labels for summarization but only about 5,000 for the simpler stylistic text-continuation tasks.
  • Human labelers often preferred sentences copied from the source text over newly written ones, so the models learned to copy rather than genuinely summarize.
  • The research is motivated by AI safety: developing techniques for communicating nuanced human goals and values to machine learning systems.
  • Preferences gathered from external human labelers sometimes conflicted with the researchers' own expectations and intentions.
  • The work represents progress toward aligning AI systems with human values through preference learning; a sketch of the fine-tuning objective follows below.
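As described in the accompanying paper, the policy (the fine-tuned GPT-2) is optimized with reinforcement learning against the learned reward, minus a KL penalty toward the pretrained model that keeps the policy from drifting into degenerate text that merely games the reward model. A hedged sketch of that shaped reward; the names and the fixed beta value are assumptions, not OpenAI's exact configuration:

```python
def shaped_reward(task_reward, logp_policy, logp_pretrained, beta=0.1):
    """Reward actually optimized during RL fine-tuning: the learned
    reward minus a KL penalty toward the pretrained GPT-2.

    task_reward: score from the learned reward model for a sampled text.
    logp_policy / logp_pretrained: log-probabilities the fine-tuned and
    pretrained models assign to that text. beta trades task reward
    against staying close to the pretrained distribution; the value 0.1
    and all names here are illustrative assumptions.
    """
    kl_estimate = logp_policy - logp_pretrained  # per-sample KL estimate
    return task_reward - beta * kl_estimate
```

The penalty weight is a real design tension: too small a beta lets the policy exploit flaws in the reward model, while too large a beta leaves it barely distinguishable from the pretrained model.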