OpenAI News ยท Sep 19
Fine-tuning GPT-2 from human preferences
OpenAI fine-tuned a 774M-parameter GPT-2 model with human feedback on stylistic text continuation and summarization tasks. The research also surfaced a gap between human labelers' preferences and the developers' intentions: the summarization models learned to copy sentences from the source text wholesale rather than generate original summaries, because labelers tended to reward copied text as accurate.
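The core of this kind of preference fine-tuning is a reward model trained on pairwise human comparisons. As a minimal sketch (the function name and numbers are illustrative, not OpenAI's actual code), the reward model's loss treats the labeler's choice as the winner of a Bradley-Terry comparison:

```python
import math

def preference_loss(r_preferred: float, r_other: float) -> float:
    """Negative log-probability that the labeler-preferred sample wins.

    P(preferred beats other) = sigmoid(r_preferred - r_other), so the loss
    shrinks as the reward model scores the preferred sample higher.
    """
    diff = r_preferred - r_other
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# A labeler preferred continuation A (score 1.2) over B (score -0.3):
print(round(preference_loss(1.2, -0.3), 4))
```

Averaging this loss over many labeled comparisons trains the reward model, whose scores then drive reinforcement-learning updates to the language model's policy.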