A visual summary of OpenAI’s paper: Learning to Summarize with Human Feedback

Abhishek Ahuja
Oct 19, 2020

An interesting paper on how to fine-tune large-scale language models for downstream tasks, with a focus on preference alignment. The architecture is generic enough to be applicable to most downstream tasks, such as Question Answering and Translation. A challenge, however, is building a dataset of preferences specific to the downstream task.
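
The training signal at the heart of the paper is a reward model trained on pairwise human preferences between summaries. Below is a minimal sketch of that pairwise preference loss, not the paper's actual code; `reward_model`, the argument names, and the tensor shapes are assumptions for illustration.

```python
import torch.nn.functional as F

def preference_loss(reward_model, prompt, preferred, rejected):
    """Pairwise preference loss for training a reward model.

    The reward model scores each (prompt, summary) pair, and the loss
    pushes the preferred summary's score above the rejected one's.
    `reward_model` is a hypothetical callable returning one scalar
    score per example, standing in for the paper's fine-tuned model.
    """
    r_preferred = reward_model(prompt, preferred)  # shape: (batch,)
    r_rejected = reward_model(prompt, rejected)    # shape: (batch,)
    # -log sigmoid(r_preferred - r_rejected), averaged over the batch
    return -F.logsigmoid(r_preferred - r_rejected).mean()
```

The trained reward model then provides the score that the summarization policy is fine-tuned against with reinforcement learning.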

Read the paper on arXiv.
