Model extensibility for few shot GPT-2
You have probably heard about the GPT (generative pre-trained transformer) family of models, or read about them in news articles over the last year; they made a big splash at launch. Let's dig in with a question: what exactly are GPT-2 and GPT-3? To answer that, we will start with a few basics about what few-shot learning entails and then work toward how it relates to GPT-2. Few-shot learning is a method for taking a well-built, pre-trained model and adapting it to a new task with only a handful of training examples, in effect giving the model a few "shots" at producing a favorable outcome.[1] More rigorous definitions have been freely shared online, and you can find them with a little digging, but that sets the stage for the second part of the topic at hand: model extensibility.
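To make the idea concrete, here is a minimal sketch of how a few-shot prompt might be assembled before being handed to a model like GPT-2. The sentiment-classification task, the example reviews, and the `Review:`/`Sentiment:` format are all illustrative assumptions on my part, not anything prescribed by GPT-2 itself; the model is never retrained here, it would simply condition on the labeled examples placed in the prompt.

```python
# Sketch of few-shot prompt construction. The task and formatting below are
# hypothetical examples; the key idea is that the labeled "shots" live in the
# prompt text itself, with no fine-tuning of the model's weights.
def build_few_shot_prompt(examples, query):
    """Concatenate a few labeled examples ("shots") followed by an unlabeled query."""
    lines = [f"Review: {text} Sentiment: {label}" for text, label in examples]
    lines.append(f"Review: {query} Sentiment:")
    return "\n".join(lines)

shots = [
    ("The food was wonderful.", "positive"),
    ("Service was slow and rude.", "negative"),
]
prompt = build_few_shot_prompt(shots, "I loved every minute of it.")
print(prompt)
```

The resulting string would then be passed to a generation call (for instance, via a library such as Hugging Face's transformers), and the model's completion after the final `Sentiment:` is read off as its answer.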