GPT-2

  • Family: GPT

  • Pretraining Architecture: Decoder

  • Pretraining Task: LM

  • Extension: Minor modifications to the GPT architecture (e.g., layer normalization moved to the input of each sub-layer, and context size increased from 512 to 1024 tokens; see the sketch after this list)

  • Application: Text generation, but adaptable to many other NLP tasks when fine-tuned.

  • Date (of first known publication): 02/2019

  • Num. Params: 124M, 355M, 774M, 1.5B

  • Corpus: WebText, ~8 million web pages (40 GB of text), roughly 10x the data used for the original GPT. WebText was built by scraping outbound links from Reddit posts with at least 3 karma.

  • License: Open, Modified MIT license

  • Lab: OpenAI
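
The extension listed above (pre-layer normalization) can be illustrated with a minimal PyTorch sketch of a GPT-2-style decoder block. This is not OpenAI's reference implementation; the hyperparameters assume the 124M configuration (768-dim embeddings, 12 heads, 1024-token context), and module names are illustrative.

```python
# Minimal sketch of a GPT-2-style pre-LayerNorm decoder block (assumed
# hyperparameters for the 124M model; not the official implementation).
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=12, context_size=1024):
        super().__init__()
        # Layer norm is applied at the *input* of each sub-layer (GPT-2),
        # rather than at the output as in the original GPT.
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Causal mask for autoregressive LM over the 1024-token context.
        mask = torch.triu(
            torch.ones(context_size, context_size, dtype=torch.bool), diagonal=1
        )
        self.register_buffer("causal_mask", mask)

    def forward(self, x):
        t = x.size(1)
        # Pre-norm residual branches.
        h = self.ln1(x)
        attn_out, _ = self.attn(
            h, h, h, attn_mask=self.causal_mask[:t, :t], need_weights=False
        )
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

block = DecoderBlock()
tokens = torch.randn(2, 16, 768)   # (batch, sequence, embedding)
print(block(tokens).shape)         # torch.Size([2, 16, 768])
```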
