GPT-NeoX-20B

  • Family: GPT

  • Pretraining Architecture: Decoder

  • Pretraining Task: LM

  • Extension: Similar to GPT-3, but uses rotary position embeddings instead of learned positional embeddings, computes the attention and feed-forward layers of each block in parallel, uses a different weight initialization, and uses only dense attention layers rather than GPT-3's alternating dense and sparse layers (see the sketch after this list)

  • Application: Same as GPT-3

  • Date (of first known publication): 04/2022

  • Num. Params: 20B

  • Corpus: The Pile, an 825 GiB open-source text dataset that combines 22 pre-existing datasets

  • License: Open, Apache-2.0

  • Lab: EleutherAI
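
The sketch below is a minimal illustration (not EleutherAI's implementation) of the two architectural changes named in the Extension item: rotary position embeddings applied to queries and keys, and attention and feed-forward branches computed in parallel from the same block input. All module names, dimensions, and hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn as nn


def rotary(x, base=10000):
    # x: (batch, heads, seq, head_dim). Rotate channel pairs by a
    # position-dependent angle instead of adding learned position vectors.
    b, h, t, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
    angles = torch.arange(t, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


class ParallelBlock(nn.Module):
    """One decoder layer where attention and MLP read the same input and
    their outputs are summed, i.e. x + attn(ln1(x)) + mlp(ln2(x)), rather
    than the sequential residuals used in GPT-3."""

    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, d_model // n_heads
        self.ln_attn = nn.LayerNorm(d_model)
        self.ln_mlp = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(self.ln_attn(x)).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        q, k = rotary(q), rotary(k)  # rotary embeddings, no learned positions
        att = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        att = att.masked_fill(mask, float("-inf")).softmax(dim=-1)
        attn_out = self.proj((att @ v).transpose(1, 2).reshape(b, t, d))
        # Parallel residual: both branches are added to the same input x.
        return x + attn_out + self.mlp(self.ln_mlp(x))


if __name__ == "__main__":
    x = torch.randn(2, 8, 64)
    print(ParallelBlock()(x).shape)  # torch.Size([2, 8, 64])
```

The parallel formulation lets the attention and feed-forward matrix multiplies be scheduled together, which improves throughput at large scale with little effect on quality.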
