GPT-NeoX-20B
Family: GPT
Pretraining Architecture: Decoder
Pretraining Task: LM
Extension: Similar to GPT-3, but with rotary positional embeddings instead of learned positional embeddings, parallel attention and feed-forward layers, a different initialization scheme, and all-dense layers instead of alternating dense/sparse layers (see the sketch after this entry)
Application: Same as GPT-3
Date (of first known publication): 04/2022
Num. Params: 20B
Corpus: The Pile, an 840 GB open-source text dataset that combines 22 pre-existing datasets
License: Open, Apache-2.0
Lab: EleutherAI
Last updated:
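
The sketch below illustrates the two architectural points noted in the Extension field: rotary position embeddings applied to queries and keys, and a transformer block whose attention and feed-forward sublayers are computed in parallel and summed into a single residual update (versus GPT-3's sequential layout). It is a minimal, self-contained illustration only; the class and parameter names are hypothetical and do not come from EleutherAI's gpt-neox repository.

```python
# Minimal sketch (not EleutherAI's code): a GPT-NeoX-style block with
# parallel attention + feed-forward sublayers and rotary position embeddings.
import torch
import torch.nn as nn


def rotary_embed(x, base=10000):
    """Apply rotary position embeddings to x of shape (batch, heads, seq, head_dim)."""
    b, h, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs[None, :]  # (t, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class ParallelBlock(nn.Module):
    """Attention and MLP each get their own LayerNorm and are summed into
    one residual update, so the two sublayers can run concurrently."""

    def __init__(self, dim, heads, mlp_ratio=4):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.ln_attn = nn.LayerNorm(dim)
        self.ln_mlp = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim), nn.GELU(), nn.Linear(mlp_ratio * dim, dim)
        )

    def attention(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, t, self.heads, self.head_dim)
        q, k, v = (z.view(shape).transpose(1, 2) for z in (q, k, v))
        q, k = rotary_embed(q), rotary_embed(k)  # rotary instead of learned positions
        mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)  # causal mask
        att = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim**0.5 + mask, dim=-1)
        out = (att @ v).transpose(1, 2).reshape(b, t, d)
        return self.proj(out)

    def forward(self, x):
        # GPT-3 (sequential):  x = x + attn(ln1(x)); x = x + mlp(ln2(x))
        # GPT-NeoX (parallel): a single residual sum of both sublayer outputs
        return x + self.attention(self.ln_attn(x)) + self.mlp(self.ln_mlp(x))


if __name__ == "__main__":
    block = ParallelBlock(dim=64, heads=4)
    print(block(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```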