BLOOM

  • Family: GPT

  • Pretraining Architecture: Decoder

  • Pretraining Task: LM

  • Extension: Its main difference from GPT-3 is that it uses full (dense) attention instead of sparse attention

  • Application: Same as GPT-3 (see the usage sketch after this list)

  • Date (of first known publication): 07/2022

  • Num. Params: 176B

  • Corpus: 366B tokens (~1.6 TB of text data) from the multilingual ROOTS dataset, covering 46 natural languages and 13 programming languages

  • Lab: BigScience/Hugging Face

  • License: Open, but subject to the use restrictions in Attachment A of the BigScience RAIL License v1.0
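
Since BLOOM's weights are openly available on the Hugging Face Hub, the following is a minimal sketch of loading a checkpoint and generating text with the transformers library. It assumes the small public bigscience/bloom-560m variant so the example runs on modest hardware; the full 176B bigscience/bloom checkpoint exposes the same API but requires far more memory.

```python
# Minimal sketch: load a BLOOM checkpoint and generate text with
# Hugging Face transformers. Uses the small bloom-560m variant
# (assumption, chosen so the example runs on modest hardware).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Causal LM decoding, the same GPT-3-style usage noted under Application.
inputs = tokenizer("BLOOM is a multilingual language model that",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```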
