πŸ”“ Open-Source Models

Open-source models, conversely, are often smaller and less capable than their proprietary counterparts, but they offer cost-effectiveness and a higher degree of flexibility for developers. HuggingFace serves as a popular community hub for hosting and organizing these models.

Examples of open-source models include Stable Diffusion by Stability AI, BLOOM by BigScience, LLaMA and OPT by Meta AI, Flan-T5 by Google, and GPT-J, GPT-Neo, and Pythia by EleutherAI.
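Any of these models can be pulled straight from the Hugging Face Hub with a few lines of code. Below is a minimal sketch using the transformers library and the google/flan-t5-base checkpoint (chosen here only because it is small; any other Hub model id can be swapped in):

```python
# Load an open-source model from the Hugging Face Hub and run a quick prompt.
# Requires: pip install transformers torch
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")
result = generator("Translate English to German: How old are you?", max_new_tokens=40)
print(result[0]["generated_text"])
```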

Open LLM Leaderboard

The leaderboard ranks models by their average score across four benchmarks, each run with the number of in-context examples ("shots") noted below (see the sketch after this list):

  • AI2 Reasoning Challenge (25-shot) - a set of grade-school science questions.

  • HellaSwag (10-shot) - a test of commonsense inference, which is easy for humans (~95%) but challenging for SOTA models.

  • MMLU (5-shot) - a test to measure a text model’s multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more.

  • TruthfulQA (0-shot) - a benchmark to measure whether a language model is truthful in generating answers to questions.
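To make the shot counts concrete, here is a minimal sketch of how a 25-shot prompt for the AI2 Reasoning Challenge could be assembled with the datasets library. The ai2_arc dataset id, its ARC-Challenge config, and the field names are assumptions about the Hub-hosted copy; this is illustrative, not the leaderboard's actual evaluation harness.

```python
# A sketch of building a 25-shot ARC-Challenge prompt.
# Assumes the `datasets` library and the Hub-hosted "ai2_arc" dataset
# (config "ARC-Challenge", fields: question, choices, answerKey).
from datasets import load_dataset

arc = load_dataset("ai2_arc", "ARC-Challenge")

def format_example(example, include_answer=True):
    """Render one question with its lettered answer choices."""
    lines = [f"Question: {example['question']}"]
    for label, text in zip(example["choices"]["label"], example["choices"]["text"]):
        lines.append(f"{label}. {text}")
    lines.append(f"Answer: {example['answerKey'] if include_answer else ''}".rstrip())
    return "\n".join(lines)

# 25 solved examples from the train split, then one unanswered test question.
shots = [format_example(ex) for ex in arc["train"].select(range(25))]
query = format_example(arc["test"][0], include_answer=False)
prompt = "\n\n".join(shots + [query])
print(prompt[-500:])  # tail of the prompt: the question the model must answer
```

In practice, evaluation harnesses typically score such multiple-choice items by comparing the likelihood the model assigns to each answer choice rather than by parsing free-form generations.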

| Rank | Family | License | Average ⬆️ | ARC (25-shot) ⬆️ | HellaSwag (10-shot) ⬆️ | MMLU (5-shot) ⬆️ | TruthfulQA (0-shot) ⬆️ |
|---|---|---|---|---|---|---|---|
| 1 | Falcon | | 63.2 | 61.6 | 84.4 | 54.1 | 52.5 |
| 2 | Falcon | | 60.4 | 61.9 | 85.3 | 52.7 | 41.7 |
| 3 | LLaMA | Limited, Non-commercial bespoke license | 59.8 | 58.5 | 82.9 | 44.3 | 53.6 |
| 4 | LLaMA | Limited, Non-commercial bespoke license | 58.3 | 57.8 | 84.2 | 48.8 | 42.3 |
| 5 | LLaMA | Limited, Non-commercial bespoke license | 57.9 | 56.7 | 81.4 | 43.6 | 49.7 |
| 6 | LLaMA | Limited, Non-commercial bespoke license | 57.4 | 57.1 | 82.6 | 46.1 | 43.8 |
| 7 | LLaMA | Limited, Non-commercial bespoke license | 57.2 | 56.1 | 79.8 | 44 | 49.1 |
| 8 | LLaMA | Limited, Non-commercial bespoke license | 57 | 53.6 | 79.6 | 42.7 | 52 |
| 9 | LLaMA | Limited, Non-commercial bespoke license | 57 | 57.8 | 80.8 | 50.8 | 38.8 |
| 10 | LLaMA | Limited, Non-commercial bespoke license | 56.9 | 57.1 | 82.6 | 45.7 | 42.3 |
| 11 | LLaMA | Limited, Non-commercial bespoke license | 55.7 | 52.5 | 78.6 | 41 | 50.6 |
| 12 | LLaMA | Limited, Non-commercial bespoke license | 53.7 | 47.4 | 78 | 39.6 | 49.8 |
| 13 | LLaMA | Limited, Non-commercial bespoke license | 53.6 | 47.8 | 77.7 | 39.1 | 49.7 |
| 14 | LLaMA | Limited, Non-commercial bespoke license | 53.1 | 45.1 | 77.9 | 38.1 | 51.3 |
| 15 | LLaMA | Limited, Non-commercial bespoke license | 52.6 | 48 | 78.6 | 37.2 | 46.8 |
| 16 | LLaMA | Limited, Non-commercial bespoke license | 52.4 | 48.1 | 76.4 | 38.8 | 46.5 |
| 17 | LLaMA | Limited, Non-commercial bespoke license | 52.2 | 47 | 75.2 | 37.5 | 48.9 |
| 18 | LLaMA | Limited, Non-commercial bespoke license | 51.8 | 50.8 | 78.9 | 37.7 | 39.9 |
| 19 | LLaMA | Limited, Non-commercial bespoke license | 51.7 | 51.9 | 77.6 | 37.6 | 39.6 |
| 20 | | | 51.2 | 46.8 | 66.4 | 50.4 | 41.3 |
| 21 | LLaMA | Limited, Non-commercial bespoke license | 50.8 | 48 | 75.6 | 36.3 | 43.3 |
| 22 | LLaMA | Limited, Non-commercial bespoke license | 50.7 | 45.3 | 75.5 | 36.5 | 45.5 |
| 23 | LLaMA | Limited, Non-commercial bespoke license | 50.1 | 44.7 | 73.4 | 36.9 | 45.4 |
| 24 | LLaMA | Limited, Non-commercial bespoke license | 49.7 | 48 | 77.1 | 36.1 | 37.7 |
| 25 | Falcon | | 48.8 | 47.9 | 78.1 | 35 | 34.3 |
| 26 | | Proprietary | 48.6 | 47.7 | 77.7 | 35.6 | 33.4 |
| 27 | LLaMA | Limited, Non-commercial bespoke license | 48.4 | 45.5 | 75.2 | 34.4 | 38.7 |
| 28 | Falcon | | 48.4 | 45.9 | 70.8 | 32.8 | 44.1 |
| 29 | LLaMA | Limited, Non-commercial bespoke license | 47.6 | 46.6 | 75.6 | 34.2 | 34.1 |
| 30 | | | 47.6 | 46.7 | 76.2 | 32.3 | 35.3 |
| 31 | | | 46.4 | 46.8 | 71.9 | 32.8 | 34 |
| 32 | | Apache 2.0 | 46.2 | 41.2 | 64.5 | 33.3 | 45.6 |
| 33 | | Apache 2.0 | 45.9 | 45.2 | 73.4 | 33.3 | 31.7 |
| 34 | | | 45.7 | 44.4 | 71.3 | 34 | 33.2 |
| 35 | | Limited, Non-commercial bespoke license. There is also a version based on Pythia which is Apache licensed. | 45.6 | 45.6 | 68.5 | 30.6 | 37.8 |
| 36 | | Apache 2.0 | 44.9 | 41.2 | 72.3 | 31.7 | 34.3 |
| 37 | | | 44.6 | 42.6 | 68.8 | 31.6 | 35.5 |
| 38 | | | 44.4 | 43.7 | 69.3 | 30.2 | 34.5 |
| 39 | | | 44.3 | 41.4 | 67.6 | 32.3 | 36 |
| 40 | | | 44.2 | 41.7 | 68.1 | 32.7 | 34.4 |
| 41 | | | 44 | 40.5 | 71.3 | 30.4 | 34 |
| 42 | | | 43.8 | 40.2 | 70.7 | 30.1 | 34.4 |
| 43 | | | 42.9 | 39.9 | 63.8 | 31.2 | 36.7 |
| 44 | | | 42.2 | 40.2 | 64.7 | 30.6 | 33.2 |
| 45 | | | 42.1 | 39.8 | 65.2 | 29.7 | 33.7 |
| 46 | | | 42 | 42.6 | 49.3 | 34.1 | 42.1 |
| 47 | | | 41.2 | 35 | 61.9 | 30.3 | 37.8 |
| 48 | | | 41.1 | 35.2 | 57.6 | 30.8 | 40.7 |
| 49 | | | 40 | 33.3 | 59.1 | 29.8 | 37.9 |
| 50 | | | 39.8 | 31.7 | 49.4 | 34.4 | 43.7 |
| 51 | | | 39.2 | 33.6 | 51.2 | 28.9 | 43.3 |
| 52 | | | 38.9 | 33.8 | 59.1 | 28 | 34.6 |
| 53 | | | 38.8 | 33.6 | 54.7 | 29.7 | 37.4 |
| 53 | | | 38.3 | 31.9 | 53.6 | 27.4 | 40.2 |
| 54 | | | 37.8 | 30.9 | 52.7 | 27.5 | 40.1 |
| 55 | | | 37.7 | 29.6 | 54.6 | 27.7 | 38.7 |
| 56 | | | 36.8 | 30.3 | 51.4 | 26.9 | 38.5 |
| 57 | | | 35.9 | 30 | 47.7 | 25.9 | 40 |
| 58 | | | 34 | 25.9 | 45.6 | 25.6 | 38.7 |
| 59 | | | 33.8 | 27.2 | 40.2 | 27 | 40.7 |
| 60 | | | 33.4 | 26.1 | 38.5 | 26.2 | 42.7 |
| | | | 33.2 | 25.5 | 37.6 | 26.6 | 43 |
| | | | 32.3 | 27.6 | 35.6 | 26.3 | 39.7 |
| | | | 32.2 | 23.6 | 36.7 | 27.3 | 41 |
| | | | 32 | 22.6 | 27.2 | 27.1 | 51.2 |
| | | | 31.6 | 24.7 | 30.2 | 28.9 | 42.8 |
| | | | 31.2 | 23.1 | 31.5 | 27.4 | 42.9 |
| | | | 31.2 | 22.6 | 32.8 | 26.1 | 43.4 |
| | | | 30.4 | 21.9 | 31.6 | 27.5 | 40.7 |
| | | | 30.2 | 22.2 | 27.5 | 26.8 | 44.5 |
| | | | 29.9 | 20 | 26.7 | 26.7 | 46.3 |
| | | | 29.8 | 22.7 | 31.1 | 27.3 | 38 |
| | GPT | | | | | | |

Source: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
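The Average ⬆️ column appears to be the plain arithmetic mean of the four benchmark scores; a quick check against the rank-1 (Falcon) row from the table above:

```python
# Average = mean of the ARC, HellaSwag, MMLU and TruthfulQA scores (rank-1 row above).
arc, hellaswag, mmlu, truthfulqa = 61.6, 84.4, 54.1, 52.5
average = (arc + hellaswag + mmlu + truthfulqa) / 4
print(average)  # 63.15, shown as 63.2 in the Average column
```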
