🔓 Open-Source Models

Open-source models, conversely, are often smaller and less capable than their proprietary counterparts, but they are cheaper to run and give developers more flexibility. Hugging Face is a popular community hub for hosting and organizing these models.

Examples of open-source models include Stable Diffusion by Stability AI, BLOOM by BigScience, LLaMA and OPT by Meta AI, Flan-T5 by Google, and GPT-J, GPT-Neo, and Pythia by EleutherAI.

Open LLM Leaderboard

The leaderboard scores each model on four benchmarks and ranks models by the average of the four scores:

AI2 Reasoning Challenge (25-shot) - a set of grade-school science questions.
HellaSwag (10-shot) - a test of commonsense inference, which is easy for humans (~95%) but challenging for SOTA models.
MMLU (5-shot) - a test to measure a text model’s multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more.
TruthfulQA (0-shot) - a benchmark to measure whether a language model is truthful in generating answers to questions.
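
The "k-shot" labels above indicate how many solved examples are placed in the prompt ahead of the question being scored (0-shot means none). Below is a minimal sketch of how such a prompt might be assembled. The `build_few_shot_prompt` helper, the prompt template, and the sample questions are illustrative assumptions, not the leaderboard's actual evaluation harness:

```python
def build_few_shot_prompt(examples, question, k):
    """Prepend the first k solved (question, answer) pairs to the target question.

    Hypothetical helper: the real evaluation harness uses its own templates.
    """
    if k > len(examples):
        raise ValueError(f"need at least {k} examples, got {len(examples)}")
    shots = [f"Question: {q}\nAnswer: {a}" for q, a in examples[:k]]
    # The final block leaves the answer blank for the model to complete.
    shots.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(shots)


# Made-up grade-school science items, for illustration only.
solved = [
    ("Which gas do plants absorb from the air?", "Carbon dioxide"),
    ("What force pulls objects toward Earth?", "Gravity"),
    ("What is the boiling point of water at sea level in Celsius?", "100"),
]

prompt = build_few_shot_prompt(solved, "Which planet is closest to the Sun?", k=2)
print(prompt)
```

With `k=0` the same helper returns only the unanswered question, which corresponds to the 0-shot setting used for TruthfulQA.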

Rank	Model	Family	License	Average	ARC (25-shot)	HellaSwag (10-shot)	MMLU (5-shot)	TruthfulQA (0-shot)
1	🥇 tiiuae/falcon-40b-instruct	Falcon	TII Falcon LLM License	63.2	61.6	84.4	54.1	52.5
2	🥈 tiiuae/falcon-40b	Falcon	TII Falcon LLM License	60.4	61.9	85.3	52.7	41.7
3	🥉 ausboss/llama-30b-supercot	LLaMA	Limited, Non-commercial bespoke license	59.8	58.5	82.9	44.3	53.6
4	llama-65b	LLaMA	Limited, Non-commercial bespoke license	58.3	57.8	84.2	48.8	42.3
5	MetaIX/GPT4-X-Alpasta-30b	LLaMA	Limited, Non-commercial bespoke license	57.9	56.7	81.4	43.6	49.7
6	digitous/Alpacino30b	LLaMA	Limited, Non-commercial bespoke license	57.4	57.1	82.6	46.1	43.8
7	Aeala/GPT4-x-AlpacaDente2-30b	LLaMA	Limited, Non-commercial bespoke license	57.2	56.1	79.8	44	49.1
8	TheBloke/Wizard-Vicuna-13B-Uncensored-HF	LLaMA	Limited, Non-commercial bespoke license	57	53.6	79.6	42.7	52
9	TheBloke/dromedary-65b-lora-HF	LLaMA	Limited, Non-commercial bespoke license	57	57.8	80.8	50.8	38.8
10	llama-30b	LLaMA	Limited, Non-commercial bespoke license	56.9	57.1	82.6	45.7	42.3
11	openaccess-ai-collective/wizard-mega-13b	LLaMA	Limited, Non-commercial bespoke license	55.7	52.5	78.6	41	50.6
12	TheBloke/vicuna-13B-1.1-HF	LLaMA	Limited, Non-commercial bespoke license	53.7	47.4	78	39.6	49.8
13	chavinlo/gpt4-x-alpaca	LLaMA	Limited, Non-commercial bespoke license	53.6	47.8	77.7	39.1	49.7
14	eachadea/vicuna-13b	LLaMA	Limited, Non-commercial bespoke license	53.1	45.1	77.9	38.1	51.3
15	medalpaca/medalpaca-13b	LLaMA	Limited, Non-commercial bespoke license	52.6	48	78.6	37.2	46.8
16	stable-vicuna-13b	LLaMA	Limited, Non-commercial bespoke license	52.4	48.1	76.4	38.8	46.5
17	eachadea/vicuna-7b-1.1	LLaMA	Limited, Non-commercial bespoke license	52.2	47	75.2	37.5	48.9
18	llama-13b	LLaMA	Limited, Non-commercial bespoke license	51.8	50.8	78.9	37.7	39.9
19	alpaca-13b	LLaMA	Limited, Non-commercial bespoke license	51.7	51.9	77.6	37.6	39.6
20	facebook/galactica-120b			51.2	46.8	66.4	50.4	41.3
21	jondurbin/airoboros-7b	LLaMA	Limited, Non-commercial bespoke license	50.8	48	75.6	36.3	43.3
22	AlekseyKorshuk/vicuna-7b	LLaMA	Limited, Non-commercial bespoke license	50.7	45.3	75.5	36.5	45.5
23	TheBloke/wizardLM-7B-HF	LLaMA	Limited, Non-commercial bespoke license	50.1	44.7	73.4	36.9	45.4
24	wordcab/llama-natural-instructions-13b	LLaMA	Limited, Non-commercial bespoke license	49.7	48	77.1	36.1	37.7
25	tiiuae/falcon-7b	Falcon	TII Falcon LLM License	48.8	47.9	78.1	35	34.3
26	mosaicml/mpt-7b	MosaicML	Apache 2.0	48.6	47.7	77.7	35.6	33.4
27	chainyo/alpaca-lora-7b	LLaMA	Limited, Non-commercial bespoke license	48.4	45.5	75.2	34.4	38.7
28	tiiuae/falcon-7b-instruct	Falcon	TII Falcon LLM License	48.4	45.9	70.8	32.8	44.1
29	llama-7b	LLaMA	Limited, Non-commercial bespoke license	47.6	46.6	75.6	34.2	34.1
30	facebook/opt-66b			47.6	46.7	76.2	32.3	35.3
31	Salesforce/codegen-16B-nl			46.4	46.8	71.9	32.8	34
32	nomic-ai/gpt4all-j		Apache 2.0	46.2	41.2	64.5	33.3	45.6
33	EleutherAI/gpt-neox-20b		Apache 2.0	45.9	45.2	73.4	33.3	31.7
34	togethercomputer/RedPajama-INCITE-Base-7B-v0.1			45.7	44.4	71.3	34	33.2
35	OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5		Limited, Non-commercial bespoke license. There is also a version based on Pythia which is Apache licensed.	45.6	45.6	68.5	30.6	37.8
36	databricks/dolly-v2-12b		Apache 2.0	44.9	41.2	72.3	31.7	34.3
37	Pirr/pythia-13b-deduped-green_devil			44.6	42.6	68.8	31.6	35.5
38	databricks/dolly-v2-7b			44.4	43.7	69.3	30.2	34.5
39	EleutherAI/gpt-j-6b			44.3	41.4	67.6	32.3	36
40	digitous/Javalion-R			44.2	41.7	68.1	32.7	34.4
41	facebook/opt-13b			44	40.5	71.3	30.4	34
42	KoboldAI/OPT-13B-Nerybus-Mix			43.8	40.2	70.7	30.1	34.4
43	amazon/LightGPT			42.9	39.9	63.8	31.2	36.7
44	togethercomputer/RedPajama-INCITE-Base-3B-v1			42.2	40.2	64.7	30.6	33.2
45	databricks/dolly-v2-3b			42.1	39.8	65.2	29.7	33.7
46	GeorgiaTechResearchInstitute/galactica-6.7b-evol-instruct-70k			42	42.6	49.3	34.1	42.1
47	openlm-research/open_llama_7b_700bt_preview			41.2	35	61.9	30.3	37.8
48	Writer/camel-5b-hf			41.1	35.2	57.6	30.8	40.7
49	openlm-research/open_llama_7b_400bt_preview			40	33.3	59.1	29.8	37.9
50	HuggingFaceH4/starchat-alpha			39.8	31.7	49.4	34.4	43.7
51	Salesforce/codegen-16B-multi			39.2	33.6	51.2	28.9	43.3
52	pythainlp/wangchanglm-7.5B-sft-enth			38.9	33.8	59.1	28	34.6
53	openlm-research/open_llama_3b_350bt_preview			38.8	33.6	54.7	29.7	37.4
54	stabilityai/stablelm-tuned-alpha-7b			38.3	31.9	53.6	27.4	40.2
55	hakurei/lotus-12B			37.8	30.9	52.7	27.5	40.1
56	facebook/opt-1.3b			37.7	29.6	54.6	27.7	38.7
57	gpt2-xl			36.8	30.3	51.4	26.9	38.5
58	aisquared/dlite-v2-774m			35.9	30	47.7	25.9	40
59	gpt2-large			34	25.9	45.6	25.6	38.7
60	gpt2-medium			33.8	27.2	40.2	27	40.7
61	cerebras/Cerebras-GPT-1.3B			33.4	26.1	38.5	26.2	42.7
62	xhyi/PT_GPTNEO350_ATG			33.2	25.5	37.6	26.6	43
63	beomi/KoAlpaca-Polyglot-5.8B			32.3	27.6	35.6	26.3	39.7
64	facebook/opt-350m			32.2	23.6	36.7	27.3	41
65	microsoft/CodeGPT-small-py			32	22.6	27.2	27.1	51.2
66	MBZUAI/lamini-neo-125m			31.6	24.7	30.2	28.9	42.8
67	facebook/opt-125m			31.2	23.1	31.5	27.4	42.9
68	ai-forever/rugpt3large_based_on_gpt2			31.2	22.6	32.8	26.1	43.4
69	gpt2			30.4	21.9	31.6	27.5	40.7
70	distilgpt2			30.2	22.2	27.5	26.8	44.5
71	cerebras/Cerebras-GPT-111M			29.9	20	26.7	26.7	46.3
72	vicgalle/gpt2-alpaca-gpt4			29.8	22.7	31.1	27.3	38
73	bigscience/bloom	GPT

Source: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
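
The Average column in the table is consistent with the plain arithmetic mean of the four benchmark scores, up to rounding. Below is a small sketch that recomputes it for the top three rows. The dictionary layout and the `mean_score` helper are assumptions made for illustration; the numbers are copied from the table:

```python
# Scores copied from the first three rows of the table:
# (ARC, HellaSwag, MMLU, TruthfulQA) followed by the published Average.
rows = {
    "tiiuae/falcon-40b-instruct": ((61.6, 84.4, 54.1, 52.5), 63.2),
    "tiiuae/falcon-40b": ((61.9, 85.3, 52.7, 41.7), 60.4),
    "ausboss/llama-30b-supercot": ((58.5, 82.9, 44.3, 53.6), 59.8),
}


def mean_score(scores):
    """Unweighted mean of the benchmark scores (hypothetical helper)."""
    return sum(scores) / len(scores)


# Allow a small tolerance: the per-benchmark scores are already rounded.
TOLERANCE = 0.1

for model, (scores, published) in rows.items():
    recomputed = mean_score(scores)
    matches = abs(recomputed - published) < TOLERANCE
    print(f"{model}: recomputed {recomputed:.2f}, published {published}, match={matches}")

# Sorting by the recomputed mean should reproduce the order shown in the table.
ranking = sorted(rows, key=lambda m: mean_score(rows[m][0]), reverse=True)
print(ranking)
```

Because the mean is unweighted, a model can rank highly despite one weak benchmark: falcon-40b places second with a TruthfulQA score of only 41.7.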

