🧱 Our platform

To make large models accessible to all organizations, we've built SIA Inference. SIA Inference offers two service tiers: Enterprise and Starter.

To learn more about SIA Inference, see our product page.

Enterprise Tier

With the Enterprise Tier, you can turn any saved model checkpoint into a secure, inexpensive API within a SIA-managed cluster, or within your own virtual private cloud (VPC), in under a minute.

To learn more about deploying your own model in your own secure environment, read our blog post and check out our documentation.

Starter Tier

For less demanding applications, the Starter Tier features a suite of open-source models with commercial licensing terms. These models are hosted by SIA and available through an API, and cover text embedding and text generation use cases.

Text Embedding Models

Embedding models are used to obtain a vector representation of an input string. Embedding vectors can be used to compute the similarity of two input strings, retrieve documents relevant to a specific query, and more.
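For instance, cosine similarity is one common way to score how close two embedding vectors are. The sketch below uses plain Python and short hypothetical vectors standing in for real 1024-dimensional endpoint responses; it is not part of the SIA API.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors standing in for two embedding-endpoint responses.
v1 = [0.12, -0.33, 0.48]
v2 = [0.10, -0.29, 0.52]
print(cosine_similarity(v1, v2))  # Values near 1.0 indicate similar inputs.
```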

| Model | Description | Endpoint |
| --- | --- | --- |
| RoBERTa Large | Maps sentences and paragraphs to a 1024-dimensional dense vector space; can be used for tasks like clustering or semantic search. | /roberta-large/v1 |
| Instructor Large | A 335M-parameter, instruction-finetuned model capable of generating embeddings for various tasks. The Instructor model series is state-of-the-art on the Massive Text Embedding Benchmark (MTEB). See the model card for guidelines on how to best prompt the model. | /instructor-large/v1 |
| Instructor XL | A 1.2B-parameter, instruction-finetuned model capable of generating text embeddings for various tasks. The Instructor model series is state-of-the-art on the Massive Text Embedding Benchmark (MTEB). See the model card for guidelines on how to best prompt the model. | /instructor-xl/v1 |

Text Generation Models

Text generation models produce free-form text from a provided input prompt string. They can be used for generic text completion, question answering, information extraction, and much more.

| Model | Description | Instruction finetuned | Endpoint |
| --- | --- | --- | --- |
| Falcon-40B-Instruct | A 40B-parameter causal decoder-only model built by TII, based on Falcon-40B and finetuned on a mixture of Baize. Made available under the TII Falcon LLM License. | Yes | /falcon-40b-instruct/v1 |
| Falcon-40B | A 40B-parameter causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. Made available under the TII Falcon LLM License. | No | /falcon-40b/v1 |
| Falcon-7B-Instruct | A 7B-parameter causal decoder-only model built by TII, based on Falcon-7B and finetuned on a mixture of chat/instruct datasets. Made available under the Apache 2.0 license. | Yes | /falcon-7b-instruct/v1 |
| Falcon-7B | A 7B-parameter causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. Made available under the Apache 2.0 license. | No | /falcon-7b/v1 |
| GPT-NeoX-20B | A 20B-parameter language model capable of generating free-form text completions, trained and released by EleutherAI. See the paper for more information. | No | /gpt-neox-20b/v1 |
| Dolly 12B | A 12B-parameter instruction-finetuned language model released by Databricks, based on the Pythia-12B model trained by EleutherAI and further instruction-finetuned on a dataset created by Databricks. See the Databricks code for an example of how to best format your prompt for this model. | Yes | /dolly-12b/v1 |
| GPT-2 XL | The 1.5B-parameter version of GPT-2, an open-source language model capable of generating free-form text completions, trained and released by OpenAI. | No | /gpt2-xl/v1 |
| MPT-7B-Instruct | A state-of-the-art 6.7B-parameter instruction-finetuned language model trained by MosaicML, pretrained on 1T tokens from a mixture of datasets and further instruction-finetuned on a dataset derived from the Databricks Dolly-15k and the Anthropic Helpful and Harmless (HH-RLHF) datasets. | Yes | /mpt-7b-instruct/v1 |

API Reference

Users can interact with SIA's hosted models through HTTP requests to our REST API, enabling robust and extensible support for any programming language.

Authentication

Accessing the SIA REST API requires a SIA platform API key for authentication. Please see our Quick Start for instructions on how to set up SIA platform access.
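Because the API is plain HTTP, any language with an HTTP client will work. The sketch below uses the Python requests library; note that the Authorization header shown is an assumption for illustration, so consult the Quick Start for the exact authentication scheme.

```python
import requests

# Hypothetical sketch: the actual header name and key format are defined
# in the Quick Start; "Authorization: Bearer <key>" is an assumption here.
API_KEY = "YOUR_SIA_PLATFORM_API_KEY"

response = requests.post(
    "https://models.hosted-on.sia.hosting/gpt2-xl/v1",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input_strings": ["Hello, world"]},
)
print(response.json())
```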

Embedding Requests

To calculate embeddings for a string, send your text string to the hosted endpoint of the embedding model you wish to query.

POST https://models.hosted-on.sia.hosting/<endpoint>

Request example:

```python
from siacli.sdk import predict

# Each entry pairs an instruction with the sentence to embed.
inputs = {
    "input_strings": [
        [
            "Represent the Science title:",
            "3D ActionSLAM: wearable person tracking in multi-floor environments"
        ]
    ]
}

predict('https://models.hosted-on.sia.hosting/instructor-large/v1', inputs)
```

Response example:

```json
{
    "data": [
        [
            -0.06155527010560036, 0.010419987142086029, 0.005884397309273481, ..., -0.03766140714287758, 0.010227023623883724, 0.04394740238785744
        ]
    ]
}
```

Request body

| Parameters | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| input_strings | List[[str, str]] | yes | N/A | List of pairs of strings in the format [["Instruction", "Sentence"]]. |
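Since input_strings is a list of pairs, several instruction/sentence pairs can be embedded in one request; the response's data field then contains one vector per pair. A short sketch (the instructions and sentences shown are illustrative):

```python
from siacli.sdk import predict

# Each pair follows the [["Instruction", "Sentence"]] format; the response
# contains one embedding vector per input pair, in the same order.
inputs = {
    "input_strings": [
        ["Represent the Science title:", "3D ActionSLAM: wearable person tracking"],
        ["Represent the Medicine sentence:", "Aspirin reduces the risk of heart attack"],
    ]
}
predict('https://models.hosted-on.sia.hosting/instructor-large/v1', inputs)
```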

Text Generation Requests

To generate text, send an input string to the hosted endpoint of the text generation model you wish to query.

POST https://models.hosted-on.sia.hosting/<endpoint>

Request example:

```python
from siacli.sdk import predict

prompt = "Write 3 reasons why you should train an AI model on domain specific data set."

# A near-zero temperature makes the completion close to deterministic.
predict('https://models.hosted-on.sia.hosting/mpt-7b-instruct/v1',
        {'input_strings': [prompt], 'temperature': 0.01})
```

Response example:

```json
{
    "data": [
        "1. The model will be more accurate.\n2. The model will be more efficient.\n3. The model will be more interpretable."
    ]
}
```

Request body

| Parameters | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| input_strings | List[str] | yes | N/A | The prompt(s) to generate a completion for. |
| top_p | float | no | 0.95 | Restricts sampling to the most probable tokens: tokens are added, from most to least probable, until the sum of their probabilities exceeds top_p. |
| temperature | float | no | 0.8 | The temperature of the sampling operation: 1.0 means regular sampling, 0 means always taking the highest-scoring token, and values approaching 100.0 bring the distribution closer to uniform. |
| max_length | int | no | 256 | The maximum length, in tokens, of the generated output. |
| use_cache | bool | no | true | Whether to use KV caching during autoregressive decoding; this uses more memory but improves speed. |
| do_sample | bool | no | true | Whether to use sampling; greedy decoding is used otherwise. |
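The optional parameters above can be combined in a single request. For example, the sketch below asks for a short, near-greedy completion; the parameter values and prompt are illustrative, not recommended settings.

```python
from siacli.sdk import predict

# Illustrative values: a low temperature and tight top_p keep the output
# close to greedy decoding, while max_length caps it at 64 tokens.
payload = {
    "input_strings": ["Summarize why domain-specific training data helps."],
    "temperature": 0.1,
    "top_p": 0.9,
    "max_length": 64,
    "do_sample": True,
}
predict('https://models.hosted-on.sia.hosting/mpt-7b-instruct/v1', payload)
```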
