Our platform
To make large models accessible to all organizations, we've built SIA Inference. SIA Inference offers two service tiers: Enterprise and Starter.
To learn more about SIA Inference, see our product page.
Enterprise Tier
With the Enterprise Tier, you can turn any saved model checkpoint into a secure, inexpensive API within a SIA managed cluster, or within your own virtual private cloud (VPC), in under a minute.
To learn more about deploying your own model in your own secure environment, read our blog post and check out our documentation.
Starter Tier
For less demanding applications, the Starter Tier features a suite of open source models with commercial licensing terms. These models are hosted by SIA and available through an API, covering text embedding and text generation use cases.
Text Embedding Models
Embedding models are used to obtain a vector representation of an input string. Embedding vectors can be used to compute the similarity of two input strings, retrieve documents relevant to a specific query, and more.
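For example, once two strings have been embedded, their similarity is typically scored with cosine similarity between the returned vectors. A minimal sketch in plain Python (the toy vectors stand in for real embeddings returned by the API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 2-dimensional vectors standing in for 1024-dimensional embeddings:
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
```

A score near 1.0 indicates semantically similar strings; scores near 0 indicate unrelated ones, which is the basis for retrieval use cases.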
Model | Description | Endpoint |
---|---|---|
| Maps sentences and paragraphs to a 1024-dimensional dense vector space; can be used for tasks like clustering or semantic search. | |
| A 335M-parameter, instruction-finetuned model capable of generating embeddings for various tasks. | |
| A 1.2B-parameter, instruction-finetuned model capable of generating text embeddings for various tasks. | |
Text Generation Models
Text generation models generate text based on a provided input prompt string. They can be used for generic text completion, question answering, information extraction, and much more.
Model | Description | Instruction finetuned | Endpoint |
---|---|---|---|
| A 40B-parameter causal decoder-only model built by TII, based on Falcon-40B and finetuned on a mixture of Baize. It is made available under the TII Falcon LLM License. | Yes | |
| A 40B-parameter causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. It is made available under the TII Falcon LLM License. | No | |
| | Yes | |
| A 7B-parameter causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license. | No | |
| A 20B-parameter language model capable of generating free-form text completions, trained and released by EleutherAI. See the paper for more information. | No | |
| A 12B-parameter instruction-finetuned language model released by Databricks. The model is based on the Pythia-12B model trained by EleutherAI, and is further instruction finetuned on a dataset created by Databricks. See the Databricks code for an example of how to best format your prompt for this model. | Yes | |
| The 1.5B-parameter version of GPT-2, an open-source language model capable of generating free-form text completions, trained and released by OpenAI. | No | |
| A state-of-the-art 6.7B-parameter instruction-finetuned language model trained by MosaicML. The model is pretrained for 1T tokens on a mixture of datasets, and then further instruction finetuned on a dataset derived from the Databricks Dolly-15k and the Anthropic Helpful and Harmless (HH-RLHF) datasets. | Yes | |
API Reference
Users can interact with SIA's hosted models through HTTP requests to our REST API, enabling robust and extensible support for any programming language.
Authentication
Accessing the SIA REST API requires a SIA platform API key for authentication. Please see our Quick Start for instructions on how to set up SIA platform access.
Embedding Requests
To calculate embeddings, send your text string to the hosted endpoint of the embedding model you wish to query.
POST https://models.hosted-on.sia.hosting/<endpoint>
Request body
Parameters | Type | Required | Default | Description |
---|---|---|---|---|
input_strings | List[[str, str]] | yes | N/A | List of pairs of strings in the format [["Instruction", "Sentence"]] |
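The original request and response examples did not survive extraction, so the sketch below shows what an embedding request might look like based on the documented request body. The endpoint name, the `Authorization` header format, and the response schema are assumptions; consult the live API reference for the exact details.

```python
import json

API_KEY = "YOUR_SIA_API_KEY"  # obtained via SIA platform access (see Quick Start)

def build_embedding_request(endpoint, instruction, sentence):
    """Assemble the POST request for a hosted embedding model.

    `input_strings` follows the documented format: a list of
    ["Instruction", "Sentence"] pairs.
    """
    url = f"https://models.hosted-on.sia.hosting/{endpoint}"
    headers = {
        "Authorization": f"Bearer {API_KEY}",  # header name is an assumption
        "Content-Type": "application/json",
    }
    body = json.dumps({"input_strings": [[instruction, sentence]]})
    return url, headers, body

url, headers, body = build_embedding_request(
    "example-embedding-model",  # hypothetical endpoint name
    "Represent this sentence for retrieval:",
    "SIA Inference hosts open source models.",
)
# The assembled request can then be sent with any HTTP client,
# e.g. urllib.request or the third-party requests library.
```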
Text Generation Requests
To generate text, send an input string to the hosted endpoint of the text generation model you wish to query.
POST https://models.hosted-on.sia.hosting/<endpoint>
Request body
Parameters | Type | Required | Default | Description |
---|---|---|---|---|
input_string | List[str] | yes | N/A | The prompt to generate a completion for. |
top_p | float | no | 0.95 | Controls nucleus sampling: tokens are considered from most to least probable and added to the sampling pool until their cumulative probability exceeds top_p; generation then samples only from that pool. |
temperature | float | no | 0.8 | The temperature of the sampling operation. 1.0 means regular sampling, 0 means always taking the highest-probability token, and large values (toward 100.0) approach a uniform distribution. |
max_length | int | no | 256 | The maximum length, in tokens, of the generated output. |
use_cache | bool | no | true | Whether to use KV caching during autoregressive decoding. This uses more memory but improves speed. |
do_sample | bool | no | true | Whether to use sampling; if false, greedy decoding is used. |
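The original request and response examples are missing from this page, so the sketch below assembles a generation request from the documented parameters and defaults. The endpoint name is hypothetical, and the response schema is not shown because it is not documented here; consult the live API reference for the exact details.

```python
import json

def build_generation_request(endpoint, prompt, **overrides):
    """Assemble the POST request for a hosted text generation model,
    starting from the documented default parameter values."""
    params = {
        "input_string": [prompt],  # documented type is List[str]
        "top_p": 0.95,
        "temperature": 0.8,
        "max_length": 256,
        "use_cache": True,
        "do_sample": True,
    }
    params.update(overrides)  # caller-supplied values win over defaults
    url = f"https://models.hosted-on.sia.hosting/{endpoint}"
    return url, json.dumps(params)

url, body = build_generation_request(
    "example-generation-model",        # hypothetical endpoint name
    "Write a haiku about inference.",
    temperature=0.2,                   # lower temperature -> more deterministic
)
# Send `body` as the JSON payload of a POST to `url` with any HTTP client.
```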
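To make the `top_p` parameter concrete, here is an illustrative sketch of nucleus (top-p) sampling over a toy token distribution. This is a generic illustration of the technique, not SIA's implementation:

```python
import random

def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative probability
    exceeds top_p, then renormalize the kept tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total > top_p:
            break
    return {token: p / total for token, p in kept}

dist = {"cat": 0.5, "dog": 0.3, "axolotl": 0.15, "teapot": 0.05}
filtered = top_p_filter(dist, top_p=0.9)
# "teapot" falls outside the 0.9 nucleus; sampling draws from the rest:
choice = random.choices(list(filtered), weights=list(filtered.values()))[0]
```

Raising `top_p` widens the pool of candidate tokens; lowering it restricts generation to only the most probable tokens.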