πŸ—οΈAdaptation to downstream tasks

Once you've selected your foundation model, you can interact with the LLM through its API. If you're used to other APIs, this interaction may seem unusual at first, because the relationship between input and output isn't always evident: given a text prompt, the API generates a text completion that attempts to continue the pattern you provided.

Consider this example of using the OpenAI API: you input a prompt like "Correct this to standard English:\n\nShe no went to the market."

import os

import openai

# Read the API key from the environment rather than hard-coding it
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(
    engine="text-davinci-003",
    prompt="Correct this to standard English:\n\nShe no went to the market.",
    temperature=0,  # deterministic output suits a correction task
    max_tokens=60,
)

The API will then produce a response that includes the completion: response['choices'][0]['text'] = "She did not go to the market."

The primary challenge lies in directing the LLM to produce the desired output. Despite their power, LLMs have limitations. As noted in the LLM in production survey, model accuracy and hallucinations were notable concerns. This means obtaining the desired format from the LLM API might require several iterations, and LLMs can produce unexpected results if they lack specific knowledge. To address these issues, there are several ways to adapt foundation models to downstream tasks:

Prompt Engineering is a method to adjust the input so the output aligns with your expectations. Various techniques can enhance your prompt (refer to the OpenAI Cookbook for ideas). One such method is to include examples of the expected input-output format in the prompt: prompting without examples is known as zero-shot learning, while including a handful of examples is known as few-shot learning (see the sketch below). Tools like LangChain or HoneyHive are emerging to assist with managing and versioning your prompt templates.
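
For instance, a few-shot prompt for sentiment classification might look like the following sketch, which reuses the Completion endpoint from the example above (with the API key already set as shown there); the reviews and labels are made up for illustration.

import openai

# A few-shot prompt: labeled examples teach the model the expected format.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It stopped working after a week.
Sentiment: Negative

Review: Setup took five minutes and everything just worked.
Sentiment:"""

response = openai.Completion.create(
    engine="text-davinci-003",
    prompt=few_shot_prompt,
    temperature=0,  # deterministic output for classification
    max_tokens=5,
)
print(response["choices"][0]["text"].strip())  # likely "Positive"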

Fine-tuning refers to taking a pre-trained language model and retraining it for a different but related task on specific data. This approach is also known as transfer learning: knowledge learned on one task is transferred to another. LLMs like GPT-J 6B are trained on massive amounts of unlabeled data and can be fine-tuned on domain-specific datasets, making the model perform better on that domain.

Although fine-tuning requires additional training effort, it can reduce inference costs: the cost of LLM APIs depends on input and output sequence length, so cutting the number of input tokens lowers API costs because you no longer need to provide examples in the prompt (see the sketch below).
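
As a concrete illustration, here is a minimal fine-tuning sketch using the Hugging Face transformers Trainer. The small GPT-Neo checkpoint stands in for a larger model like GPT-J 6B, and domain_corpus.txt is a hypothetical file of in-domain text; both are assumptions for illustration.

# Minimal causal-LM fine-tuning sketch with Hugging Face transformers.
# "EleutherAI/gpt-neo-125M" is a small stand-in for GPT-J 6B, and
# domain_corpus.txt is a hypothetical file with one document per line.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "EleutherAI/gpt-neo-125M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-style models lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1),
    train_dataset=tokenized["train"],
    # mlm=False gives standard causal (next-token) language modeling
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()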

External data: Foundation models often lack contextual information (such as specific documents or emails) and can become outdated quickly. To prevent LLMs from hallucinating due to insufficient information, they need access to relevant external data. Tools like LlamaIndex (GPT Index), LangChain, or DUST are available to act as central interfaces to connect LLMs to other agents and external data.
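
Under the hood, these tools implement a retrieval pattern along the following lines: embed your documents, retrieve the most relevant one for a query, and inject it into the prompt as context. Below is a hand-rolled sketch of that pattern using the OpenAI embeddings endpoint; the documents and the question are made up for illustration.

import numpy as np
import openai

# Hypothetical external documents the base model knows nothing about
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm CET.",
]

def embed(text):
    result = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(result["data"][0]["embedding"])

doc_vectors = [embed(doc) for doc in documents]

question = "How long do I have to return a product?"
q_vector = embed(question)

# ada-002 embeddings are unit-length, so a dot product is the cosine similarity
best_doc = documents[int(np.argmax([v @ q_vector for v in doc_vectors]))]

response = openai.Completion.create(
    engine="text-davinci-003",
    prompt=f"Answer using only this context:\n\n{best_doc}\n\n"
           f"Question: {question}\nAnswer:",
    max_tokens=100,
)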

Embeddings: Another method involves extracting information in the form of embeddings from LLM APIs (e.g., movie summaries or product descriptions) and building applications atop them (e.g., search, comparison, or recommendations). If np.array doesn't suffice for storing your embeddings for long-term memory, vector databases such as Pinecone, Weaviate, or Milvus can be used.
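
For example, a simple semantic search over movie summaries can be built with embeddings kept in a plain numpy array; the titles and summaries below are invented, and a vector database only becomes necessary once the array no longer fits in memory or needs persistence.

import numpy as np
import openai

# Invented movie summaries standing in for your own corpus
summaries = {
    "Space Odyssey": "A crew travels to Jupiter with a sentient computer on board.",
    "Desert Planet": "A noble family is betrayed on a world that produces a precious spice.",
    "Ocean Heist": "A team of thieves plans an elaborate casino robbery.",
}

def embed(text):
    result = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(result["data"][0]["embedding"])

titles = list(summaries)
matrix = np.stack([embed(summaries[t]) for t in titles])  # long-term memory as np.array

query_vector = embed("science fiction about interstellar travel")
scores = matrix @ query_vector  # cosine similarity for unit-length embeddings
print(titles[int(np.argmax(scores))])  # best match, likely "Space Odyssey"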

Alternatives: The field is rapidly evolving, yielding new ways to leverage LLMs in AI products. Examples include instruction tuning/prompt tuning and model distillation.
