How to build a custom ChatGPT assistant for your documentation

How to build a custom ChatGPT assistant for your documentation

Using vector embeddings for context

Featured on Hashnode

ChatGPT can be a helpful tool for development and learning. But only if it knows about the technology you're using.

I wanted ChatGPT to help me develop Hilla applications, but since it was released after 2021, instead of giving helpful answers, ChatGPT made up answers that had nothing to do with reality.

I also had the challenge that Hilla supports both React and Lit on the front end, and I want to ensure that the answers use the correct framework as context.

Here's how I built an assistant using the most up-to-date documentation as context for ChatGPT to give me relevant answers.

Key concept: embeddings

ChatGPT, like other LLMs, has a limited context size that needs to fit your question, relevant background information, and the answer. For example, gpt-3.5-turbo has a limit of 4096 tokens or roughly 3000 words. In order to get relevant answers to your questions, you need to include the most valuable parts of documentation in the prompt.

An efficient way of finding the most relevant parts of the documentation to include is embeddings. You can think of embeddings as a way of encoding the meaning of a piece of text into a vector that describes a point in a multi-dimensional space. Texts with similar meanings are close to each other, while texts that differ in meaning are farther apart.

A document on the left gets mapped through a magic black box that outputs a vector representing the text's meaning.

The concept is very similar to a color picker. Each color can be represented by a 3-element vector of red, green, and blue values. Similar colors have similar values, whereas different colors have different values.

A color picker representing a color as an RGB value

For this article, it's enough to know that there's an OpenAI API that turns texts into embeddings. This article on embeddings is a good resource if you want to learn more about how embeddings work under the hood.

Once you have created embeddings for your documentation, you can quickly find the most relevant parts to include in the prompt by finding the parts closest in meaning to the question.

The big picture: providing documentation as context to ChatGPT

Here are the high-level steps needed to make ChatGPT use your documentation as context for answering questions:

Generating embeddings for your documentation

  1. Split your documentation into smaller parts, for instance, by heading, and create an embedding (vector) for each chunk.

  2. Save the embedding, source text, and other metadata in a vector database.

A 2D visualization showing different topics on a plane.

Answering questions with documentation as context

  1. Create an embedding for the user question.

  2. Search the vector database for the N parts of the documentation most related to the question using the embedding.

  3. Construct a prompt instructing ChatGPT to answer the given question only using the provided documentation.

  4. Use the OpenAI API to generate a completion for the prompt.

A visualization of embeddings on a 2D plane. A cluster of related themes are highlighted as similar to the given question.

In the following sections, I will go into more detail about how I implemented these steps.

Tools used

Source code

I will only highlight the most relevant sections of the code below. You can find the complete source code on GitHub.

Processing documentation

The Hilla documentation is in Asciidoc format. The steps needed to process them into embeddings are as follows:

  • Process the Asciidoc files with Asciidoctor to include code snippets and other includes

  • Split the resulting document into sections based on the HTML document structure

  • Convert the content to plain text to save on tokens

  • If needed, split sections into smaller pieces

  • Create embedding vectors for each text section

  • Save the embedding vectors along with the source text into Pinecone

Asciidoc processing

async function processAdoc(file, path) {
  console.log(`Processing ${path}`);

  const frontMatterRegex = /^---[\s\S]+?---\n*/;

  const namespace = path.includes('articles/react') ? 'react' : path.includes('articles/lit') ? 'lit' : '';
  if (!namespace) return;

  // Remove front matter. The JS version of asciidoctor doesn't support removing it.
  const noFrontMatter = file.replace(frontMatterRegex, '');

  // Run through asciidoctor to get includes
  const html = asciidoctor.convert(noFrontMatter, {
    attributes: {
      root: process.env.DOCS_ROOT,
      articles: process.env.DOCS_ARTICLES,
      react: namespace === 'react',
      lit: namespace === 'lit'
    safe: 'unsafe',
    base_dir: process.env.DOCS_ARTICLES

  // Extract sections
  const dom = new JSDOM(html);
  const sections = dom.window.document.querySelectorAll('.sect1');

  // Convert section html to plain text to save on tokens
  const plainText = Array.from(sections).map(section => convert(section.innerHTML));

  // Split section content further if needed, filter out short blocks
  const docs = await splitter.createDocuments(plainText);
  const blocks = => doc.pageContent)
    .filter(block => block.length > 200);

  await createAndSaveEmbeddings(blocks, path, namespace);

Create embeddings and save them

async function createAndSaveEmbeddings(blocks, path, namespace) {

  // OpenAI suggests removing newlines for better performance when creating embeddings. 
  // Don't remove them from the source.
  const withoutNewlines = => block.replace(/\n/g, ' '));
  const embeddings = await getEmbeddings(withoutNewlines);
  const vectors =, i) => ({
    id: nanoid(),
    values: embedding,
    metadata: {
      path: path,
      text: blocks[i]

  await pinecone.upsert({
    upsertRequest: {

Get embeddings from OpenAI

export async function getEmbeddings(texts) {
  const response = await openai.createEmbedding({
    model: 'text-embedding-ada-002',
    input: texts
  return => item.embedding);

Searching with context

So far, we have split the documentation into small sections and saved them in a vector database. When a user asks a question, we need to:

  • Create an embedding based on the asked question

  • Search the vector database for the 10 parts of the documentation that are most relevant to the question

  • Create a prompt that includes as many documentation sections as possible into 1536 tokens, leaving 2560 tokens for the answer.

async function getMessagesWithContext(messages: ChatCompletionRequestMessage[], frontend: string) {

  // Ensure that there are only messages from the user and assistant, trim input
  const historyMessages = sanitizeMessages(messages);

  // Send all messages to OpenAI for moderation.
  // Throws exception if flagged -> should be handled properly in a real app.
  await moderate(historyMessages);

  // Extract the last user message to get the question
  const [userMessage] = historyMessages.filter(({role}) => role === ChatCompletionRequestMessageRoleEnum.User).slice(-1)

  // Create an embedding for the user's question
  const embedding = await createEmbedding(userMessage.content);

  // Find the most similar documents to the user's question
  const docSections = await findSimilarDocuments(embedding, 10, frontend);

  // Get at most 1536 tokens of documentation as context
  const contextString = await getContextString(docSections, 1536);

  // The messages that set up the context for the question
  const initMessages: ChatCompletionRequestMessage[] = [
      role: ChatCompletionRequestMessageRoleEnum.System,
      content: codeBlock`
            You are Hilla AI. You love to help developers! 
            Answer the user's question given the following
            information from the Hilla documentation.
      role: ChatCompletionRequestMessageRoleEnum.User,
      content: codeBlock`
          Here is the Hilla documentation:
      role: ChatCompletionRequestMessageRoleEnum.User,
      content: codeBlock`
            Answer all future questions using only the above        
            documentation and your knowledge of the 
            ${frontend === 'react' ? 'React' : 'Lit'} library
            You must also follow the below rules when answering:
            - Do not make up answers that are not provided 
              in the documentation 
            - If you are unsure and the answer is not explicitly 
              written in the documentation context, say 
              "Sorry, I don't know how to help with that"
            - Prefer splitting your response into 
              multiple paragraphs
            - Output as markdown
            - Always include code snippets if available

  // Cap the messages to fit the max token count, removing earlier messages if necessary
  return capMessages(

When a user asks a question, we call getMessagesWithContext() to get the messages we need to send to ChatGPT. We then call the OpenAI API to get the completion and stream the response to the client.

export default async function handler(req: NextRequest) {
  // All the non-system messages up until now along with 
  // the framework we should use for the context.
  const {messages, frontend} = (await req.json()) as {
    messages: ChatCompletionRequestMessage[],
    frontend: string
  const completionMessages = await getMessagesWithContext(messages, frontend);
  const stream = await streamChatCompletion(completionMessages, MAX_RESPONSE_TOKENS);
  return new Response(stream);

Source code


This post and demo app wouldn't have been possible without the excellent insights and resources below: