Building a Customer Support Question Answering Chatbot

Here, we show how to use website material as additional context to help a chatbot efficiently reply to user queries. The code implementation uses data loaders, and stores the associated embeddings in the Deep Lake dataset, and then retrieves the documents that are most relevant to the user’s query.

Pranath Fernando


August 11, 2023

1 Introduction

Large language models like GPT-4 and ChatGPT have emerged as key advancements in the IT world as we observe faster technological growth. These cutting-edge models exhibit outstanding skill in content creation. They do, however, face some difficulties, including as biases and hallucinations. Despite these drawbacks, LLMs have the power to completely change the way chatbot development is done.

Traditional chatbots, which are mostly intent-based, are made to react to particular user intentions. These intentions include a number of model questions and related answers. A “Restaurant Recommendations” purpose, for instance, would have sample inquiries like “Can you recommend a good Italian restaurant nearby?” or “Where can I get the best sushi in town?” with answers like “You should try the Italian restaurant”La Trattoria” nearby” or “Sushi Palace” is the best sushi restaurant in town.”

When consumers engage with the chatbot, their inquiries are compared to those with the most like intent, producing the corresponding response. However, as LLMs continue to advance, chatbot development is trending towards more advanced and dynamic options that can handle a wider variety of customer enquiries with more accuracy and nuance.

2 Having a Knowledge Base

LLMs can greatly improve chatbot functionality by linking larger intents with Knowledge Base (KB) documents rather than individual questions and replies. This method simplifies intent management and produces more personalised responses to user enquiries.

The maximum prompt size in GPT3 is roughly 4,000 tokens, which is large but insufficient for combining a full knowledge base into a single prompt.

Future LLMs may not have this restriction while still having text creation capabilities. However, for the time being, we must develop a solution around it.

3 Workflow

The goal of this project is to create a chatbot that uses GPT3 to search for answers within documents. The experiment’s workflow is depicted in the diagram below.

To begin, we extract some content from internet publications, divide it into little parts, compute its embeddings, and store it in Deep Lake. Then, using a user inquiry, we retrieve the most relevant chunks from Deep Lake and place them in a prompt, which will be used by the LLM to construct the final answer.

It is vital to highlight that when utilising LLMs, there is always the possibility of generating hallucinations or incorrect information. Although this may not be acceptable for many customer service use cases, the chatbot can nonetheless aid operators in crafting answers that they can double-check before delivering to the user.

n the next steps, we’ll explore how to manage conversations with GPT-3 and provide examples to demonstrate the effectiveness of this workflow:

4 Import Libs & Setup

First, set up the OPENAI_API_KEY and ACTIVELOOP_TOKEN environment variables with your API keys and tokens.

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI
from langchain.document_loaders import SeleniumURLLoader
from langchain import PromptTemplate
import os
import openai
import sys
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

These libraries provide OpenAI embeddings, vector storage management, text splitting, and communicating with the OpenAI API. They also allow for the development of a context-aware question-answering system that incorporates retrieval and text generation.

Our chatbot’s database will be made up of articles about technical challenges.

# we'll use information from the following articles
urls = ['',

5 Split the documents into chunks and compute their embeddings

We load the pages from the specified URLs and split them into 1000 parts with no overlap using the CharacterTextSplitter:

# use the selenium scraper to load the documents
loader = SeleniumURLLoader(urls=urls)
docs_not_splitted = loader.load()

# we split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_not_splitted)
Created a chunk of size 1226, which is longer than the specified 1000

The embeddings are then computed using OpenAIEmbeddings and stored in a Deep Lake vector store in the cloud. In an ideal production scenario, we could upload an entire website or course lesson to a Deep Lake dataset, enabling search over thousands or millions of documents. Because we are using a cloud serverless Deep Lake dataset, applications running in multiple locations may easily access the same centralised dataset without the need for a vector store to be deployed on a specific machine.

Now, change the following code to include your Activeloop organisation ID. It’s worth remembering that the org id is your default username.

# Before executing the following code, make sure to have
# your OpenAI key saved in the “OPENAI_API_KEY” environment variable.
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# create Deep Lake dataset
# TODO: use your organization id here. (by default, org id is your username)
my_activeloop_org_id = "pranath"
my_activeloop_dataset_name = "langchain_course_customer_support"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

# add documents to our Deep Lake dataset
Your Deep Lake dataset has been successfully created!
Dataset(path='hub://pranath/langchain_course_customer_support', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype       shape      dtype  compression
  -------    -------     -------    -------  ------- 
 embedding  embedding  (146, 1536)  float32   None   
    id        text      (146, 1)      str     None   
 metadata     json      (146, 1)      str     None   
   text       text      (146, 1)      str     None   

To retrieve the most similar chunks to a given query, we can use the similarity_search method of the Deep Lake vector store:

# let's see the top relevant documents to a specific query
query = "how to check disk usage in linux?"
docs = db.similarity_search(query)
Home  Tech  How to Check Disk Usage in Linux (4 Methods)

How to Check Disk Usage in Linux (4 Methods)

Beebom Staff

Last Updated: June 19, 2023 5:14 pm

There may be times when you need to download some important files or transfer some photos to your Linux system, but face a problem of insufficient disk space. You head over to your file manager to delete the large files which you no longer require, but you have no clue which of them are occupying most of your disk space. In this article, we will show some easy methods to check disk usage in Linux from both the terminal and the GUI application.

Monitor Disk Usage in Linux (2023)

Table of Contents

Check Disk Space Using the df Command
Display Disk Usage in Human Readable FormatDisplay Disk Occupancy of a Particular Type

Check Disk Usage using the du Command
Display Disk Usage in Human Readable FormatDisplay Disk Usage for a Particular DirectoryCompare Disk Usage of Two Directories

6 Craft a prompt for GPT-3 using the suggested strategies

We’ll develop a prompt template that combines role-prompting, Knowledge Base information, and the user’s question:

# let's write a prompt for a customer support chatbot that
# answer questions using information extracted from our db
template = """You are an exceptional customer support chatbot that gently answer questions.

You know the following context information.


Answer to the following question from a customer. Use only information from the previous context information. Do not invent stuff.

Question: {query}


prompt = PromptTemplate(
    input_variables=["chunks_formatted", "query"],

The template establishes the chatbot’s persona as an outstanding customer support chatbot. The template accepts two variables as input: chunks_formatted, which contains pre-formatted chunks from articles, and query, which represents the customer’s question. The goal is to construct an accurate answer utilising only the available chunks while avoiding the creation of erroneous or fictional information.

7 Utilize the GPT3 model with a temperature of 0 for text generation

To construct a response, we first obtain the top-k (e.g., top-3) chunks that are most comparable to the user question, format the prompt, and send it to the GPT3 model with a temperature of 0.

# the full pipeline

# user question
query = "How to check disk usage in linux?"

# retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# format the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query)

# generate answer
llm = OpenAI(model="text-davinci-003", temperature=0)
answer = llm(prompt_formatted)
 You can check disk usage in Linux using the df command or by using a GUI tool such as the GDU Disk Usage Analyzer or the Gnome Disks Tool. The df command is used to check the current disk usage and the available disk space in Linux. The syntax for the df command is: df <options> <file_system>. The options to use with the df command are: a, h, t, and x. To install the GDU Disk Usage Analyzer, use the command: sudo snap install gdu-disk-usage-analyzer. To install the Gnome Disks Tool, use the command: sudo apt-get -y install gnome-disk-utility.

8 Issues with Generating Answers using GPT-3

In the above scenario, the chatbot performs admirably. However, there are some cases where it may fail.

Assume we ask GPT-3, “Is the Linux distribution free?” and provide context in the form of a document regarding kernel features. It may provide a response such as “Yes, the Linux distribution is free to download and use,” even if such information is not contained in the context page. False information is extremely undesirable for customer service chatbots!

When the answer to the user’s question is contained inside the context, GPT-3 is less likely to generate misleading information. We cannot always rely on the semantic search stage to find the proper document because user questions are frequently brief and vague. There is always the possibility of generating erroneous information.

9 An Alternative - SalesCopilot Helping Support Human Customer Services

There may be times for various reasons you choose to use humans in customer services, in these cases you may be able to use something like SalesCopilot as described by this article which looks into how LangChain, Deep Lake, and GPT-4 can be used to develop a sales assistant able to give advice to salesman, taking into considerations internal guidelines.

This article goes into detail about a sales call assistant that connects you to a chatbot that understands the context of your conversation. One of SalesCopilot’s biggest features is its ability to recognise probable customer complaints and provide ideas on how to effectively handle them.

The post illustrates the obstacles encountered and solutions uncovered during the project’s development. You’ll discover the two unique text-splitting approaches that failed and how these failures paved the path for an effective solution.

Initially, the authors attempted to rely entirely on the LLM, but they experienced challenges with GPT-4 such as response inconsistency and sluggish response times. Second, they erroneously divided the custom knowledge base into chunks, which resulted in context misalignment and inefficient results.

Following these failed attempts, a more intelligent method of partitioning the knowledge base based on its structure was developed. This improvement significantly enhanced response quality and ensured stronger context grounding for LLM responses. This process is detailed, allowing you to understand how to overcome comparable issues in your own AI projects.

The paper then delves into how SalesCopilot was linked with Deep Lake. This integration expanded SalesCopilot’s capabilities by collecting the most appropriate responses from a proprietary knowledge base, resulting in a dependable, efficient, and highly adjustable solution for dealing with client concerns.

10 Conclusion

GPT-3 is quite good at generating conversational chatbots that can answer particular queries based on the context provided in the prompt. However, because the model has a tendency to hallucinate (i.e., invent new, potentially erroneous information), it might be difficult to ensure that it generates answers simply based on the context. The intensity of erroneous information generation varies based on the use case.

Finally, we used LangChain to build a context-aware question-answering system, following the code and ideas supplied. Splitting documents into chunks, computing their embeddings, constructing a retriever to discover related chunks, creating a prompt for GPT-3, and using the GPT3 model for text production were all part of the procedure. This method highlights the power of exploiting GPT-3 to build powerful and contextually correct chatbots while also emphasising the need to be cautious about the possibilities of providing fake information.

Further Reading:

11 Acknowledgements

I’d like to express my thanks to the wonderful LangChain & Vector Databases in Production Course by Activeloop - which i completed, and acknowledge the use of some images and other materials from the course in this article.