Question Answering over Data with LangChain

In this article we look at how you can split documents, extract the relevant data, take a question, pass both to a language model, and ask it to answer the question using LangChain.
natural-language-processing
deep-learning
langchain
prompt-engineering
Author

Pranath Fernando

Published

July 24, 2023

1 Introduction

In this article we look at how you can split documents, extract the relevant data, take a question, pass both to a language model, and ask it to answer the question using LangChain.

Recall the overall workflow for retrieval augmented generation (RAG):

After we’ve finished with the storage and ingestion process and retrieved the relevant context data, we move on to the question-answering phase. To get an answer, we now need to pass that information into a language model. The general flow is as follows: the question comes in, we look up the relevant documents, and we feed those chunks to the language model along with a system prompt and the original question, which then returns an answer. By default, we simply pass all the chunks into the same context window in a single language model call. There are a few other approaches we could take, though, and each has advantages and disadvantages.

Most of the pros and cons stem from the fact that there may sometimes be too many documents to fit into a single context window. There are three ways to get around this limited-context-window problem: MapReduce, Refine, and MapRerank.

2 Load Libs & Setup

import os
import openai
import sys
sys.path.append('../..')

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

The code below pins the OpenAI model version we use until the older model is deprecated, currently scheduled for September 2023. LLM responses can often vary, but they may differ significantly when a different model version is used.

import datetime
current_date = datetime.datetime.now().date()
if current_date < datetime.date(2023, 9, 2):
    llm_name = "gpt-3.5-turbo-0301"
else:
    llm_name = "gpt-3.5-turbo"
print(llm_name)
gpt-3.5-turbo-0301
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
print(vectordb._collection.count())
209

We start by loading the vector database we saved in a previous article. As we can see, it contains the same 209 document chunks as before. To verify that similarity search is working properly, we can run an initial query.

3 RetrievalQA chain

We’ll use OpenAI’s GPT-3.5 chat model with the temperature set to 0. This is well suited to getting factual answers, because it reduces variability and typically gives us the most accurate, reliable responses. We then import the RetrievalQA chain, which does question answering backed by a retrieval step. It is created from a language model and a vector database acting as the retriever. We can then call it, with the query set to the question we want to ask, and inspect the result to get the answer.

question = "What are major topics for this class?"
docs = vectordb.similarity_search(question,k=3)
len(docs)
3
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name=llm_name, temperature=0)
from langchain.chains import RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)
result = qa_chain({"query": question})
result["result"]
'The major topic for this class is machine learning. Additionally, the class may cover statistics and algebra as refreshers in the discussion sections. Later in the quarter, the discussion sections will also cover extensions for the material taught in the main lectures.'

4 Prompts

Let’s try to get a better understanding of what’s happening under the hood by exposing a few of the knobs we can control. The most important of these is the prompt: it is the template that the question and the retrieved documents are packed into before being passed to the language model.

from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

Here we define a prompt template. It starts with some instructions on how to use the context that follows, then has a placeholder for the context variable (where the retrieved documents will go) and a placeholder for the question variable. We can now build a new RetrievalQA chain.

We’ll keep using the same language model and vector database as before, but add a few more arguments. We set return_source_documents to True, which makes it easy to inspect the documents we retrieve. We also pass in the prompt we defined above, via chain_type_kwargs, as the QA chain prompt.

# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

Let’s try a different query: is probability covered in the class? When we look at the returned result, we can confirm that probability is indeed treated as a prerequisite for the course; the instructor assumes that you are familiar with basic statistics and probability.

question = "Is probability a class topic?"
result = qa_chain({"query": question})
result["result"]
'Yes, probability is assumed to be a prerequisite for this class. The instructor assumes familiarity with basic probability and statistics, and will go over some of the prerequisites in the discussion sections as a refresher course. Thanks for asking!'
result["source_documents"][0]
Document(page_content="of this class will not be very program ming intensive, although we will do some \nprogramming, mostly in either MATLAB or Octa ve. I'll say a bit more about that later.  \nI also assume familiarity with basic proba bility and statistics. So most undergraduate \nstatistics class, like Stat 116 taught here at Stanford, will be more than enough. I'm gonna \nassume all of you know what ra ndom variables are, that all of you know what expectation \nis, what a variance or a random variable is. And in case of some of you, it's been a while \nsince you've seen some of this material. At some of the discussion sections, we'll actually \ngo over some of the prerequisites, sort of as  a refresher course under prerequisite class. \nI'll say a bit more about that later as well.  \nLastly, I also assume familiarity with basi c linear algebra. And again, most undergraduate \nlinear algebra courses are more than enough. So if you've taken courses like Math 51, \n103, Math 113 or CS205 at Stanford, that would be more than enough. Basically, I'm \ngonna assume that all of you know what matrix es and vectors are, that you know how to \nmultiply matrices and vectors and multiply matrix and matrices, that you know what a matrix inverse is. If you know what an eigenvect or of a matrix is, that'd be even better. \nBut if you don't quite know or if you're not qu ite sure, that's fine, too. We'll go over it in \nthe review sections.", metadata={'source': 'docs/MachineLearning-Lecture01.pdf', 'page': 4})

We can look at some of the returned source documents to get a slightly better sense of where this information is coming from. If you browse through them, you should be able to find the answer to the question in one of these sources.
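
For example, a quick way to see where each retrieved chunk came from is to loop over the returned source documents and print their metadata. This is a minimal sketch, using the source and page fields we saw in the document metadata above:

# Print the source file and page for each retrieved chunk
for doc in result["source_documents"]:
    print(doc.metadata)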

5 RetrievalQA chain types

So far, we have been using the stuff technique, the default method, which essentially stuffs all of the retrieved documents into the final prompt. This is very advantageous because it requires only one call to the language model. It does have the drawback, however, that if there are too many documents, they may not all fit inside the context window.
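
If you want to make this default explicit, you can pass the chain type yourself. The sketch below should behave the same as the chain we built above, simply naming the default "stuff" chain type explicitly:

# "stuff" is the default chain type: all retrieved chunks are stuffed into one prompt
qa_chain_stuff = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="stuff"
)
result = qa_chain_stuff({"query": question})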

The map-reduce technique is another method we can use to answer questions over documents. Here, each individual document is first sent to the language model on its own to get its own answer.

qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)
result = qa_chain_mr({"query": question})
result["result"]
'There is no clear answer to this question based on the given portion of the document. The document mentions familiarity with basic probability and statistics as a prerequisite for the class, and there is a brief mention of probability in the text, but it is not clear if it is a main topic of the class. The instructor mentions using a probabilistic interpretation to derive a learning algorithm, but does not go into further detail about probability as a topic.'

Those answers are then composed into a final answer with a final call to the language model. This involves many more calls to the language model, but it has the advantage that it can operate over arbitrarily many documents. When we run the previous question through this chain, we can see two limitations of this method. One, it’s a lot slower. Two, the result is actually worse: there is no clear answer to the question based on the given portion of the document.

This may occur because it’s answering based on each document individually. And so, if there is information that’s spread across two documents, it doesn’t have it all in the same context.

This is a great opportunity to use the LangChain plus platform to get a better sense of what is happening inside these chains. We’ll give an example of this here. Once the environment variables below have been set, we can re-run the MapReduce chain and then go to the UI to see what is happening internally. From there we can find the run we just completed; clicking on it shows us the input and the output, and we can then look at the child runs to get a thorough understanding of what’s going on under the hood.

If you wish to experiment on the LangChain plus platform:

  • Go to the LangChain plus platform and sign up
  • Create an API key from your account’s settings
  • Use this API key in the code below
  • Uncomment the code below

Use the endpoint below.

#import os
#os.environ["LANGCHAIN_TRACING_V2"] = "true"
#os.environ["LANGCHAIN_ENDPOINT"] = "https://api.langchain.plus"
#os.environ["LANGCHAIN_API_KEY"] = "..." # replace dots with your api key
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)
result = qa_chain_mr({"query": question})
result["result"]
'There is no clear answer to this question based on the given portion of the document. The document mentions familiarity with basic probability and statistics as a prerequisite for the class, and there is a brief mention of probability in the text, but it is not clear if it is a main topic of the class.'

The MapReduceDocumentsChain comes first in the trace. There are actually four calls to the language model in total. If we click into one of these calls, we can see that we have the input and the output for each document.

Looking further, we can see that once it has gone through each of these documents, it combines them in a final chain, the StuffDocumentsChain, where all of the individual responses are stuffed into the last language model call. When we click on it, we can see the system message containing the four summaries from the previous documents, the user question, and then the answer right there.

By changing the chain type to Refine, we can do the same kind of question answering with a new kind of chain. From the trace we can see that it calls the RefineDocumentsChain, which consists of four sequential calls to an LLM chain. Let’s look at the first call in this sequence to understand what is happening. The prompt is shown just before it is submitted to the language model, and it consists of a system message made up of a few pieces. The part stating that the context information is below comes from the system message in the prompt template we defined earlier; the whole passage that follows is one of the documents that we retrieved.

After that we have the user question, and finally the answer.

qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)
result = qa_chain_mr({"query": question})
result["result"]
"The main topic of the class is machine learning, but the course assumes that students are familiar with basic probability and statistics, including random variables, expectation, variance, and basic linear algebra. The instructor will provide a refresher course on these topics in some of the discussion sections. Later in the quarter, the discussion sections will also cover extensions for the material taught in the main lectures. Machine learning is a vast field, and there are a few extensions that the instructor wants to teach but didn't have time to cover in the main lectures. The class will not be very programming-intensive, but some programming will be done in MATLAB or Octave."

If we go back, we can examine the next call to the language model. Here, the final prompt we send to the language model is a sequence that combines the previous response with new information and asks for an improved answer. We can see that the original user question comes first, then the same answer as before, then a statement saying that we have the opportunity to refine the existing answer, only if needed, with some more context below.

That part is from the instructions in the prompt template; the rest of the text is the document we retrieved, the second one on the list. At the end, the instructions say to refine the original answer with the new context to better answer the question, and below that we get the refined answer. However, this is only the second intermediate answer; the chain loops through all four documents in this way before arriving at the final answer, which appears at the end.

You’ll see that this yields a better result than the MapReduce chain. That’s because the Refine chain allows more information to be carried over than MapReduce does, albeit by combining the information sequentially.
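
The third method mentioned at the start, MapRerank, was not shown above, but it can be tried in the same way. As a minimal sketch (assuming the same llm and vectordb as before), it asks the language model to answer against each document separately, score its own answer, and return the highest-scoring response:

# MapRerank: answer against each document separately, have the model score
# each answer, and return the highest-scoring response
qa_chain_mrr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_rerank"
)
result = qa_chain_mrr({"query": question})
result["result"]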

6 RetrievalQA limitations

There is a lot of room for experimentation here. One of the great things about chatbots, and one of the reasons they’re growing in popularity, is that you can ask follow-up questions to get more information about earlier answers. Let’s try that now. We’ll create a QA chain with the default settings, ask whether probability is a topic covered in class, and then ask a related follow-up question.

QA fails to preserve conversational history.

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)
question = "Is probability a class topic?"
result = qa_chain({"query": question})
result["result"]
'Yes, probability is a topic that will be assumed to be familiar to students in this class. The instructor assumes that students have a basic understanding of probability and statistics, and will go over some of the prerequisites as a refresher course in the discussion sections.'
question = "why are those prerequesites needed?"
result = qa_chain({"query": question})
result["result"]
'The prerequisites are needed because in this class, the instructor assumes that all students have a basic knowledge of computer science and knowledge of basic computer skills and principles. This includes knowledge of big-O notation and other basic computer science concepts. Without this basic knowledge, it may be difficult for students to understand the material covered in the class.'

Since the answer mentions that probability is a prerequisite, let’s ask why those prerequisites are needed. The answer we get back says that students entering the course are expected to have basic knowledge of computer science and fundamental computing skills and concepts. That has nothing to do with our earlier question about probability. What is happening here? The chain we’re using has no concept of state: it doesn’t keep track of previous questions or answers.

Note: the LLM response varies, and some responses do include a reference to probability that might be gleaned from the referenced documents. The point is simply that the model does not have access to past questions or answers; this will be covered in the next article.
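
As a brief preview of that fix, the sketch below adds a memory object and uses LangChain’s ConversationalRetrievalChain, so that previous questions and answers are passed along with each new query. This is only a sketch of the approach covered in the next article, using the memory and chain classes available in LangChain at the time of writing:

from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# Memory keeps the running chat history so follow-up questions have context
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=vectordb.as_retriever(),
    memory=memory
)
result = qa({"question": "Is probability a class topic?"})
result = qa({"question": "Why are those prerequisites needed?"})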

7 Acknowledgements

I’d like to express my thanks to the wonderful LangChain: Chat with your data course by DeepLearning.ai and LangChain, which I completed, and acknowledge the use of some images and other materials from the course in this article.
