Using LangChain Memory for LLM Applications

Here we look at how LangChain can give LLMs memory, to make their responses more useful.
natural-language-processing
deep-learning
langchain
openai
Author

Pranath Fernando

Published

June 2, 2023

1 Introduction

Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. But using LLMs in isolation is often not enough in practice to create a truly powerful or useful business application - the real power comes when you are able to combine them with other sources of computation, services or knowledge. LangChain is an intuitive open-source Python framework created to simplify the development of useful applications using large language models (LLMs) such as those from OpenAI or Hugging Face.

In other articles we introduced the LangChain library and key components.

In this article, we will look at how LangChain can give LLMs context and memory, which is useful for applications such as chatbots, where remembering previous parts of a conversation can be very helpful.

2 Setup

We will use OpenAI’s ChatGPT LLM for our examples, so let’s load the required libraries.

import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

import warnings
warnings.filterwarnings('ignore')
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

3 Use Case - Managing a ChatBot Conversation

Let’s imagine we have an application where we want to use a chatbot, and we want it to remember the history of everything said in the conversation to make it more useful.

So let’s define a LangChain ‘conversation chain’ object that uses an LLM and a buffer memory object.

llm = ChatOpenAI(temperature=0.0)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)

Let’s see how this works in practice. If we mention a name early in the conversation, will it remember the name later?

With verbose set to True, we can see the prompts LangChain automatically generates for this use case, along with the conversation history it saves.

conversation.predict(input="Hi, my name is Andrew")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi, my name is Andrew
AI:

> Finished chain.
"Hello Andrew, it's nice to meet you. My name is AI. How can I assist you today?"
conversation.predict(input="What is 1+1?")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Andrew
AI: Hello Andrew, it's nice to meet you. My name is AI. How can I assist you today?
Human: What is 1+1?
AI:

> Finished chain.
'The answer to 1+1 is 2.'
conversation.predict(input="What is my name?")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Andrew
AI: Hello Andrew, it's nice to meet you. My name is AI. How can I assist you today?
Human: What is 1+1?
AI: The answer to 1+1 is 2.
Human: What is my name?
AI:

> Finished chain.
'Your name is Andrew, as you mentioned earlier.'

In fact we can print the memory of the conversation separately like this:

print(memory.buffer)
Human: Hi, my name is Andrew
AI: Hello Andrew, it's nice to meet you. My name is AI. How can I assist you today?
Human: What is 1+1?
AI: The answer to 1+1 is 2.
Human: What is my name?
AI: Your name is Andrew, as you mentioned earlier.

We can also print stored variables.

memory.load_memory_variables({})
{'history': "Human: Hi, my name is Andrew\nAI: Hello Andrew, it's nice to meet you. My name is AI. How can I assist you today?\nHuman: What is 1+1?\nAI: The answer to 1+1 is 2.\nHuman: What is my name?\nAI: Your name is Andrew, as you mentioned earlier."}

So LangChain saves the conversation in the ConversationBufferMemory() object. You can also add to this memory manually, like this:

memory = ConversationBufferMemory()
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})
print(memory.buffer)
Human: Hi
AI: What's up
memory.load_memory_variables({})
{'history': "Human: Hi\nAI: What's up"}
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})
memory.load_memory_variables({})
{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}

4 Memory and LLMs

So LLMs are ‘stateless’ by default, meaning each transaction with them is normally independent of all other transactions, i.e. they don’t remember anything by default. When chatbots appear to have ‘memory’, it is because the whole previous conversation is explicitly provided as context with each transaction, as sketched below.

This also means that as a conversation becomes longer, the memory required to store the conversation history grows and grows, for example when we use ConversationBufferMemory(). This also potentially increases costs, since paid services such as ChatGPT charge in relation to the number of tokens submitted to the model. LangChain provides various kinds of memory to make storing conversation history easier and more convenient.
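
To make the statelessness concrete, here is a minimal sketch (not part of the original examples): any history has to be passed back in explicitly as prior messages, which is exactly what the LangChain memory classes automate for us.

from langchain.schema import HumanMessage, AIMessage

chat = ChatOpenAI(temperature=0.0)

# No history supplied - the model has no way of knowing the name.
chat([HumanMessage(content="What is my name?")])

# History supplied explicitly as prior messages - now it can answer.
chat([
    HumanMessage(content="Hi, my name is Andrew"),
    AIMessage(content="Hello Andrew, nice to meet you."),
    HumanMessage(content="What is my name?"),
])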

5 Limiting Memory by Previous Conversations

ConversationBufferWindowMemory differs from ConversationBufferMemory in that it only keeps a limited ‘window’ of the past conversation, rather than the entire conversation. This can of course help limit costs for paid LLM services such as ChatGPT. The value of k controls how many of the most recent exchanges (an input and its response) are stored.

from langchain.memory import ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(k=1)
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.load_memory_variables({})
{'history': 'Human: Not much, just hanging\nAI: Cool'}

This of course means it won’t remember everything - for example, repeating the earlier conversation with k=1, it won’t remember the name.

llm = ChatOpenAI(temperature=0.0)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=False
)
conversation.predict(input="Hi, my name is Andrew")
"Hello Andrew, it's nice to meet you. My name is AI. How can I assist you today?"
conversation.predict(input="What is 1+1?")
'The answer to 1+1 is 2.'
conversation.predict(input="What is my name?")
"I'm sorry, I don't have access to that information. Could you please tell me your name?"

In practice you’d probably want to set k to a value bigger than one, but this still allows you to control the amount of memory used and therefore the cost of paid LLM services.
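
For example, a small sketch (assumed, not from the original transcript): with k=2 the window still covers the exchange where the name was given, so the chatbot can recall it.

memory = ConversationBufferWindowMemory(k=2)
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=False
)
conversation.predict(input="Hi, my name is Andrew")
conversation.predict(input="What is 1+1?")
# The exchange containing the name is still within the 2-exchange window.
conversation.predict(input="What is my name?")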

6 Limiting Memory by Previous Tokens

In contrast to ConversationBufferWindowMemory, ConversationTokenBufferMemory limits memory by the number of tokens rather than the number of exchanges. This can be especially useful given LLM services usually limit and charge for transactions in terms of the number of tokens, rather than the number of previous exchanges (which will contain variable numbers of tokens).

from langchain.memory import ConversationTokenBufferMemory
from langchain.llms import OpenAI
llm = ChatOpenAI(temperature=0.0)

So let’s create one of these with a token limit of 30, manually create a conversation history, and see what it saves.

Note we pass the llm as a parameter to ConversationTokenBufferMemory(), as different models tokenise text in different ways.

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=30)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})
memory.load_memory_variables({})
{'history': 'AI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}

7 Limiting Memory by Summary

Rather than limiting memory by number of exchanges or tokens, ConversationSummaryBufferMemory uses an LLM to write a summary of the conversation so far once the token limit is exceeded, and keeps that summary as the memory.

from langchain.memory import ConversationSummaryBufferMemory
# create a long string
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{schedule}"})
memory.load_memory_variables({})
{'history': "System: The human and AI engage in small talk before discussing the day's schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments."}
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)
conversation.predict(input="What would be a good demo to show?")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: The human and AI engage in small talk before discussing the day's schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments.
Human: What would be a good demo to show?
AI:

> Finished chain.
"Based on the customer's interest in AI developments, I would suggest showcasing our latest natural language processing capabilities. We could demonstrate how our AI can accurately understand and respond to complex language queries, and even provide personalized recommendations based on the user's preferences. Additionally, we could highlight our AI's ability to learn and adapt over time, making it a valuable tool for businesses looking to improve their customer experience."
memory.load_memory_variables({})
{'history': "System: The human and AI engage in small talk before discussing the day's schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments. The human asks what would be a good demo to show.\nAI: Based on the customer's interest in AI developments, I would suggest showcasing our latest natural language processing capabilities. We could demonstrate how our AI can accurately understand and respond to complex language queries, and even provide personalized recommendations based on the user's preferences. Additionally, we could highlight our AI's ability to learn and adapt over time, making it a valuable tool for businesses looking to improve their customer experience."}

Using text summarisation in this way can be a really interesting way of controlling memory use while maximising the value of the memory that is kept.

8 Other types of LLM Applications and Memory

While we have seen the benefits of using various types of limited memory for a chatbot application, more efficient memory could be useful for many other LLM applications, such as building up a growing store of news facts that uses limited memory while maximising its value. LangChain also supports other types of memory, such as:

  • Vector data memory: stores text (from chats or elsewhere) as embeddings in a vector database, and retrieves the blocks of text most relevant to the current prompt using similarity measures such as cosine similarity (see the sketch after this list)
  • Entity memories: using an LLM to remember details about specific entities, e.g. particular people, organisations etc. (a sketch follows below)
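
As a minimal sketch of the vector data memory idea (an assumed example, using OpenAIEmbeddings and a FAISS index via the faiss-cpu package; the stored texts are made up), LangChain’s VectorStoreRetrieverMemory returns only the stored snippets most similar to the current input:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.memory import VectorStoreRetrieverMemory

# Store some past exchanges as embeddings in a FAISS index.
vectorstore = FAISS.from_texts(
    ["Human: My favourite sport is football\nAI: Good to know",
     "Human: I work in finance\nAI: Noted"],
    embedding=OpenAIEmbeddings(),
)
memory = VectorStoreRetrieverMemory(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 1})
)

# Only the stored text most relevant to the query is returned as 'history'.
memory.load_memory_variables({"prompt": "What sport do I like?"})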

You can also use multiple memory types together, e.g. a conversation memory plus an entity memory to remember details about individuals or organisations. You could also store the conversation in a conventional database, such as a key-value store or a relational/SQL database.
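
As a brief sketch of entity memory (again an assumed example, using ConversationEntityMemory and the entity-aware prompt template that ships with LangChain), the memory uses the LLM itself to extract and summarise facts about each entity mentioned in the conversation:

from langchain.memory import ConversationEntityMemory
from langchain.memory.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE

entity_conversation = ConversationChain(
    llm=llm,
    prompt=ENTITY_MEMORY_CONVERSATION_TEMPLATE,
    memory=ConversationEntityMemory(llm=llm),
)
# On each turn the memory extracts entities (here Deven and Sam) and keeps a short summary for each.
entity_conversation.predict(input="Deven and Sam are working on a hackathon project with LangChain")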

9 Acknowledgements

I’d like to express my thanks to the wonderful LangChain for LLM Application Development Course by DeepLearning.ai, which I completed, and to acknowledge the use of some images and other materials from the course in this article.
