LangChain and OpenAI Functions - Conversational Agents

In this post we will explore the advanced concept of conversational agents. Building upon our previous articles, we delve into building conversational agents with Large Language Models (LLMs), blending tool use with chat memory, akin to the workings of OpenAI’s ChatGPT. This article serves as a comprehensive guide to understanding and implementing these agents.
natural-language-processing
langchain
openai
agents
Author

Pranath Fernando

Published

November 8, 2023

1 Introduction

In this post we aim to explore the advanced concept of conversational agents. Building upon our previous articles, we delve into building conversational agents with Large Language Models (LLMs), blending tool use with chat memory, akin to the workings of OpenAI’s ChatGPT. This article serves as a comprehensive guide to understanding and implementing these agents.

1.1 Understanding the Basics of Conversational Agents

First, let’s go over the fundamentals of agents, which are a combination of a language model and code, with the language model dictating which actions to take and what the inputs to those actions should be. Around that sits an agent loop, which simply uses the model to select a tool, calls that tool, examines the result, and repeats until a stopping criterion is reached. Why does this loop need a stopping criterion?

There can be a variety of criteria. The first and most common is to use the language model itself to identify when to stop. If you recall from the previous post, we discussed the concept of an agent finish: the point at which the model returns a final response instead of another tool call. That is one such stopping point. Other options include hard-coded rules, such as a maximum number of iterations or something similar.
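To make that concrete, here is a rough sketch of an agent loop with both kinds of stopping criteria. This is illustrative pseudocode only: agent.decide and call_tool are hypothetical stand-ins, and the real implementation follows later in this post.

# Illustrative sketch only; agent.decide and call_tool are hypothetical stand-ins
def agent_loop(user_input, max_iterations=15):
    steps = []  # (action, observation) pairs taken so far
    for _ in range(max_iterations):            # hard-coded stopping rule
        decision = agent.decide(user_input, steps)
        if decision.is_finish:                 # the language model chose to stop
            return decision.output
        observation = call_tool(decision.tool, decision.tool_input)
        steps.append((decision, observation))
    return "Stopped after reaching the maximum number of iterations"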

We’ll be using the tools we created in the last article. We’ll take that concept of routing, selecting and then calling the tool, and build it into our own agent loop in LangChain. We’ll then use LangChain’s AgentExecutor class to show how it corresponds to the agent loop. We’ll also include extras like handling errors and early stopping.

The first thing we’ll do is set up the environment, then recreate the same tools we used in the prior article.

import os
import openai

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']
from langchain.tools import tool
import requests
from pydantic import BaseModel, Field
import datetime

# Define the input schema
class OpenMeteoInput(BaseModel):
    latitude: float = Field(..., description="Latitude of the location to fetch weather data for")
    longitude: float = Field(..., description="Longitude of the location to fetch weather data for")

@tool(args_schema=OpenMeteoInput)
def get_current_temperature(latitude: float, longitude: float) -> str:
    """Fetch current temperature for given coordinates."""
    
    BASE_URL = "https://api.open-meteo.com/v1/forecast"
    
    # Parameters for the request
    params = {
        'latitude': latitude,
        'longitude': longitude,
        'hourly': 'temperature_2m',
        'forecast_days': 1,
    }

    # Make the request
    response = requests.get(BASE_URL, params=params)
    
    if response.status_code == 200:
        results = response.json()
    else:
        raise Exception(f"API Request failed with status code: {response.status_code}")

    current_utc_time = datetime.datetime.utcnow()
    time_list = [datetime.datetime.fromisoformat(time_str.replace('Z', '+00:00')) for time_str in results['hourly']['time']]
    temperature_list = results['hourly']['temperature_2m']
    
    closest_time_index = min(range(len(time_list)), key=lambda i: abs(time_list[i] - current_utc_time))
    current_temperature = temperature_list[closest_time_index]
    
    return f'The current temperature is {current_temperature}°C'
import wikipedia

@tool
def search_wikipedia(query: str) -> str:
    """Run Wikipedia search and get page summaries."""
    page_titles = wikipedia.search(query)
    summaries = []
    for page_title in page_titles[:3]:
        try:
            wiki_page = wikipedia.page(title=page_title, auto_suggest=False)
            summaries.append(f"Page: {page_title}\nSummary: {wiki_page.summary}")
        except (
            wikipedia.exceptions.PageError,
            wikipedia.exceptions.DisambiguationError,
        ):
            pass
    if not summaries:
        return "No good Wikipedia Search Result was found"
    return "\n\n".join(summaries)
tools = [get_current_temperature, search_wikipedia]
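As a quick reminder from the last article, each @tool-decorated function carries metadata that LangChain hands to the model. These attributes exist on every LangChain tool; the exact strings depend on your version, so the comments are indicative only:

print(get_current_temperature.name)         # the tool's name
print(get_current_temperature.description)  # description derived from the docstring
print(get_current_temperature.args)         # argument schema built from OpenMeteoInput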
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.tools.render import format_tool_to_openai_function
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser

1.2 Implementing Conversational Agents in Practice

1.2.1 Setting Up the Environment

We’ll generate OpenAI function definitions from our list of tools by calling format_tool_to_openai_function on each one. We’ll build our model with temperature equal to zero and bind it to the functions. Then, as before, we link everything together in a chain, adding the OpenAIFunctionsAgentOutputParser at the end.

functions = [format_tool_to_openai_function(f) for f in tools]
model = ChatOpenAI(temperature=0).bind(functions=functions)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are helpful but sassy assistant"),
    ("user", "{input}"),
])
chain = prompt | model | OpenAIFunctionsAgentOutputParser()
result = chain.invoke({"input": "what is the weather is sf?"})
result.tool
'get_current_temperature'
result.tool_input
{'latitude': 37.7749, 'longitude': -122.4194}
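Before moving on, it’s worth peeking at what format_tool_to_openai_function produced for each tool: a plain dict in OpenAI’s function-calling format. The keys below are the standard ones; the full output is omitted here:

functions[0]["name"]         # 'get_current_temperature'
functions[0]["description"]  # the tool description
functions[0]["parameters"]   # a JSON Schema generated from OpenMeteoInput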

Again, that’s because the input here is specifically aimed at calling one of the tools; you can see the chosen tool and its arguments above. What we want now is a loop that determines which tool to call, calls that tool, passes the result back to the model, and looks at what the model wants to do next. In terms of the prompt, that means we need a place to pass back the history of tool calls and their corresponding outputs. So we do have to change the prompt a little bit to add that in. Specifically, we want to add a place where we can pass in a list of messages.

from langchain.prompts import MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are helpful but sassy assistant"),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

That’s because we’ll convert each tool selection and its observation into a list of messages before passing them back in. To serve as a placeholder for this series of messages, we use a MessagesPlaceholder with the variable name agent_scratchpad. Our prompt template keeps the same initial two components, the brief system message and the user input, and then appends the placeholder, which is where the action and observation pairs will go. That gives us our new prompt, and with this prompt we create a new chain.

chain = prompt | model | OpenAIFunctionsAgentOutputParser()

Let’s try it again with the same input. Crucially, we now need to pass in agent_scratchpad, because that’s the new variable we’re going to feed in; we pass an empty list, since we haven’t performed any actions yet. Then, as we’ve been doing, we can see which function the model wants us to call, call that function, and get back an observation. Looking at this observation, we can see that it is a string. We also have result1, which is an AgentActionMessageLog.

result1 = chain.invoke({
    "input": "what is the weather is sf?",
    "agent_scratchpad": []
})
result1.tool
'get_current_temperature'
observation = get_current_temperature(result1.tool_input)
observation
'The current temperature is 11.0°C'
type(result1)
langchain.schema.agent.AgentActionMessageLog

So, how do we convert these two things, the agent action and the observation, into the list of messages that we pass into the agent scratchpad? To accomplish this, we’ll use format_to_openai_functions from langchain.agents.format_scratchpad. What happens behind the scenes? It takes the message log from result1 in particular; the message log is the collection of messages that led to the current agent action.

from langchain.agents.format_scratchpad import format_to_openai_functions
result1.message_log
[AIMessage(content='', additional_kwargs={'function_call': {'name': 'get_current_temperature', 'arguments': '{\n  "latitude": 37.7749,\n  "longitude": -122.4194\n}'}})]
format_to_openai_functions([(result1, observation), ])
[AIMessage(content='', additional_kwargs={'function_call': {'name': 'get_current_temperature', 'arguments': '{\n  "latitude": 37.7749,\n  "longitude": -122.4194\n}'}}),
 FunctionMessage(content='The current temperature is 11.0°C', name='get_current_temperature')]

So, if we look at it, the message log contains the AI message whose function call says to get the current temperature, with the coordinates as a JSON string of arguments: the precise response from OpenAI. We save that in the message log so that when it comes time to build the agent scratchpad, we can just put it back in there. Then, given the observation, which is just a string, we use the FunctionMessage type that we discussed in article one and feed that in.

So we call format_to_openai_functions on a list containing one tuple, which pairs the agent action with its observation. The reason it takes a list of pairs is that as the agent takes extra steps, we can keep appending pairs and keep regenerating this list of messages. Here we get back an AIMessage capturing the function call, followed by a FunctionMessage whose content is the observation and whose name is get_current_temperature.
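To see the shape as the loop grows, here is a sketch; the commented second step is hypothetical, just to show that we keep appending pairs:

steps = [(result1, observation)]
# On later iterations we would keep appending pairs, e.g.:
#   steps.append((next_action, next_observation))
format_to_openai_functions(steps)  # yields one AIMessage and one FunctionMessage per pair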

If we then run the chain on the same input again, this time passing those messages in as the agent scratchpad, we get back an AgentFinish whose output states the current temperature: the model can see from the scratchpad that the tool has already been called. So let’s combine all of that into a function.

result2 = chain.invoke({
    "input": "what is the weather is sf?", 
    "agent_scratchpad": format_to_openai_functions([(result1, observation)])
})
result2
AgentFinish(return_values={'output': 'The current temperature in San Francisco is 11.0°C.'}, log='The current temperature in San Francisco is 11.0°C.')
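Since result2 is an AgentFinish, the final answer also lives in its return_values dictionary, in case you want just the string:

result2.return_values["output"]
'The current temperature in San Francisco is 11.0°C.'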

1.2.2 Building the Agent Loop

So what we’re going to do is write a run_agent function that accepts user input. It starts with an empty list of intermediate steps and then loops. On every pass, it calls the chain with the user input and with the agent scratchpad built by calling format_to_openai_functions on that list of intermediate steps. If the outcome is an AgentFinish, we return that result. Otherwise, it indicates an agent action.

In that case we look up the relevant tool, call it with the tool input, and get back an observation. We then append the action and observation pair to the intermediate steps and continue around the loop.

from langchain.schema.agent import AgentFinish
def run_agent(user_input):
    intermediate_steps = []  # (agent action, observation) pairs accumulated so far
    while True:
        result = chain.invoke({
            "input": user_input,
            "agent_scratchpad": format_to_openai_functions(intermediate_steps)
        })
        if isinstance(result, AgentFinish):  # the model decided it is done
            return result
        # Otherwise it's an agent action: look up and call the requested tool
        tool = {
            "search_wikipedia": search_wikipedia,
            "get_current_temperature": get_current_temperature,
        }[result.tool]
        observation = tool.run(result.tool_input)
        intermediate_steps.append((result, observation))

1.2.3 Enhancing the Agent Loop

We’ll make one tiny change to this chain before proceeding, to make it a genuine agent chain. Basically, we’re going to fold the format_to_openai_functions logic into the chain itself. That will make it a little more portable, since the chain then only has to accept user input and intermediate steps. We’ll build this agent chain with a RunnablePassthrough. A RunnablePassthrough just takes the initial input and passes it through; what we’re going to do is use its assign method.

The assign call adds a new key to the input dictionary, agent_scratchpad, computed by calling format_to_openai_functions on the intermediate steps. We then pipe the resulting dictionary into the chain, and we obtain a complete end-to-end chain that takes the input and the intermediate steps, does the necessary intermediate-step processing to build the scratchpad, and passes it all on to the prompt, the model, and the output parser.
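If assign is new to you, here is a minimal sketch of it in isolation, using toy keys that have nothing to do with the agent:

from langchain.schema.runnable import RunnablePassthrough
demo = RunnablePassthrough.assign(doubled=lambda x: x["n"] * 2)
demo.invoke({"n": 3})
{'n': 3, 'doubled': 6}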

from langchain.schema.runnable import RunnablePassthrough
agent_chain = RunnablePassthrough.assign(
    agent_scratchpad= lambda x: format_to_openai_functions(x["intermediate_steps"])
) | chain
def run_agent(user_input):
    intermediate_steps = []
    while True:
        result = agent_chain.invoke({
            "input": user_input, 
            "intermediate_steps": intermediate_steps
        })
        if isinstance(result, AgentFinish):
            return result
        tool = {
            "search_wikipedia": search_wikipedia, 
            "get_current_temperature": get_current_temperature,
        }[result.tool]
        observation = tool.run(result.tool_input)
        intermediate_steps.append((result, observation))
run_agent("what is the weather is sf?")
AgentFinish(return_values={'output': 'The current temperature in San Francisco is 11.0°C.'}, log='The current temperature in San Francisco is 11.0°C.')

With this update, run_agent becomes slightly simpler: you’ll note that we’re now invoking the agent chain on the input and intermediate steps directly, because the pre-processing is done within the agent chain. Let’s execute this agent, asking again what the weather is in SF; we get the same result back. Let’s go on to another question.

What exactly is LangChain? We can see that we get a response when we run the agent now: LangChain is a framework designed to simplify the development of LLM-based applications.

run_agent("what is langchain?")
AgentFinish(return_values={'output': 'LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). It is a language model integration framework that can be used for various tasks such as document analysis and summarization, chatbots, and code analysis. LangChain helps developers leverage the power of language models in their applications.'}, log='LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). It is a language model integration framework that can be used for various tasks such as document analysis and summarization, chatbots, and code analysis. LangChain helps developers leverage the power of language models in their applications.')
run_agent("hi!")
AgentFinish(return_values={'output': 'Hello! How can I assist you today?'}, log='Hello! How can I assist you today?')
from langchain.agents import AgentExecutor
agent_executor = AgentExecutor(agent=agent_chain, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is langchain?"})


> Entering new AgentExecutor chain...

Invoking: `search_wikipedia` with `{'query': 'langchain'}`


Page: LangChain
Summary: LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). As a language model integration framework, LangChain's use-cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis.

Page: Prompt engineering
Summary: Prompt engineering is the process of structuring text that can be interpreted and understood by a generative AI model. A prompt is natural language text describing the task that an AI should perform.A prompt for a text-to-text model can be a query such as "what is Fermat's little theorem?", a command such as "write a poem about leaves falling", a short statement of feedback (for example, "too verbose", "too formal", "rephrase again", "omit this word") or a longer statement including context, instructions, and input data. Prompt engineering may involve phrasing a query, specifying a style, providing relevant context or assigning a role to the AI such as "Act as a native French speaker". A prompt may include a few examples for a model to learn from, such as "maison -> house, chat -> cat, chien -> dog", an approach called few-shot learning.When communicating with a text-to-image or a text-to-audio model, a typical prompt is a description of a desired output such as "a high-quality photo of an astronaut riding a horse" or "Lo-fi slow BPM electro chill with organic samples". Prompting a text-to-image model may involve adding, removing, emphasizing and re-ordering words to achieve a desired subject, style, layout, lighting, and aesthetic.



Page: Sentence embedding
Summary: In natural language processing, a sentence embedding refers to a numeric representation of a sentence in the form of a vector of real numbers which encodes meaningful semantic information.State of the art embeddings are based on the learned hidden layer representation of dedicated sentence transformer models. BERT pioneered an approach involving the use of a dedicated [CLS] token prepended to the beginning of each sentence inputted into the model; the final hidden state vector of this token encodes information about the sentence and can be fine-tuned for use in sentence classification tasks. In practice however, BERT's sentence embedding with the [CLS] token achieves poor performance, often worse than simply averaging non-contextual word embeddings. SBERT later achieved superior sentence embedding performance by fine tuning BERT's [CLS] token embeddings through the usage of a siamese neural network architecture on the SNLI dataset. 
Other approaches are loosely based on the idea of distributional semantics applied to sentences. Skip-Thought trains an encoder-decoder structure for the task of neighboring sentences predictions. Though this has been shown to achieve worse performance than approaches such as InferSent or SBERT. 
An alternative direction is to aggregate word embeddings, such as those returned by Word2vec, into sentence embeddings. The most straightforward approach is to simply compute the average of word vectors, known as continuous bag-of-words (CBOW). However, more elaborate solutions based on word vector quantization have also been proposed. One such approach is the vector of locally aggregated word embeddings (VLAWE), which demonstrated performance improvements in downstream text classification tasks.LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). It is a language model integration framework that can be used for various tasks such as document analysis and summarization, chatbots, and code analysis. LangChain helps developers leverage the power of language models in their applications.

> Finished chain.
{'input': 'what is langchain?',
 'output': 'LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). It is a language model integration framework that can be used for various tasks such as document analysis and summarization, chatbots, and code analysis. LangChain helps developers leverage the power of language models in their applications.'}
agent_executor.invoke({"input": "my name is bob"})


> Entering new AgentExecutor chain...
Hello Bob! How can I assist you today?

> Finished chain.
{'input': 'my name is bob', 'output': 'Hello Bob! How can I assist you today?'}
agent_executor.invoke({"input": "what is my name"})


> Entering new AgentExecutor chain...
I'm sorry, but I don't have access to personal information.

> Finished chain.
{'input': 'what is my name',
 'output': "I'm sorry, but I don't have access to personal information."}

The AgentExecutor we used above also includes error handling for tools: if the agent calls a tool and the tool errors, the executor handles it gracefully, catching the error and returning it to the language model so the model can recover. Note too that the answer above is exactly what we want; without memory, the agent genuinely doesn’t know the name. Using the executor was easy: we imported AgentExecutor from langchain.agents and initialised it with agent set to our agent chain. We passed in the tools, because the executor needs them in order to actually call them, and we set verbose=True to get some nice logging of what’s going on.
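The executor also exposes the early-stopping guards we mentioned at the start. These parameters do exist on AgentExecutor, but treat the values below as an illustrative sketch and check the defaults for your LangChain version:

# Illustrative configuration of AgentExecutor's guard rails (values are examples)
agent_executor = AgentExecutor(
    agent=agent_chain,
    tools=tools,
    verbose=True,
    max_iterations=15,              # hard cap on the number of loop iterations
    early_stopping_method="force",  # how to respond once the cap is reached
    handle_parsing_errors=True,     # feed output-parsing errors back to the model
)

What the executor doesn’t give us for free is conversation memory, which is why it didn’t know the name. To fix that, we add a chat_history placeholder to the prompt, rebuild the agent chain, and attach a ConversationBufferMemory to the executor: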

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are helpful but sassy assistant"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])
agent_chain = RunnablePassthrough.assign(
    agent_scratchpad= lambda x: format_to_openai_functions(x["intermediate_steps"])
) | prompt | model | OpenAIFunctionsAgentOutputParser()
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(return_messages=True,memory_key="chat_history")
agent_executor = AgentExecutor(agent=agent_chain, tools=tools, verbose=True, memory=memory)
agent_executor.invoke({"input": "my name is bob"})


> Entering new AgentExecutor chain...
Hello Bob! How can I assist you today?

> Finished chain.
{'input': 'my name is bob',
 'chat_history': [HumanMessage(content='my name is bob'),
  AIMessage(content='Hello Bob! How can I assist you today?')],
 'output': 'Hello Bob! How can I assist you today?'}
agent_executor.invoke({"input": "whats my name"})


> Entering new AgentExecutor chain...
Your name is Bob.

> Finished chain.
{'input': 'whats my name',
 'chat_history': [HumanMessage(content='my name is bob'),
  AIMessage(content='Hello Bob! How can I assist you today?'),
  HumanMessage(content='whats my name'),
  AIMessage(content='Your name is Bob.')],
 'output': 'Your name is Bob.'}
agent_executor.invoke({"input": "whats the weather in sf?"})


> Entering new AgentExecutor chain...

Invoking: `get_current_temperature` with `{'latitude': 37.7749, 'longitude': -122.4194}`


The current temperature is 11.0°CThe current temperature in San Francisco is 11.0°C.

> Finished chain.
{'input': 'whats the weather in sf?',
 'chat_history': [HumanMessage(content='my name is bob'),
  AIMessage(content='Hello Bob! How can I assist you today?'),
  HumanMessage(content='whats my name'),
  AIMessage(content='Your name is Bob.'),
  HumanMessage(content='whats the weather in sf?'),
  AIMessage(content='The current temperature in San Francisco is 11.0°C.')],
 'output': 'The current temperature in San Francisco is 11.0°C.'}
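As an aside, you can inspect what the memory holds at any point by querying it directly; load_memory_variables returns the stored messages under our memory_key (output abridged):

memory.load_memory_variables({})
{'chat_history': [HumanMessage(content='my name is bob'),
  AIMessage(content='Hello Bob! How can I assist you today?'), ...]}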

This is the result we get back as part of the agent chain. It’s very similar to a lot of what powers OpenAI’s ChatGPT. If you’ve interacted with the code interpreter or the plug-in system there, you’ll notice that the language model often thinks about what to do, calls a tool, gets back the response, perhaps calls another tool and gets its response, and then returns to the user. So this is a really powerful loop and technique that can be used to create something like ChatGPT.

1.3 Advanced Agent Loop Techniques

Let’s include another tool by updating our tool list. We’ll keep the same temperature tool and the same Wikipedia search tool, and add a new tool of our own that, for now, just returns the input query reversed; you can fill it in with whatever you like.

@tool
def create_your_own(query: str) -> str:
    """This function can do whatever you would like once you fill it in """
    print(type(query))
    return query[::-1]
tools = [get_current_temperature, search_wikipedia, create_your_own]
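As a quick sanity check before wiring the new tool into the agent, we can run it directly; the printed type comes from the print statement inside the function:

create_your_own.run("stressed")
<class 'str'>
'desserts'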

So, taking a quick glance at the chatbot class below, we can see that we have our functions, generated as previously by calling format_tool_to_openai_function over the updated tool list. We have our model, the ChatOpenAI model, with the functions bound to it. We have memory, which is the same as before. We have our prompt, which is the same as before: a system message, the conversation history, the user input, and a scratchpad for the agent.

Then, as before, we build the agent chain, and finally, as before, we build the AgentExecutor.

import panel as pn  # GUI
import param

pn.extension()

class cbfs(param.Parameterized):
    
    def __init__(self, tools, **params):
        super(cbfs, self).__init__( **params)
        self.panels = []
        self.functions = [format_tool_to_openai_function(f) for f in tools]
        self.model = ChatOpenAI(temperature=0).bind(functions=self.functions)
        self.memory = ConversationBufferMemory(return_messages=True,memory_key="chat_history")
        self.prompt = ChatPromptTemplate.from_messages([
            ("system", "You are helpful but sassy assistant"),
            MessagesPlaceholder(variable_name="chat_history"),
            ("user", "{input}"),
            MessagesPlaceholder(variable_name="agent_scratchpad")
        ])
        self.chain = RunnablePassthrough.assign(
            agent_scratchpad = lambda x: format_to_openai_functions(x["intermediate_steps"])
        ) | self.prompt | self.model | OpenAIFunctionsAgentOutputParser()
        self.qa = AgentExecutor(agent=self.chain, tools=tools, verbose=False, memory=self.memory)
    
    def convchain(self, query):
        if not query:
            return
        inp.value = ''
        result = self.qa.invoke({"input": query})
        self.answer = result['output'] 
        self.panels.extend([
            pn.Row('User:', pn.pane.Markdown(query, width=450)),
            pn.Row('ChatBot:', pn.pane.Markdown(self.answer, width=450, styles={'background-color': '#F6F6F6'}))
        ])
        return pn.WidgetBox(*self.panels, scroll=True)


    def clr_history(self, count=0):
        self.memory.clear()  # reset the conversation memory
        return
cb = cbfs(tools)

inp = pn.widgets.TextInput( placeholder='Enter text here…')

conversation = pn.bind(cb.convchain, inp) 

tab1 = pn.Column(
    pn.Row(inp),
    pn.layout.Divider(),
    pn.panel(conversation,  loading_indicator=True, height=400),
    pn.layout.Divider(),
)

dashboard = pn.Column(
    pn.Row(pn.pane.Markdown('# QnA_Bot')),
    pn.Tabs(('Conversation', tab1))
)
dashboard

1.4 Conclusion: The Future of Conversational Agents

In conclusion, this exploration of conversational agents reveals a vast potential for creating intelligent, interactive systems built on LLM-based agents. By understanding the basics, implementing an effective agent loop, and integrating advanced techniques, we can build sophisticated agents capable of dynamic interactions, very similar to OpenAI’s ChatGPT. The future of conversational agents is promising, with endless possibilities for innovation and practical applications.

1.5 Acknowledgements

I’d like to express my thanks to the wonderful Functions, Tools and Agents with LangChain course by DeepLearning.ai, which I completed, and to acknowledge the use of some images and other materials from the course in this article.
