Using Chain of Thought Reasoning with ChatGPT

In this article we will focus on large language model tasks that process an input through a series of steps to generate a useful output, using ChatGPT.
Tags: natural-language-processing, deep-learning, openai, prompt-engineering
Author: Pranath Fernando

Published: June 21, 2023

1 Introduction

Large language models such as ChatGPT can generate text responses based on a given prompt or input. Writing prompts allow users to guide the language model’s output by providing a specific context or topic for the response. This feature has many practical applications, such as generating creative writing prompts, assisting in content creation, and even aiding in customer service chatbots.

In earlier articles I've looked at how you can use ChatGPT to solve some of these tasks with simple prompts. But in many use cases, such as a customer service chatbot, what is required is not just one prompt but a sequence of prompts, where we also need to consider the output at each stage before providing a final response.

In this article we will focus on large language model tasks that process an input through a series of steps to generate a useful output, using ChatGPT.

2 Setup

2.1 Load the API key and relevant Python libraries.

First we need to load certain Python libraries and connect to the OpenAI API.

The OpenAI API library needs to be configured with an account's secret key, which is available on the OpenAI website.

You can either set it as the OPENAI_API_KEY environment variable before using the library:

!export OPENAI_API_KEY='sk-...'

Or, set openai.api_key to its value:

import openai
openai.api_key = "sk-..."

In this article we load the key from a local .env file:

import os
import openai
from dotenv import load_dotenv, find_dotenv

# Read the API key from a local .env file
_ = load_dotenv(find_dotenv())
openai.api_key = os.environ['OPENAI_API_KEY']
# Define helper function to call the OpenAI Chat Completions API
def get_completion_from_messages(messages,
                                 model="gpt-3.5-turbo",
                                 temperature=0, max_tokens=500):
    """Send a list of chat messages to the model and return the reply text."""
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,  # degree of randomness of the output
        max_tokens=max_tokens,    # maximum length of the response
    )
    return response.choices[0].message["content"]
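
As a quick sanity check that the helper works, we can send it a trivial conversation (the example messages here are purely illustrative):

# Illustrative check that the helper returns a completion
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Say hello in one short sentence.'},
]
print(get_completion_from_messages(messages))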

3 Chain-of-Thought Prompting

As we have seen in earlier articles of mine, an LLM may occasionally find it necessary to reason in depth about an issue before responding to a particular inquiry. In order to allow the model to consider the issue more thoroughly and for a longer period of time before providing a final answer, we can reframe the question to ask for a sequence of pertinent reasoning steps before the model responds. This way, the model will be less likely to make reasoning mistakes by jumping to the wrong conclusion.

And in general, we call this strategy of asking the model to reason about a problem in steps, “Chain of Thought Reasoning”.
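
As a minimal sketch of the idea (the prompt wording here is illustrative, not from the course), we can simply ask the model to show its reasoning steps before committing to an answer:

# A minimal chain-of-thought prompt (illustrative wording)
cot_prompt = """
Answer the question below. First reason through the problem \
step by step, then give the final answer on its own line, \
prefixed with 'Answer:'.

Question: A laptop costs $799.99 and a desktop costs $999.99. \
Which is cheaper, and by how much?
"""
messages = [{'role': 'user', 'content': cot_prompt}]
print(get_completion_from_messages(messages))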

For some applications, it would be improper to reveal to the user the thought process a model goes through to arrive at a solution. In tutoring applications, for instance, we might want to encourage students to work out their own solutions, but the model's reasoning about a student's solution could give the answer away. "Inner monologue", which is just a fancy way of saying that we conceal the model's reasoning from the user, is a strategy that can be employed to mitigate this.

The purpose of inner monologue is to instruct the model to structure the parts of its output that are meant to be parsed easily by the application but hidden from the user.

The output is then parsed and only a portion of it is shown to the user. Recall the classification task from a previous article, where we asked the model to classify a customer query into a primary and a secondary category; we might then wish to follow different steps depending on that classification, as in the sketch below.
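
For example, here is a hypothetical sketch of how application code might branch on the classification (the JSON shape with 'primary' and 'secondary' keys is an assumption about what such a classifier returns):

import json

# Hypothetical output from the classification step (assumed JSON shape)
classification_output = '{"primary": "General Inquiry", "secondary": "Product information"}'
category = json.loads(classification_output)

# Branch on the classification to choose which instructions to send next
if (category["primary"] == "General Inquiry"
        and category["secondary"] == "Product information"):
    next_step = "product_information"  # use the product instructions developed below
else:
    next_step = "other_enquiry"        # route other categories elsewhere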

4 Use-case: A customer enquiry

Imagine that the customer's inquiry has been placed under the category for product information. We'll want to give details about the products we offer in the following instructions. The classification in this instance would therefore have been primary: general inquiry, secondary: product information. So, starting there, let's look at an example.

Let's now review our system message. What we're asking the model to do here is to reason about the answer before drawing a conclusion.

So, the instruction is:

“Follow these steps to answer the customer queries. The customer query will be delimited with four hashtags.”

So, then we’ve split this up into steps.

So, the first step is to:

“Decide whether the user is asking a question about a specific product or products. And a product category doesn’t count.”.

Step two:

“If the user is asking about specific products, identify whether the products are in the following list.”.

And now we've included a list of available products. Here we have five available products, all in the computers and laptops category, and these are all made-up products; they were actually generated by GPT-4.

And step three:

“If the message contains products in the list above, list any assumptions that the user is making in their message. For example, that laptop X is bigger than laptop Y or that laptop Z has a 2 year warranty.”

Step four is:

“If the user made any assumptions, figure out whether the assumption is true based on your product information.”.

And step five is:

“First, politely correct the customer’s incorrect assumptions, if applicable. Only mention or reference products in the list of 5 available products, as these are the only five products that the store sells. And answer the customer in a friendly tone.”

And these kinds of very pedantic instructions are probably unnecessary for a more advanced language model like GPT-4. Then we'll ask the model to use the following format: step one, delimiter, its reasoning; step two, delimiter, reasoning; and so on.

Using the delimiters will make it easier for us later to extract just the response to the customer and cut off everything before it.

So, now let’s try an example user message. So, our message is:

“by how much is the BlueWave Chromebook more expensive than the TechPro desktop?”

So, let's take a look at these two products: the BlueWave Chromebook is $249.99 and the TechPro Desktop is actually $999.99, so the user's assumption that the Chromebook is more expensive is false. Let's see how the model handles this user request.

delimiter = "####"
system_message = f"""
Follow these steps to answer the customer queries.
The customer query will be delimited with four hashtags,\
i.e. {delimiter}. 

Step 1:{delimiter} First decide whether the user is \
asking a question about a specific product or products. \
Product category doesn't count. 

Step 2:{delimiter} If the user is asking about \
specific products, identify whether \
the products are in the following list.
All available products: 
1. Product: TechPro Ultrabook
   Category: Computers and Laptops
   Brand: TechPro
   Model Number: TP-UB100
   Warranty: 1 year
   Rating: 4.5
   Features: 13.3-inch display, 8GB RAM, 256GB SSD, Intel Core i5 processor
   Description: A sleek and lightweight ultrabook for everyday use.
   Price: $799.99

2. Product: BlueWave Gaming Laptop
   Category: Computers and Laptops
   Brand: BlueWave
   Model Number: BW-GL200
   Warranty: 2 years
   Rating: 4.7
   Features: 15.6-inch display, 16GB RAM, 512GB SSD, NVIDIA GeForce RTX 3060
   Description: A high-performance gaming laptop for an immersive experience.
   Price: $1199.99

3. Product: PowerLite Convertible
   Category: Computers and Laptops
   Brand: PowerLite
   Model Number: PL-CV300
   Warranty: 1 year
   Rating: 4.3
   Features: 14-inch touchscreen, 8GB RAM, 256GB SSD, 360-degree hinge
   Description: A versatile convertible laptop with a responsive touchscreen.
   Price: $699.99

4. Product: TechPro Desktop
   Category: Computers and Laptops
   Brand: TechPro
   Model Number: TP-DT500
   Warranty: 1 year
   Rating: 4.4
   Features: Intel Core i7 processor, 16GB RAM, 1TB HDD, NVIDIA GeForce GTX 1660
   Description: A powerful desktop computer for work and play.
   Price: $999.99

5. Product: BlueWave Chromebook
   Category: Computers and Laptops
   Brand: BlueWave
   Model Number: BW-CB100
   Warranty: 1 year
   Rating: 4.1
   Features: 11.6-inch display, 4GB RAM, 32GB eMMC, Chrome OS
   Description: A compact and affordable Chromebook for everyday tasks.
   Price: $249.99

Step 3:{delimiter} If the message contains products \
in the list above, list any assumptions that the \
user is making in their \
message e.g. that Laptop X is bigger than \
Laptop Y, or that Laptop Z has a 2 year warranty.

Step 4:{delimiter} If the user made any assumptions, \
figure out whether the assumption is true based on your \
product information. 

Step 5:{delimiter} First, politely correct the \
customer's incorrect assumptions if applicable. \
Only mention or reference products in the list of \
5 available products, as these are the only 5 \
products that the store sells. \
Answer the customer in a friendly tone.

Use the following format:
Step 1:{delimiter} <step 1 reasoning>
Step 2:{delimiter} <step 2 reasoning>
Step 3:{delimiter} <step 3 reasoning>
Step 4:{delimiter} <step 4 reasoning>
Response to user:{delimiter} <response to customer>

Make sure to include {delimiter} to separate every step.
"""

We will now format our messages array, receive a response, and print it. We are hoping that, after going through all of these steps, the model will recognise that the user has made a false assumption and then proceed to the final step to correct them. Notice that within the confines of this one request we've actually kept track of a variety of complex states that the system might be in, and that the output of each step changes what happens next; for instance, if the user hadn't made any assumptions in step 3, there wouldn't be any output in step 4.

So this is a pretty complicated instruction for the model. So let’s see if it did it right.

user_message = f"""
by how much is the BlueWave Chromebook more expensive \
than the TechPro Desktop"""

messages =  [  
{'role':'system', 
 'content': system_message},    
{'role':'user', 
 'content': f"{delimiter}{user_message}{delimiter}"},  
] 

response = get_completion_from_messages(messages)
print(response)
Step 1:#### The user is asking a question about two specific products, the BlueWave Chromebook and the TechPro Desktop.
Step 2:#### The prices of the two products are as follows:
- BlueWave Chromebook: $249.99
- TechPro Desktop: $999.99
Step 3:#### The user is assuming that the BlueWave Chromebook is more expensive than the TechPro Desktop.
Step 4:#### The assumption is incorrect. The TechPro Desktop is actually more expensive than the BlueWave Chromebook.
Response to user:#### The BlueWave Chromebook is actually less expensive than the TechPro Desktop. The BlueWave Chromebook costs $249.99, while the TechPro Desktop costs $999.99.

In the first step, the model recognises that the user is asking about specific products: they want to know how much the prices of these two products differ.

It is wrong for the user to infer that the BlueWave Chromebook is more expensive than the TechPro Desktop. Thinking through the issue step by step takes longer, and the model performs better when it has this time to reflect, much as a human would take time to consider an answer to a question. The model's final response correctly tells the user that the BlueWave Chromebook is actually less expensive than the TechPro Desktop: the Chromebook costs $249.99 while the desktop costs $999.99, so the Chromebook is in fact $750.00 cheaper.

So let's look at another example user message.

So, let’s format this user message. The query is:

“Do you sell TVs?”

And if you recall, our product list only included various computers. Let's see what the model does: since TVs aren't listed among the available products, in step one it recognises that the user is asking whether the store sells them.

user_message = f"""
do you sell tvs"""
messages =  [  
{'role':'system', 
 'content': system_message},    
{'role':'user', 
 'content': f"{delimiter}{user_message}{delimiter}"},  
] 
response = get_completion_from_messages(messages)
print(response)
Step 1:#### The user is asking about a specific product category, TVs.

Step 2:#### The list of available products does not include any TVs.

Response to user:#### I'm sorry, but we do not sell TVs at this time. Our store specializes in computers and laptops. However, if you are interested in purchasing a computer or laptop, please let me know and I would be happy to assist you.

As you can see, the model realises that the intermediate steps are not required here and proceeds directly to responding to the user. We did request the output in a particular format, so in a strict sense the model has not complied with our request; once more, more sophisticated models like GPT-4 will be more adept at doing that. In this instance, the response to the user is an apology that the store does not sell TVs.

The model then mentions the kinds of products the store does sell. And now, we only really want this final part of the response.

5 Inner Monologue

The user wouldn't want to see the earlier sections. In order to print only the final portion of the model output, we can simply cut the string at the last instance of the delimiter (the string of four hashtags). So, let's write some code to retrieve just the last portion of this text, using a try/except block.

The try/except block lets us handle failures gracefully in case the model produces unexpected output that doesn't actually contain these delimiter characters. We set our final response to the result of splitting the response string at the delimiter.

We take only the final item of the resulting list, because we are looking for the text after the last instance of the delimiter; after that, we strip any surrounding whitespace, since the delimiter characters may be followed by white space, as you can see. Then we catch any errors and provide a fallback response: "Sorry, I'm having trouble right now. Please try asking another question.".

Since we asked the LLM to separate its reasoning steps by a delimiter, we can hide the chain-of-thought reasoning from the final output that the user sees.

try:
    # Keep only the text after the last delimiter and strip surrounding whitespace
    final_response = response.split(delimiter)[-1].strip()
except Exception as e:
    # Fallback response if the output can't be processed
    final_response = "Sorry, I'm having trouble right now, please try asking another question."
    
print(final_response)
I'm sorry, but we do not sell TVs at this time. Our store specializes in computers and laptops. However, if you are interested in purchasing a computer or laptop, please let me know and I would be happy to assist you.

As you can see, we just cut the string to produce this output, and if we were to incorporate this into an application, this is what the user would see. That said, this prompt may be a little too complicated overall; it's possible that not all of these intermediate steps are actually necessary, and in general considerable trial-and-error testing is needed to find the best trade-off in prompt complexity.
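
If we did build this into an application, a minimal sketch (my own wrapper, not from the course) might combine the model call and the parsing into one function. Note that str.split won't raise an error if the delimiter is missing; it simply returns the whole string, so it is safer to check for the delimiter explicitly rather than rely on the except branch:

def answer_customer(user_message, system_message, delimiter="####"):
    """Send a customer query through the step-by-step prompt and
    return only the user-facing portion of the model's output."""
    messages = [
        {'role': 'system', 'content': system_message},
        {'role': 'user', 'content': f"{delimiter}{user_message}{delimiter}"},
    ]
    response = get_completion_from_messages(messages)
    # If the model didn't use the delimiter, fall back rather than
    # risk exposing its full chain of thought to the user
    if delimiter not in response:
        return "Sorry, I'm having trouble right now, please try asking another question."
    return response.split(delimiter)[-1].strip()

print(answer_customer("do you sell tvs", system_message))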

So, it is wise to experiment with a variety of prompts before choosing one.

6 Acknowledgements

I'd like to express my thanks to the wonderful Building Systems with the ChatGPT API Course by DeepLearning.ai and OpenAI, which I completed, and acknowledge the use of some images and other materials from the course in this article.
