Evaluating Classification Inputs for Large Language Models

Here we look at how to classify inputs to large language models, which is important when creating LLM applications that involve chains of multiple inputs and outputs to LLMs
natural-language-processing
deep-learning
openai
Author

Pranath Fernando

Published

June 19, 2023

1 Introduction

Large language models such as ChatGPT can generate text responses based on a given prompt or input. Writing prompts allow users to guide the language model’s output by providing a specific context or topic for the response. This feature has many practical applications, such as generating creative writing prompts, assisting in content creation, and even aiding in customer service chatbots.

In earlier articles I’ve looked at how you can use ChatGPT to solve some of these tasks with simple prompts. But in many use cases, what is required is not just one prompt but a sequence of prompts, where we also need to consider the outputs at each stage before providing a final output - for example, with a customer service chatbot.

In this article, we will look at how to classify inputs to large language models, which is important when creating LLM applications that involve chains of multiple inputs and outputs to LLMs.

2 Setup

2.1 Load the API key and relevant Python libraries

First we need to load certain Python libraries and connect to the OpenAI API.

The OpenAI API library needs to be configured with an account’s secret key, which is available on the website.

You can either set it as the OPENAI_API_KEY environment variable before using the library: !export OPENAI_API_KEY='sk-...'

Or, set openai.api_key to its value:

import openai
openai.api_key = "sk-..."
import os
import openai
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']
# Define helper function
def get_completion_from_messages(messages, 
                                 model="gpt-3.5-turbo", 
                                 temperature=0, 
                                 max_tokens=500):
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature, 
        max_tokens=max_tokens,
    )
    return response.choices[0].message["content"]

3 Classify customer queries to handle different cases

For tasks that require many independent sets of instructions to handle different cases, it can be useful to first classify the type of query and then use that classification to decide which instructions to apply. This can be accomplished by establishing predefined categories and hard-coding the instructions that are relevant for handling tasks in each category. For instance, when designing a customer service assistant, it may be crucial to first categorise the type of inquiry and then choose the appropriate instructions based on that categorisation.

So, if a customer asks to close their account rather than inquire about a particular product, you might provide different secondary instructions: further account-closure instructions in the first case, and additional product details in the second. Let’s use an example to make this clearer. To do this, we use a delimiter in our system message, which is an instruction for the entire system. A delimiter is simply a means of dividing the different portions of an instruction or output, helping the model identify the various components. We’ll use the hashtag delimiter “####” in this case; since it is actually encoded as a single token, it is a handy delimiter.

So, this is the system message and the model request:

‘You will be provided with customer service queries. The customer service query will be delimited with these hashtag characters. Classify each query into a primary category and a secondary category.’

We ask for the output in JSON format with the keys primary and secondary, and then classify each query into a primary category and a secondary category. Our primary categories are listed here: Billing, Technical Support, Account Management, or General Inquiry. Following that comes a list of secondary categories for each: unsubscribe or upgrade, add a payment method, and so on.

So let’s go ahead with the user message example now. We’ll use the following for our first user message.

‘I want you to delete my profile and all of my user data.’

And after that, we’ll format this into a list of messages, with the hashtag characters wrapped around the user message as delimiters. Before we run it, let’s speculate as to what the classification might be. “I want you to delete my profile” looks like account management, possibly closing the account. Let’s check the model’s opinion. Great - according to the model, the primary category is Account Management, and the secondary is Close account.

The benefit of requesting a structured output like a JSON is that you can easily read it into an object, such as a dictionary in Python or another type of object if you’re using a different language, and then use it as the input for the next step.

delimiter = "####"
system_message = f"""
You will be provided with customer service queries. \
The customer service query will be delimited with \
{delimiter} characters.
Classify each query into a primary category \
and a secondary category. 
Provide your output in json format with the \
keys: primary and secondary.

Primary categories: Billing, Technical Support, \
Account Management, or General Inquiry.

Billing secondary categories:
Unsubscribe or upgrade
Add a payment method
Explanation for charge
Dispute a charge

Technical Support secondary categories:
General troubleshooting
Device compatibility
Software updates

Account Management secondary categories:
Password reset
Update personal information
Close account
Account security

General Inquiry secondary categories:
Product information
Pricing
Feedback
Speak to a human

"""
user_message = f"""\
I want you to delete my profile and all of my user data"""
messages =  [  
{'role':'system', 
 'content': system_message},    
{'role':'user', 
 'content': f"{delimiter}{user_message}{delimiter}"},  
] 
response = get_completion_from_messages(messages)
print(response)
{
  "primary": "Account Management",
  "secondary": "Close account"
}
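As a quick sketch of the parsing idea mentioned above (assuming the model’s reply is held in a string like the one just printed), the JSON output can be read straight into a Python dictionary with the standard `json` module:

```python
import json

# The classification returned by the model, as a JSON string
# (copied from the output above)
response = """
{
  "primary": "Account Management",
  "secondary": "Close account"
}
"""

# Parse the model's JSON output into a Python dictionary
classification = json.loads(response)
print(classification["primary"])    # Account Management
print(classification["secondary"])  # Close account
```

The resulting dictionary can then be passed directly to the next step of the chain.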

Here’s another user message: “Tell me more about your flat screen TVs.” We simply build the same messages list, get the model’s response, and print it. Here is our second classification, which appears to be accurate.

user_message = f"""\
Tell me more about your flat screen tvs"""
messages =  [  
{'role':'system', 
 'content': system_message},    
{'role':'user', 
 'content': f"{delimiter}{user_message}{delimiter}"},  
] 
response = get_completion_from_messages(messages)
print(response)
{
  "primary": "General Inquiry",
  "secondary": "Product information"
}

So in general, based on the classification of the customer query, we can now provide a set of more detailed instructions to handle the next steps.

Instead of providing a link for cancelling the account or anything similar, we might include a little extra information about the TVs in this example.
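To make this routing idea concrete, here is a minimal sketch. The category names come from the system message above, but the instruction text and the `route_query` helper are hypothetical placeholders for illustration:

```python
import json

# Hypothetical follow-up instructions keyed by primary category.
# The keys match the categories in the system message; the values
# are placeholder text for illustration only.
instructions_by_category = {
    "Billing": "Summarise the customer's recent charges.",
    "Technical Support": "Walk through basic troubleshooting steps.",
    "Account Management": "Explain the account closure process step by step.",
    "General Inquiry": "Share relevant product details and pricing.",
}

def route_query(classification_json):
    """Pick follow-up instructions based on the classified primary category."""
    classification = json.loads(classification_json)
    primary = classification["primary"]
    # Fall back to a human agent if the category is unrecognised
    return instructions_by_category.get(
        primary, "Hand the query to a human agent."
    )

print(route_query('{"primary": "General Inquiry", "secondary": "Product information"}'))
# Share relevant product details and pricing.
```

In a real system, the returned instructions would become the system message for the next LLM call in the chain.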

4 Acknowledgements

I’d like to express my thanks to the wonderful Building Systems with the ChatGPT API Course by DeepLearning.ai and OpenAI - which I completed, and acknowledge the use of some images and other materials from the course in this article.
