Multi-task Instruction Fine-Tuning for Large Language Models

In this post, we’ll look at techniques you might employ to make an existing large language model more effective for your particular use case using a method called instruction fine-tuning, and in particular see how this can be used to optimise for multiple tasks as the same time.

Pranath Fernando


July 11, 2023

1 Introduction

In this post, we’ll look at techniques you might employ to make an existing large language model more effective for your particular use case using a method called instruction fine-tuning, and in particular see how this can be used to optimise for multiple tasks as the same time.

2 Multi-Task Instruction Fine-Tuning

An extension of single task fine-tuning, multitask fine-tuning uses sample inputs and outputs for multiple tasks as part of the training dataset. Here, the dataset includes examples that teach the model how to perform a number of tasks, including entity recognition, code translation, summarization, and review rating. To increase the model’s performance across all tasks simultaneously and prevent catastrophic forgetting, you train the model using this mixed dataset. The weights of the model are updated throughout multiple training epochs using the computed losses across samples, creating a tailored model that has learned how to be proficient at several tasks at once.

The fact that multitask fine-tuning needs a lot of data is one disadvantage. In your training set, you can need as many as 50–100,000 samples. However, compiling this data can be quite beneficial and worthwhile. The resulting models are frequently quite powerful and appropriate for application in circumstances where good performance across a wide range of activities is desired. Let’s look at a particular model family that was trained utilising multitask instruction fine-tuning. Depending on the datasets and tasks utilised during fine-tuning, instruct model variance varies.

3 Instruction Fine-Tuning with FLAN

The FLAN family of models is one instance. FLAN, or fine-tuned language net, is a particular set of guidelines used to fine-tune various models. The authors of the original research referred to FLAN fine-tuning as the metaphorical dessert to the main dish of pre-training because it is the final stage of the training process, which is a fair description.

In contrast to FLAN-PALM, which is the flattening struct version of the palm foundation model, FLAN-T5 is the FLAN instruct version of the T5 foundation model. You get the picture; the FLAN-T5 is a fantastic all-purpose teaching model. It has been improved on 473 datasets total, spanning 146 different task categories.

SAMSum is one illustration of a prompt dataset used for FLAN-T5 summarising jobs. It is used to train language models that will summarise discourse. It is a component of the muffin collection of tasks and datasets. SAMSum is a dataset that contains summaries of 16,000 messenger-like chats. The dialogue for three cases is on the left, and the summaries are on the right. Linguists created the summaries and dialogues specifically to create a high-quality training dataset for language models. The linguists were asked to construct dialogues resembling those they would write every day, reflecting the percentage of themes of their actual messenger exchanges.

Although language experts then created short summaries of those conversations that included important pieces of information and names of the people in the dialogue. Here is a prompt template designed to work with this SAMSum dialogue summary dataset. The template is actually comprised of several different instructions that all basically ask the model to do this same thing. Summarize a dialogue. For example, briefly summarize that dialogue. What is a summary of this dialogue? What was going on in that conversation? Including different ways of saying the same instruction helps the model generalize and perform better. Just like the prompt templates you saw earlier.

You can see that wherever the dialogue field appears in each scenario, the dialogue from the SAMSum dataset is added to the template. The label is created using the summary. You can use this template to fine-tune a dialogue summarization task by applying it to each row in the SAMSum dataset.

Even while the FLAN-T5 is a fantastic general-use model, it struggles at times. It still might need work on certain activities for your particular use case, so keep that in mind.

4 Use-Case: Summarising Customer Requests

Consider, for instance, that you are a data scientist developing an app to assist your customer care staff in handling requests made through a chat bot like the one displayed here.

To identify the main steps the client is seeking and to decide what actions should be taken in response, your customer care staff needs a summary of every conversation. FLAN-T5 has some conversation-summarizing capabilities thanks to the SAMSum dataset. The examples in the dataset, which primarily consist of talks between friends about everyday activities, don’t significantly resemble the language pattern found in chats with customer service representatives. Use a dialogue dataset that is considerably more similar to the discussions that took place with your bot to further fine-tune the FLAN-T5 model.

Let’s examine an illustration from dialogsum and talk about how more fine-tuning can make the model better. This support chat is representative of the dialogsum dataset’s sample examples. A customer and a service member are speaking at a hotel check-in desk. A template has been applied to the chat so that the direction to start the text with a summary of the conversation is present.

Before performing any further fine-tuning, let’s have a look at how FLAN-T5 answers to this question. You’ll see that the prompt has been compressed to the left to give you more room to analyse the model’s completion.

This is how the model responded to the directive. You can tell that the model works because it was able to determine that Tommy’s reservation was the topic of the conversation. However, it falls short of the human-generated baseline summary, which also includes crucial details like Mike’s request for information to speed up check-in and the models’ creation of information that was not part of the original dialogue. In particular, the brand of the hotel and the location where it was situated. Let’s examine the model’s performance after making adjustments based on the dialogue dataset. We think you will agree that this is closer to a human-produced summary.

The summary has all the crucial information, including the names of both participants in the conversation, and there is no faked information. This example shows how to fine-tune custom data using a dataset and a public discussion. Using the internal data collected by your business can help you fine-tune your strategy the most effectively.

As an illustration, consider the support chat exchanges from your customer service application. This will enable the model to pick up on the nuances of how your business prefers to sum up discussions and what information your customer service colleagues find most helpful.

This paper introduces FLAN (Fine-tuned LAnguage Net), a technique for fine-tuning instructions, and demonstrates the outcomes of its use. The study shows that by optimising the 540B PaLM model on 1836 tasks and including Chain-of-Thought Reasoning data, FLAN provides improvements in generalisation, human usability, and zero-shot reasoning over the original model. The article also includes comprehensive information on the evaluation process for each of these factors

5 Acknowledgements

I’d like to express my thanks to the wonderful Generative AI with Large Language Models Course by and AWS - which i completed, and acknowledge the use of some images and other materials from the course in this article.