A High Level Overview of Prompting and In-Context Learning for Large Language Models

Here we take a high-level, non-technical view of what prompting is all about and introduce in-context learning
natural-language-processing
deep-learning
prompt-engineering
Author

Pranath Fernando

Published

July 2, 2023

1 Introduction

Recent AI systems such as ChatGPT have demonstrated impressive abilities on a wide range of tasks previously done only by humans. The models behind these systems are known as Large Language Models (LLMs). One key aspect of using LLMs is prompting: how you write text requests to get the outputs you want from the model. In previous articles I’ve looked at detailed practical use cases for how to write these prompts. In this article we take a high-level, non-technical view of what prompting is all about and introduce in-context learning.

2 LLM Prompting Basic Concepts

The text that you feed into the model is called the prompt, the act of generating text is known as inference, and the output text is known as the completion. The total amount of text, or memory, available for the prompt is called the context window.
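To make these terms concrete, here is a minimal sketch in Python, using the Hugging Face transformers library and the small gpt2 model purely as illustrative choices; neither is prescribed by this article, and gpt2 is far too small to do this task reliably. It just shows the mechanics.

```python
# A minimal sketch of prompt -> inference -> completion, using the
# Hugging Face transformers library and the small "gpt2" model
# (both chosen here purely for illustration).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The prompt: the text we feed into the model.
prompt = "Classify this review: I loved this movie! Sentiment:"

# Inference: the model generates new tokens conditioned on the prompt.
result = generator(prompt, max_new_tokens=5)

# The completion: the pipeline returns the prompt plus the generated
# text, so we strip the prompt off to see just the new output.
completion = result[0]["generated_text"][len(prompt):]
print(completion)
```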

Models often perform well, but you’ll regularly come across situations where a model doesn’t deliver the result you were hoping for right away. To get the model to behave as you want, you might need to revise the language or format of your request several times. This process of developing and refining the prompt is called prompt engineering. It’s a broad subject, but one reliable way to get better results is to include examples of the task you want the model to complete inside the prompt itself.

Providing examples inside the context window is called in-context learning. Let’s examine what this means. By including examples or extra information in the prompt, you help the LLM learn more about the task being asked of it. Here is a concrete case. Suppose you ask the model to classify the sentiment of a movie review, that is, whether the review is positive or negative. The prompt consists of the instruction, “Classify this review,” followed by some context, in this case the review text itself, and an instruction to produce the sentiment at the end.

This technique, where you ask your question directly without giving any examples, is called zero-shot inference.
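For example, a zero-shot prompt for the sentiment task might look like the following sketch (the review text is my own illustrative example, not taken from the course):

```python
# A zero-shot prompt: instruction and input only, no worked examples.
zero_shot_prompt = """Classify this review:
I loved this movie!

Sentiment:"""
```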

The largest LLMs are quite adept at this: they understand the task at hand and provide a useful response. In this case, the model correctly classifies the sentiment as positive. Smaller models, on the other hand, may struggle with this. Including an example in the prompt can improve their performance.

Here the prompt is longer: it begins with a completed example that demonstrates the task before asking the model to carry it out. This is called one-shot inference. The prompt starts with the instruction to classify a review, followed by a sample review and its sentiment, in this case positive. The prompt then repeats the instruction and gives the input review that we actually want the model to analyse. Fed this new, longer prompt, the smaller model has a better chance of understanding the task you’re describing and the format of the response you want.
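As a sketch, a one-shot prompt might look like this (again with illustrative review texts of my own):

```python
# A one-shot prompt: one completed example, then the review we
# actually want classified.
one_shot_prompt = """Classify this review:
I loved this movie!

Sentiment: Positive

Classify this review:
I don't like this chair.

Sentiment:"""
```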

One-shot inference uses just a single example, in contrast to the zero-shot prompt from earlier. Sometimes a single example isn’t enough for the model to learn what you want it to do, so you can extend the idea and include several examples. This is known as few-shot inference. Suppose you’re working with an even smaller model that failed at sentiment analysis with one-shot inference. You can try few-shot inference instead by adding a second example, this time a negative review, so that the prompt contains examples covering the different output classes.
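A few-shot version of the same prompt, with one illustrative example per output class, might look like this sketch:

```python
# A few-shot prompt: multiple examples covering both output classes,
# followed by the review we want the model to classify.
few_shot_prompt = """Classify this review:
I loved this movie!

Sentiment: Positive

Classify this review:
I don't like this chair.

Sentiment: Negative

Classify this review:
Who would use this product?

Sentiment:"""
```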

You pass the model the updated prompt, and this time it understands the instruction and produces a completion that correctly labels the review’s sentiment as negative.

To summarise: you can design your prompts to encourage the model to learn from examples. The largest models handle zero-shot inference well with no examples, while smaller models benefit from one-shot or few-shot inference that includes examples of the desired behaviour. Keep the context window in mind, though: it puts a hard limit on how much in-context learning you can pass into the model.
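Since the context window is measured in tokens rather than characters or words, a quick way to see how much of it a prompt uses is to count tokens. Here is a sketch, again assuming the transformers library and the gpt2 tokenizer as stand-ins for whatever model you actually use:

```python
# Counting tokens to check a prompt against the context window.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # gpt2's window is 1024 tokens

prompt = "Classify this review: I loved this movie! Sentiment:"
num_tokens = len(tokenizer.encode(prompt))
print(f"{num_tokens} of {tokenizer.model_max_length} tokens used")
```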

Generally speaking, if the model still isn’t performing well with, say, five or six examples in the prompt, you should consider fine-tuning it instead. Fine-tuning trains the model further on new data to improve its capability at the task at hand.

As ever-larger models have been trained, it has become clear that a model’s scale has a significant impact on both the range of tasks it can perform and how well it performs them. Models with more parameters can capture more linguistic knowledge. The largest models are very adept at zero-shot inference and can infer and complete a variety of tasks they were not originally trained to perform, whereas smaller models typically excel at only a small number of tasks.

To identify the best model for your use case, you may need to experiment with a few. Once you’ve found the model that suits your needs, you can use these prompting methods and adjust a few settings to shape the model’s output.

3 Acknowledgements

I’d like to express my thanks to the wonderful Generative AI with Large Language Models Course by DeepLearning.ai and AWS, which I completed, and acknowledge the use of some images and other materials from the course in this article.
