Large Language Models v Chat Models

In LangChain, LLMs and Chat Models are two different kinds of models used for natural language processing tasks. This article covers the distinctions between them, along with their applications and implementation strategies within LangChain.
natural-language-processing
deep-learning
langchain
activeloop
openai
Author

Pranath Fernando

Published

July 28, 2023

1 Introduction

Large Language Models have significantly advanced Natural Language Processing (NLP), allowing AI systems to comprehend and produce human-like text. ChatGPT is a well-known language model based on the Transformer architecture; it can process lengthy texts and capture the relationships between words and concepts. It excels at predicting linguistic patterns and word associations.

In LangChain, LLMs and Chat Models are two different kinds of models that are used for various tasks involving natural language processing. The distinctions between LLMs and Chat Models, as well as their distinctive applications and implementation strategies within LangChain, will be covered in this article.

2 Import Libs & Setup

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

3 Understanding LLMs and Chat Models

3.1 LLMs

LLMs such as GPT-3, Bloom, PaLM, and Aurora genAI take a text string as input and return a text string as output. Because they have been trained on language modelling tasks, they can produce human-like text, carry out complex reasoning, and even write code. LLMs are powerful and adaptable, able to generate text for a wide variety of tasks. However, they occasionally produce inaccurate or nonsensical responses, and their API is less structured than that of Chat Models.

Pre-training exposes these models to massive corpora and has them predict the next word, which teaches them the relationships between words. Through this learning process, LLMs can produce high-quality text for a variety of applications, including predictive text and automatic form filling.
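The idea of learning word relationships by predicting the next word can be illustrated with a deliberately tiny sketch. It is not how an LLM is implemented (real models use neural networks trained on vast corpora); it simply counts which word follows which in a toy corpus. The corpus and the function names are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_counts(corpus):
    """Count, for each word, which words follow it in the corpus."""
    follower_counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follower_counts[current_word][next_word] += 1
    return follower_counts

def predict_next_word(follower_counts, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    followers = follower_counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy corpus, invented purely for this illustration
toy_corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
counts = train_bigram_counts(toy_corpus)
print(predict_next_word(counts, "sat"))  # most frequent word after "sat"
print(predict_next_word(counts, "the"))  # most frequent word after "the"
```

Even this crude counter picks up simple associations from its data; an LLM learns far richer ones because it conditions on much longer contexts and far more text.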

The majority of these models are trained on general-purpose datasets. Some are trained on a combination of general and domain-specific data, such as Intel Aurora genAI, which is trained on general text, scientific texts, scientific data, and domain-related code. The aim of domain-specific LLMs is to improve performance in a given area while still being able to complete the vast majority of tasks that general LLMs can handle.

LLMs have the potential to permeate many facets of human existence, including the humanities, sciences, and law. LLMs are a crucial technology to master as they continue to be developed and integrated into our academic, social, and professional lives.

You can use a large language model (LLM) like GPT-3 in LangChain by following these steps. Import the OpenAI wrapper from the langchain.llms module and initialise it with the desired model name and any other arguments. For instance, set a high temperature to produce more random outputs. After that, create a PromptTemplate to format the model’s input.

To combine the model and prompt, define an LLMChain. Run the chain on the desired input using .run(). Before running the code below, make sure your OpenAI key is stored in the “OPENAI_API_KEY” environment variable. Remember to install the necessary packages with the following command: pip install langchain==0.0.208 deeplake openai tiktoken.
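As a small convenience, you can check that the key is visible to Python before calling the API. This is only a sketch; the helper name has_openai_key is made up here and is not part of LangChain or the OpenAI library.

```python
import os

def has_openai_key(environ=os.environ):
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    return bool(environ.get("OPENAI_API_KEY", "").strip())

if not has_openai_key():
    print("OPENAI_API_KEY is not set; export it before running the examples.")
```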

from langchain.chains import LLMChain

# temperature=0 keeps the output as deterministic as possible
llm = OpenAI(model_name="text-davinci-003", temperature=0)
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)
# Combine the model and the prompt into a chain, then run it on an input
chain = LLMChain(llm=llm, prompt=prompt)
result = chain.run("wireless headphones")
print(result)


Wireless Audio Solutions

Here, the input for the chain is the string “wireless headphones”. The chain processes the input and generates a result based on the product name.
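Conceptually, the chain does two things: it fills the placeholder in the template with the input, then passes the resulting string to the model. The sketch below mimics that flow in plain Python. The functions format_prompt, fake_llm, and run_chain are hypothetical stand-ins for illustration and are not LangChain APIs; fake_llm just echoes its prompt so the example runs without an API key.

```python
def format_prompt(template, **variables):
    """Fill the {placeholders} in a template, much as a prompt template does."""
    return template.format(**variables)

def fake_llm(prompt_text):
    """Hypothetical stand-in for a real model: echoes the prompt it was given."""
    return f"[model response to: {prompt_text}]"

def run_chain(template, llm, **variables):
    """Format the prompt, then pass the resulting string to the model."""
    return llm(format_prompt(template, **variables))

template = "What is a good name for a company that makes {product}?"
print(format_prompt(template, product="wireless headphones"))
print(run_chain(template, fake_llm, product="wireless headphones"))
```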

3.2 Chat Models

Chat models, such as ChatGPT, which can have either GPT-3.5 or GPT-4 at its core, are the most popular models in LangChain. Their user-friendly chat interface and their capacity to learn from human feedback have attracted a lot of attention.

Chat models like ChatGPT take a list of messages as input and return an AIMessage. Their underlying technology is typically an LLM, but their APIs are more structured. Chat models are designed to use the earlier messages of a conversation as context so that they can produce more pertinent responses. Additionally, they benefit from reinforcement learning from human feedback, which helps them respond more effectively. They may still have limitations in their reasoning, though, and need to be handled carefully to prevent them from producing offensive content.

Types of Chat Messages

When working with chat models in LangChain, SystemMessage, HumanMessage, and AIMessage are the three primary message types used.

  • SystemMessage: These messages give the AI model its first directives, context, or information. They determine the goals the AI should pursue and can aid in regulating the AI’s behaviour. System messages are directives for the AI to follow rather than user input.
  • HumanMessage: The user’s input is represented by these messages, which are forwarded to the AI model. It is anticipated that the AI model will react to these messages. To alter how the human input is displayed in LangChain, you can customise the human prefix (for example, “User”) in the conversation summary.
  • AIMessage: During interactions with human users, these messages are sent from the AI’s point of view. They serve as the AI’s answers to input from humans. Similar to HumanMessage, the AI prefix (such as “AI Assistant” or “AI”) in the conversation summary can be customised to alter how the AI’s responses are shown.
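To make the three roles concrete, here is a minimal sketch that models a conversation with a plain dataclass and converts it to the role-tagged dictionaries ("system", "user", "assistant") used by the OpenAI chat API. The Message class and to_api_payload function are illustrative stand-ins, not LangChain classes; the assumption is that SystemMessage, HumanMessage, and AIMessage correspond to those three roles respectively.

```python
from dataclasses import dataclass

@dataclass
class Message:
    """Illustrative stand-in for a chat message (not a LangChain class)."""
    role: str      # "system", "user", or "assistant"
    content: str

def to_api_payload(messages):
    """Convert messages into the role/content dictionaries a chat API expects."""
    return [{"role": m.role, "content": m.content} for m in messages]

conversation = [
    Message("system", "You are a helpful assistant that translates English to French."),
    Message("user", "Translate the following sentence: I love programming."),
    Message("assistant", "J'aime la programmation."),
]

for item in to_api_payload(conversation):
    print(item)
```

Keeping the role separate from the content is what lets the model tell standing instructions apart from the user's request and from its own earlier replies.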

An illustration of how to use a HumanMessage with ChatOpenAI

Here, we’re building a chatbot that can translate text using the LangChain library. Instead of depending on a single prompt, we’ll use different message types to distinguish between the user’s requests and system instructions. This helps the model better understand the requirements.

We first create a list of messages, beginning with a SystemMessage that establishes the chatbot’s context and informs it that its purpose is to act as a helpful translation assistant. The user’s request is then placed below it in a HumanMessage, here an English sentence that needs to be translated.

from langchain.chat_models import ChatOpenAI
from langchain.schema import (
    HumanMessage,
    SystemMessage
)

chat = ChatOpenAI(model_name="gpt-4", temperature=0)
messages = [
    SystemMessage(content="You are a helpful assistant that translates English to French."),
    HumanMessage(content="Translate the following sentence: I love programming.")
]

chat(messages)
AIMessage(content="J'aime la programmation.", additional_kwargs={}, example=False)

As you can see, we call the chat object directly with a list of messages. After analysing the input messages and taking into account the context provided by the system message, the chatbot translates the given English sentence into French.

A SystemMessage represents messages generated by the system that uses the model, such as warnings, notifications, or errors. These messages are produced by the underlying system, not by the human user or the AI chatbot, and they provide context, instructions, or status updates.

You can also generate completions for several batches of messages at once using the generate method. Each batch of messages is handled independently and can have its own SystemMessage. In the code below, the first set of messages translates a sentence from English to French, whereas the second set does the opposite.

batch_messages = [
    [
        SystemMessage(content="You are a helpful assistant that translates English to French."),
        HumanMessage(content="Translate the following sentence: I love programming.")
    ],
    [
        SystemMessage(content="You are a helpful assistant that translates French to English."),
        HumanMessage(content="Translate the following sentence: J'aime la programmation.")
    ],
]
result = chat.generate(batch_messages)
result
LLMResult(generations=[[ChatGeneration(text="J'aime la programmation.", generation_info=None, message=AIMessage(content="J'aime la programmation.", additional_kwargs={}, example=False))], [ChatGeneration(text='I love programming.', generation_info=None, message=AIMessage(content='I love programming.', additional_kwargs={}, example=False))]], llm_output={'token_usage': {'prompt_tokens': 65, 'completion_tokens': 11, 'total_tokens': 76}, 'model_name': 'gpt-4'})
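The printed result is deeply nested, so it helps to see how the useful parts can be pulled out. The sketch below uses SimpleNamespace objects as stand-ins that mirror only the fields visible in the printed LLMResult above (generations, text, llm_output); it is not the real LangChain class, and the helper first_texts is a made-up name.

```python
from types import SimpleNamespace

# Stand-in mirroring the printed LLMResult above; not the real LangChain class
result = SimpleNamespace(
    generations=[
        [SimpleNamespace(text="J'aime la programmation.")],
        [SimpleNamespace(text="I love programming.")],
    ],
    llm_output={
        "token_usage": {"prompt_tokens": 65, "completion_tokens": 11, "total_tokens": 76},
        "model_name": "gpt-4",
    },
)

def first_texts(llm_result):
    """Return the first generated text for each batch of messages."""
    return [candidates[0].text for candidates in llm_result.generations]

print(first_texts(result))
print(result.llm_output["token_usage"]["total_tokens"])
```

There is one inner list per batch of messages, and each inner list holds the candidate generations for that batch.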

As a comparison, here’s what LLM and Chat Model APIs look like in LangChain.

llm_input = "Translate the following text from English to French: Hello, how are you?"
llm_output = chain(llm_input)
print("llm_output: ", llm_output)
llm_output:  {'product': 'Translate the following text from English to French: Hello, how are you?', 'text': '\n\nBonjour, comment allez-vous?'}
messages = [
    SystemMessage(content="You are a helpful assistant that translates English to French."),
    HumanMessage(content="Translate the following sentence: Hello, how are you?")
]
chat_output = chat(messages)
print("chat_output: ", chat_output)
chat_output:  content='Bonjour, comment ça va ?' additional_kwargs={} example=False
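The two outputs have different shapes: the chain returns a dictionary with the generated string under the 'text' key, while the chat model returns a message object with a content attribute. A small helper can normalise both. This is a sketch with stand-in values copied from the outputs above; extract_text is a hypothetical name, and the SimpleNamespace object merely imitates a message with a content attribute.

```python
from types import SimpleNamespace

def extract_text(output):
    """Return the generated text from either a chain-style dict or a chat-style message."""
    if isinstance(output, dict):
        return output["text"].strip()
    return output.content.strip()

# Stand-ins shaped like the two outputs printed above
llm_output = {
    "product": "Translate the following text from English to French: Hello, how are you?",
    "text": "\n\nBonjour, comment allez-vous?",
}
chat_output = SimpleNamespace(content="Bonjour, comment ça va ?")

print(extract_text(llm_output))
print(extract_text(chat_output))
```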

4 Conclusion

Both LLMs and Chat Models have benefits and drawbacks. LLMs are powerful and adaptable, able to generate text for a wide variety of tasks. Their API is less structured than that of Chat Models, though.

Conversely, Chat Models are better suited for conversational tasks and have a more structured API. Additionally, they can use earlier turns of the conversation as context, which makes them better suited for meaningful interactions. They also benefit from reinforcement learning from human feedback, which helps them respond more effectively. Because they still have limitations in their reasoning, they may need careful handling to prevent hallucinations and the creation of inappropriate content.

5 Acknowledgements

I’d like to express my thanks to the wonderful LangChain & Vector Databases in Production Course by Activeloop, which I completed, and acknowledge the use of some images and other materials from the course in this article.
