An Introduction to LangChain Expression Language (LCEL)

Here we introduce LangChain Expression Language (LCEL), a new syntax that makes it much easier and more transparent to construct and work with different LLM chains and agents for LLM-based applications.
natural-language-processing
deep-learning
langchain
openai
Author

Pranath Fernando

Published

November 4, 2023

1 Introduction

In the evolving world of large language models and AI, the need for more streamlined and efficient methods of chaining components and agents together has led to the development of LangChain and, more recently, the LangChain Expression Language (LCEL). This new syntax simplifies the construction and management of LLM chains, enhancing transparency and usability and enabling LLM application prototypes to be built more quickly.

2 Understanding LCEL: A New Protocol for Chaining LangChain Components

LCEL presents a new approach to composing the various components of LangChain. It is a protocol that specifies a set of permissible input and output types, along with a suite of standard methods. These methods are designed to be universally applicable across all runnables, allowing for uniform invocation. Moreover, LCEL facilitates the dynamic adjustment of parameters at runtime and the incorporation of additional options, such as fallback mechanisms.

A large part of LangChain’s power derives from combining chains of diverse components, and LCEL, the runnable protocol, is a new way of doing this. It begins by defining an allowed set of input types and a corresponding allowed set of output types. There is then a number of standard methods, which we will go over in detail; these are methods that all runnables implement, so you can call them all in the same way.

2.1 The Interface of Runnables in LCEL

Every runnable under LCEL is expected to adhere to a common interface, consisting of methods like ‘invoke’ for single inputs, ‘stream’ for streaming responses, and ‘batch’ for lists of inputs. To cater to different operational needs, each of these synchronous methods is paired with an asynchronous counterpart, ensuring flexibility across various use cases.
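As a minimal sketch of this common interface (not part of the original lesson), the snippet below wraps a plain Python function in a RunnableLambda, assumed to be importable from langchain.schema.runnable, so that invoke, batch, and stream can be tried without an API key.

# A minimal sketch of the common runnable interface, assuming RunnableLambda
# is available in langchain.schema.runnable (no API key required).
from langchain.schema.runnable import RunnableLambda

greet = RunnableLambda(lambda name: f"Hello, {name}!")

greet.invoke("LCEL")                    # single input -> single output
greet.batch(["LCEL", "LangChain"])      # list of inputs -> list of outputs
for chunk in greet.stream("LCEL"):      # streaming interface (one chunk here)
    print(chunk)

Each synchronous method has an asynchronous counterpart (ainvoke, abatch, astream), for example await greet.ainvoke("LCEL") inside an async context.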

2.2 Common Properties and Methods

Runnables share common properties, including input and output schemas that define the types they handle. This standardization is crucial for consistency and interoperability among different components within LangChain.
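As a brief, hedged illustration, these schemas can be inspected directly, assuming the input_schema and output_schema properties exposed by recent LangChain versions:

# A hedged sketch of schema inspection, assuming runnables expose
# `input_schema` and `output_schema` as pydantic models.
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

prompt = ChatPromptTemplate.from_template("tell me a short joke about {topic}")
parser = StrOutputParser()

prompt.input_schema.schema()   # JSON schema requiring a "topic" field
parser.output_schema.schema()  # JSON schema describing a plain string output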

3 The Advantages of Utilizing LCEL

LCEL offers several benefits to developers and users alike. It comes equipped with asynchronous operations, streaming, and batch processing capabilities, making the transition from development to production seamless. Its design inherently supports parallel processing, vital for time-intensive language model (LLM) calls. Built-in logging capabilities provide visibility into the complex sequences of steps within chains and agents, essential for robust application development.

3.1 Fallbacks, Parallelism, and Logging

The ability to attach fallbacks to both individual LLMs and entire chains ensures reliability in unpredictable scenarios. Parallel execution of LLM calls via LCEL maximizes efficiency. As chains and agents grow in complexity, the integrated logging system within LCEL offers a critical tool for monitoring and debugging.
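As a small hedged sketch of these ideas (not from the original course), the example below attaches a fallback to a deliberately failing runnable and switches on LangChain’s global debug flag; it assumes the 2023-era langchain.debug global and the with_fallbacks method shown later in this post.

# A hedged sketch: fallbacks plus the global debug flag for step-by-step
# logging. `always_fails`, `primary` and `backup` are illustrative names.
import langchain
from langchain.schema.runnable import RunnableLambda

langchain.debug = True            # log every step of every chain run

def always_fails(_):
    raise ValueError("primary runnable failed")

primary = RunnableLambda(always_fails)
backup = RunnableLambda(lambda x: f"fallback handled: {x}")

robust = primary.with_fallbacks([backup])
robust.invoke("hello")            # -> 'fallback handled: hello'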

4 Practical Application of LCEL

When it comes to practical usage, setting up an LCEL environment involves importing various components such as prompt templates, language models, and output parsers. These elements are then seamlessly chained together using LCEL’s syntax, with the ability to pass additional arguments and modify prompts on the fly.

Let’s first import our libraries.

import os
import openai

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']
#!pip install pydantic==1.10.8
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser
import warnings
warnings.filterwarnings('ignore')

5 Simple Chain

5.1 Setting Up and Executing Chains

A simple chain might link a prompt template to a language model and then to an output parser. The integration process is straightforward: connect components whose types are compatible and call the chain with the ‘invoke’ method, passing inputs that match the prompt template’s requirements. We’ll build one in which we ask the language model for a short joke: first we create the prompt template, then we initialise the language model, and finally we define the output parser.

All that remains is to connect them with the pipe operator to form a chain. We can now use the invoke method to call this chain, and the inputs here will be the prompt template’s inputs.

prompt = ChatPromptTemplate.from_template(
    "tell me a short joke about {topic}"
)
model = ChatOpenAI()
output_parser = StrOutputParser()
chain = prompt | model | output_parser
chain.invoke({"topic": "bears"})
"Why don't bears wear shoes? \n\nBecause they have bear feet!"

6 More complex chain

6.1 Advanced Chaining Techniques

Beyond basic chains, LCEL also supports more complex operations like retrieval-augmented generation. This involves setting up retrievers, creating prompts that incorporate context and user questions, and then constructing chains that process these elements in a streamlined fashion.

In a previous post we discussed retrieval augmented generation (RAG). So we’ll use LCEL to mimic that process.

First, we must configure our retriever. We’ll start by making a very simple vector store, initialising it with just two texts. We’ll pass in OpenAIEmbeddings as the embedding model and then use the vector store to generate a retriever.

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch
vectorstore = DocArrayInMemorySearch.from_texts(
    ["pranath worked at kensho", "bears like to eat honey"],
    embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

So we call it with “where did Pranath work?” and we get a list of documents back, with the most relevant one first. If we ask it something different, such as what bears like to eat, we can see that the first answer is that bears like to eat honey. This is a simplistic example because there are just two documents, but the main point is that you can do this over a much larger number of documents and it will return the most relevant ones, which we will then utilise in our retrieval-augmented generation pipeline.

retriever.get_relevant_documents("where did pranath work?")
[Document(page_content='pranath worked at kensho', metadata={}),
 Document(page_content='bears like to eat honey', metadata={})]
retriever.get_relevant_documents("what do bears like to eat")
[Document(page_content='bears like to eat honey', metadata={}),
 Document(page_content='pranath worked at kensho', metadata={})]
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

To build this pipeline, we’ll start with a prompt that asks the language model to answer the question based only on the supplied context. It has two variables, context and question, so we’ll be passing in two things: the question, which will be the user query, and the context, which will be the documents we recover from the retriever.

So, how do we plan on constructing this chain? The user question is the initial input to the chain. From there, we want to retrieve relevant context, which we will then pass into the prompt, which we will then pass into a model, and finally into the output parser, which transforms the chat message into a string. So the first thing we’ll do is build something that takes a single question and converts it into a dictionary comprising two elements, context and question. To accomplish this, we’ll utilise a runnable map, which we can import like this.

from langchain.schema.runnable import RunnableMap

This runnable map will contain two elements: the first is context, which calls the retriever via a lambda function that takes the user’s question as a parameter. We also want to pass the original question forward, which we can do with another simple lambda function. From here, we feed that into the prompt, then into the model, and finally into the output parser.

So we can just put all of those together with pipes, call it our chain, and we have a new runnable. And now we invoke it with the question, “Where did Pranath work?”

chain = RunnableMap({
    "context": lambda x: retriever.get_relevant_documents(x["question"]),
    "question": lambda x: x["question"]
}) | prompt | model | output_parser
chain.invoke({"question": "where did pranath work?"})
'Pranath worked at Kensho.'

Let’s take a closer look at what’s going on inside. We can inspect the runnable map on its own: if we assign it to a variable called inputs and invoke it with the same question, we can see that we get back the context, which is a list of documents, and the question, so both pieces are there.

inputs = RunnableMap({
    "context": lambda x: retriever.get_relevant_documents(x["question"]),
    "question": lambda x: x["question"]
})
inputs.invoke({"question": "where did pranath work?"})
{'context': [Document(page_content='pranath worked at kensho', metadata={}),
  Document(page_content='bears like to eat honey', metadata={})],
 'question': 'where did pranath work?'}

This is then sent into the prompt; having constructed a prompt value, we send it into the model, which returns a chat message, pass that to the output parser, and finally get back a string. Another thing we can do with runnables is bind OpenAI functions to them.

functions = [
    {
      "name": "weather_search",
      "description": "Search for weather given an airport code",
      "parameters": {
        "type": "object",
        "properties": {
          "airport_code": {
            "type": "string",
            "description": "The airport code to get the weather for"
          },
        },
        "required": ["airport_code"]
      }
    }
  ]
prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}")
    ]
)
model = ChatOpenAI(temperature=0).bind(functions=functions)
runnable = prompt | model
runnable.invoke({"input": "what is the weather in sf"})
AIMessage(content='', additional_kwargs={'function_call': {'name': 'weather_search', 'arguments': '{\n  "airport_code": "SFO"\n}'}}, example=False)

So, given this collection of functions, we wish to invoke the language model with those functions attached. Normally this would look like a prompt followed by a model. But what about the functions? To attach them, we can call bind with functions=functions; what this does under the hood is that when the model is called, it sends any parameters passed to bind along with the invocation. So, if we construct a runnable by piping the prompt into the model and then call the runnable with an input that should trigger the function, such as “what is the weather in sf”, the functions get passed along.

We can see that what we get back is a message with an additional kwargs argument containing the function call we expect. So let’s include another function, sports_search. It will look for news about recent sporting events and takes a team name. We now have two distinct functions, and we can simply update the model by binding the expanded list of functions.

functions = [
    {
      "name": "weather_search",
      "description": "Search for weather given an airport code",
      "parameters": {
        "type": "object",
        "properties": {
          "airport_code": {
            "type": "string",
            "description": "The airport code to get the weather for"
          },
        },
        "required": ["airport_code"]
      }
    },
        {
      "name": "sports_search",
      "description": "Search for news of recent sport events",
      "parameters": {
        "type": "object",
        "properties": {
          "team_name": {
            "type": "string",
            "description": "The sports team to search for"
          },
        },
        "required": ["team_name"]
      }
    }
  ]
model = model.bind(functions=functions)
runnable = prompt | model
runnable.invoke({"input": "how did the patriots do yesterday?"})
AIMessage(content='', additional_kwargs={'function_call': {'name': 'sports_search', 'arguments': '{\n  "team_name": "patriots"\n}'}}, example=False)

We can see that it is now aware of the sports_search function.

7 Implementing Fallbacks and Parallelism

LCEL’s power is further demonstrated in its ability to handle fallbacks for entire sequences and facilitate parallel executions. For instance, creating a retriever-augmented generation pipeline involves various steps, from fetching relevant documents to parsing the final output, all of which can be configured to run in parallel, enhancing performance.
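As a small hedged sketch of the parallelism point (not from the original lesson), the branches of a RunnableMap, like the context/question map used in the RAG chain above, can each run concurrently; the toy branches below, with the illustrative names "upper" and "length", avoid any model call.

# A hedged sketch: each branch of a RunnableMap can execute in parallel.
from langchain.schema.runnable import RunnableMap

branches = RunnableMap({
    "upper": lambda x: x["text"].upper(),
    "length": lambda x: len(x["text"]),
})
branches.invoke({"text": "parallel branches"})
# -> {'upper': 'PARALLEL BRANCHES', 'length': 17}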

7.1 Debugging and Modifying Chains

Debugging is made easier with LCEL, as developers can inspect each step of the chain and the transformations it undergoes. Additionally, chains can be dynamically altered with new functions or parameters, showcasing the flexibility of the LCEL system.

Let’s look at that in the context of getting a language model to generate JSON. We’ll utilise an older OpenAI language model that isn’t quite as good, to create a situation that is likely to fail. To accomplish this, we will import an OpenAI language model; this is distinct from the chat models. Chat models are the newer form of model and are generally quite good at things like JSON output, whereas older models aren’t. Now we’ll make a simple model and a simple chain.

from langchain.llms import OpenAI
import json
simple_model = OpenAI(
    temperature=0, 
    max_tokens=1000, 
    model="text-davinci-001"
)
simple_chain = simple_model | json.loads

In this simple model, we’ll set the temperature to zero to get some determinism. This is a much earlier OpenAI model. Our chain will simply call this language model and then pipe the output into json.loads, which will fail if the output of the language model is not valid JSON. We’ll ask it to write three poems in a JSON blob, with each poem being a JSON blob containing a title, author, and first line. This is a difficult problem, and we want it to respond with valid JSON.

challenge = "write three poems in a json blob, where each poem is a json blob of a title, author, and first line"
simple_model.invoke(challenge)
'\n\n["The Waste Land","T.S. Eliot","April is the cruelest month, breeding lilacs out of the dead land"]\n\n["The Raven","Edgar Allan Poe","Once upon a midnight dreary, while I pondered, weak and weary"]\n\n["Ode to a Nightingale","John Keats","Thou still unravish\'d bride of quietness, Thou foster-child of silence and slow time"]'

And so if we run the simple model on this, we can see that it responds with something in a structured format, and it’s kind of close. But if we take this output and try to decode it with JSON, we can see that we get a JSON decode error.

simple_chain.invoke(challenge)
JSONDecodeError: Extra data: line 5 column 1 (char 103)

The latest OpenAI chat models are quite good at producing JSON. So we’ll create a ChatOpenAI model and build a chain by piping that model into an output parser, which takes the chat message from the model and converts it to a string, and then pipe that string into json.loads. When we apply the same challenge, we can see that this model successfully outputs valid JSON.

model = ChatOpenAI(temperature=0)
chain = model | StrOutputParser() | json.loads
chain.invoke(challenge)
{'poem1': {'title': 'Whispers of the Wind',
  'author': 'Emily Rivers',
  'first_line': 'Softly it comes, the whisper of the wind'},
 'poem2': {'title': 'Silent Serenade',
  'author': 'Jacob Moore',
  'first_line': 'In the stillness of night, a silent serenade'},
 'poem3': {'title': 'Dancing Shadows',
  'author': 'Sophia Anderson',
  'first_line': 'Shadows dance upon the moonlit floor'}}

We can see that when we call this chain on the challenge, it answers in the correct format. We can now attach the better chain as a fallback to the simple chain using with_fallbacks. When we invoke the resulting final chain, it tries the simple chain first, fails, and then falls back to the chain that uses ChatOpenAI and succeeds, which is the result we get back.

final_chain = simple_chain.with_fallbacks([chain])
final_chain.invoke(challenge)
{'poem1': {'title': 'Whispers of the Wind',
  'author': 'Emily Rivers',
  'first_line': 'Softly it comes, the whisper of the wind'},
 'poem2': {'title': 'Silent Serenade',
  'author': 'Jacob Moore',
  'first_line': 'In the stillness of night, a silent serenade'},
 'poem3': {'title': 'Dancing Shadows',
  'author': 'Sophia Anderson',
  'first_line': 'Shadows dance upon the moonlit floor'}}

8 Interface

Finally, let’s consider the interface of these runnables. We’ll go back to the earlier chain that tells us a joke and have a look at the different elements of the interface. The one we have been using from the start is invoke, a straightforward method that takes a single input. There’s also batch, which calls the chain on a list of inputs. So we call chain.batch, passing in two different inputs, one with bears and one with frogs.

prompt = ChatPromptTemplate.from_template(
    "Tell me a short joke about {topic}"
)
model = ChatOpenAI()
output_parser = StrOutputParser()

chain = prompt | model | output_parser
chain.invoke({"topic": "bears"})
"Why don't bears wear shoes? \n\nBecause they have bear feet!"
chain.batch([{"topic": "bears"}, {"topic": "frogs"}])
["Why don't bears wear shoes? \n\nBecause they have bear feet!",
 'Why did the frog take the bus to work?\n\nBecause his car got toad away!']

And, under the hood, this actually executes them in parallel to the greatest extent possible. We can also receive responses as they are generated: chain.stream returns an iterable, and we can loop through it to print out the individual chunks. So, if we run this now, we can see the response streaming back to the user. There is also an asynchronous counterpart, ainvoke, which we can await.

for t in chain.stream({"topic": "bears"}):
    print(t)

Why
 don
't
 bears
 wear
 shoes
?


Because
 they
 already
 have
 bear
 feet
!
response = await chain.ainvoke({"topic": "bears"})
response
'Why don\'t bears use cell phones? \n\nBecause they can\'t find the "paws" button!'

9 Conclusion: The Future of Chaining with LCEL

LangChain Expression Language represents a significant advancement in the realm of language model chaining, offering a robust, flexible, and efficient framework. Its ability to handle asynchronous operations, fallbacks, parallel processing, and detailed logging paves the way for sophisticated LLM applications. With LCEL, developers are equipped to build, modify, and debug complex chains, ushering in a new era of AI-driven solutions.

Find out more about LCEL in this article and this documentation.

10 Acknowledgements

I’d like to express my thanks to the wonderful Functions, Tools and Agents with LangChain course by DeepLearning.ai, which I completed, and acknowledge the use of some images and other materials from the course in this article.
