Key Considerations when Creating Practical Applications using Large Language Models

Creating useful applications with AI & Large Language Models involves many aspects. Here I highlight key considerations when building these applications, and describe how I built & deployed 6 LLM applications with LangChain to summarise or chat with documents, web pages or YouTube videos

Pranath Fernando


June 14, 2023

1 Introduction

AI and Large Language Models such as ChatGPT have brought some dramatic developments in recent years. But actually building useful applications with them involves many challenges and considerations. Even so, the potential number of useful applications using LLMs is huge.

For example, the LLM application framework LangChain, which I have been using for a little while now, provides ready-to-go built-in templates for common LLM use cases:

  • Autonomous Agents: Autonomous agents are long-running agents that take many steps in an attempt to accomplish an objective. Examples include AutoGPT and BabyAGI.

  • Agent Simulations: Putting agents in a sandbox and observing how they interact with each other and react to events can be an effective way to evaluate their long-range reasoning and planning abilities.

  • Personal Assistants: One of the primary LangChain use cases. Personal assistants need to take actions, remember interactions, and have knowledge about your data.

  • Question Answering: Another common LangChain use case. Answering questions over specific documents, only utilizing the information in those documents to construct an answer.

  • Chatbots: Language models love to chat, making this a very natural use of them.

  • Querying Tabular Data: Recommended reading if you want to use language models to query structured data (CSVs, SQL, dataframes, etc).

  • Code Understanding: Recommended reading if you want to use language models to analyze code.

  • Interacting with APIs: Enabling language models to interact with APIs is extremely powerful. It gives them access to up-to-date information and allows them to take actions.

  • Extraction: Extract structured information from text.

  • Summarization: Compressing longer documents. A type of Data-Augmented Generation.

  • Evaluation: Generative models are hard to evaluate with traditional metrics. One promising approach is to use language models themselves to do the evaluation.

In this article I describe 6 new AI applications I have built using LangChain and explore the details of the choices I made in order to deploy them.

The 6 LLM Applications I have recently built are:

  • Document Summarisation: Write a summary of any PDF document
  • Document Chat: Chat with a PDF document, ask any questions about its content
  • Web Page Summarisation: Write a summary of any web page
  • Web Page Chat: Chat with a web page, ask any questions about its content
  • YouTube Summarisation: Write a summary of any YouTube video
  • YouTube Chat: Chat with a YouTube video, ask any questions about its content

These are all live, and you can try out any of these applications in my projects section.

The code behind all of these apps can be found in this GitHub repo.

I will now cover some of the key considerations worth thinking about when building LLM applications.

2 Choosing a Large Language Model

Before we start using LLMs, we need to consider what they will be used for. There is a huge range of LLMs beyond just ChatGPT, and each has its own characteristics that make it more or less suitable for particular tasks.

Two broad categories of LLMs are:

  • Paid-for service LLMs: These include services like OpenAI’s ChatGPT, where building a product means paying per usage, and you cannot see or directly access the model itself
  • Open Source LLMs: The best-known source of these is HuggingFace, where you can freely download these LLMs to build an application, or use hosted versions of them free of charge

With this in mind, there are several key considerations that should guide our choice of LLM for an application or project.

2.1 Which LLMs could be used?

Not all LLMs can be used for all applications. For example, only certain LLMs can produce conversational responses, or perform text classification, as we can see from the model task types on HuggingFace. In another case, we may find a great model that gives perfect responses, but is so large that it cannot fit on the servers we actually have access to.

2.2 Which LLMs give a ‘good enough’ response or better?

Once we have narrowed down which LLMs could be used, we need to consider which models give at least the minimum quality of response our product needs. Everyone would ideally like to use the model that gives the best responses, but these models often have other important disadvantages, e.g. they may cost too much to make the application viable. So it’s worth shortlisting a range of candidate models, from those that are just good enough to the best available, so we can weigh various pros and cons rather than simply picking the model with the best outputs. It’s also fair to say that bigger models generally seem to give better responses than smaller ones.

2.3 Is response time or latency important?

In some use cases the LLM will only need to give a few responses, and waiting a little longer for each one doesn’t matter much, for example when producing a text summary. In other use cases quick responses matter much more, for example a customer service chatbot used by thousands of users simultaneously. There we need not only fast answers but also the ability to scale, handling many requests from many people at the same time. Bigger LLMs generally tend to respond more slowly than smaller ones.

2.4 How big is my budget?

Do you have a significant budget, or not much at all? This can be an important consideration. It’s fair to say that paid models like ChatGPT currently give some of the best responses, but you do need to pay for them. While the cost of each query is small, it can quickly add up if your application gets used a lot. Are you making an application not as a product but just to demonstrate general capabilities? If it’s publicly available and gets used a lot, it could rack up significant costs. Perhaps a free open source model is the right choice here.

Are ChatGPT’s responses only slightly better than a free open source model’s for a particular task, e.g. text summarisation? Then it may be worth choosing the free open source model, and spending the cost savings elsewhere.
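Because pay-per-use pricing scales linearly with usage, a quick back-of-the-envelope estimate can show how fast costs grow. The sketch below assumes a simple price per 1,000 tokens. The function name `monthly_llm_cost`, the token count per query and the price are all hypothetical placeholders, not real pricing for any service.

```python
def monthly_llm_cost(
    queries_per_day: int,
    tokens_per_query: int,
    price_per_1k_tokens: float,
    days: int = 30,
) -> float:
    """Rough pay-per-use estimate: total tokens used times the price per token."""
    total_tokens = queries_per_day * tokens_per_query * days
    return total_tokens / 1000 * price_per_1k_tokens


# Placeholder numbers only: check your provider's current pricing.
PRICE = 0.002  # hypothetical price in dollars per 1,000 tokens
for queries_per_day in (100, 10_000):
    cost = monthly_llm_cost(queries_per_day, tokens_per_query=1_000, price_per_1k_tokens=PRICE)
    print(f"{queries_per_day:>6} queries/day -> ${cost:,.2f} per month")
```

Whatever the actual price, 100 times more queries means 100 times the bill. That is why a demo that unexpectedly becomes popular can become expensive on a paid service.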

2.5 How do I need to adapt the model?

There are basically 4 ways we can use and customise an LLM for a specific task.

  1. Use as is: Here we simply use the model as it is for our task, without changing how it responds to queries. Cost: Zero
  2. Prompt Engineering: Here we iteratively refine our queries to the LLM until we get the best responses possible from a given model. Cost: Minimal, as this mainly involves human effort to find the best queries/prompts.
  3. Fine Tune: Here we do some limited further training of the model on a specific new dataset to improve its responses for a given use case. Cost: Depends, but can range from hundreds to thousands of dollars, depending on the model size, dataset etc.
  4. Train from scratch: Here we train the LLM from scratch on a given dataset. Cost: only an option for a few companies, since training a large model from scratch can cost millions of dollars

So costs generally rise as we move down this list, and each option will probably give a different quality of response. Furthermore, for certain models some of these options are not available at all. For example, with paid services like ChatGPT you can use only options 1 and 2, whereas with open source models you could use options 1 to 3.

2.6 Am I dealing with sensitive data?

Are you dealing with sensitive data such as personal data? This could affect which LLM you choose. For example, using a service like ChatGPT means you are sending all your requests and data to an external party like OpenAI. Do you really know what they are doing with this data, and does that matter to you? An open source model, for example one from HuggingFace, can be downloaded and run on your own servers, so the data in your requests stays under your control and is not seen by any other parties.

2.7 Am I comfortable being dependent on an external service?

Arguably, OpenAI’s ChatGPT is still currently one of the best LLMs available in terms of output quality. Getting very good responses to your queries, and automating significant tasks with it, can seem very seductive and persuasive from a business perspective. And yet, not only is it a paid service, so that building a product around it builds in a cost that scales with usage, it also makes your LLM application dependent on a third-party service. What happens if OpenAI decides to withdraw or change the terms of its ChatGPT service? What happens if they start to increase the price?

This dependency on an external service can create long-term risks for your LLM application, so it is a trade-off against other aspects that deserves serious consideration when building an LLM-based solution. Of course, there are other ways to make this trade-off beyond simply choosing between a paid service like ChatGPT and an open source model from HuggingFace. For example, you could use a hybrid approach: open source LLMs for tasks where their output quality is close to ChatGPT’s, such as sentiment analysis, and ChatGPT for tasks where its output is much better, such as a chatbot.

This way, you’re not entirely dependent on an external service for all your LLM applications, your costs do not scale so directly with LLM usage, and yet you can still get the best outputs where it matters most to you.

2.8 Am I OK with maintaining the LLM?

Using an open source LLM has many advantages, but one disadvantage is that you are more responsible for its maintenance and performance. You could of course use an open source model hosted on HuggingFace for free, but these hosted models are shared by many others, and the quality of their maintenance and of the hosting service will vary. Publicly hosted open source models may break, or be overloaded with requests: they are free for everyone to use, after all!

So for most serious business applications using open source models, a common approach is to download the LLM to your own servers for more reliable performance and control, in which case you are responsible for maintaining them, making sure they work, etc. As they say, with great power and freedom comes great responsibility! Furthermore, when you’re responsible for maintaining your own models this way, it’s up to you to keep up to date with the latest developments.

Newer and better open source models are being developed at a rapid rate. Choosing a good model for now is fine, but you will probably want to keep up to date with the latest developments, and potentially test and adopt newer and better open source models for your use case. That’s quite a bit of work, and work you don’t really need to do if you are using an external service like ChatGPT: yes, you pay for the service, but they do all the work of maintaining the models and updating them with better ones. So again, this is an important trade-off to consider.

2.9 So which LLM did I choose and why?

So for my 6 LLM applications I chose the HuggingFace declare-flan-alpaca-large-18378 LLM. As a reminder, my use case is to develop these 6 LLM applications:

  • Document Summarisation: Write a summary of any PDF document
  • Document Chat: Chat with a PDF document, ask any questions about its content
  • Web Page Summarisation: Write a summary of any web page
  • Web Page Chat: Chat with a web page, ask any questions about its content
  • YouTube Summarisation: Write a summary of any YouTube video
  • YouTube Chat: Chat with a YouTube video, ask any questions about its content

These are also not production applications but demonstration applications hosted on my data science blog, not likely to be used by thousands of users a day!

So these are the reasons I chose this model:

  • The HuggingFace open source models, while not currently giving the best quality responses (ChatGPT would give better ones), are ‘good enough’ to demonstrate the types of functionality possible with LLMs, which is the main objective of these applications
  • The response time/latency of the models is not very important: again, this is not a commercial service, and I am not expecting thousands of users! The latency of these open source models is good enough
  • Using an open source model from HuggingFace means that, while it’s not the best model available, it costs me absolutely nothing. So not only can I keep building more of these demonstration apps without worrying about cost, but in the unlikely event that some of my LLM apps become very popular, I won’t get hit by the huge costs I would have faced using ChatGPT.
  • This open source model is good enough to use for all my current 6 applications, and has been relatively easy to adapt for each use case with very minimal prompt engineering
  • There are no issues with sensitive data
  • I am comfortable depending on the open source model hosted on HuggingFace. Even though their service is shared by many others, so response times vary, it is much more convenient than setting up, hosting and maintaining my own server
  • I like supporting the open source movement where I can, for moral, practical and safety reasons. While some have expressed concern about AI and LLMs, some have also argued that one of the best ways to help with AI safety is to use open source models, which are by definition open to being tested, examined and scrutinised by anyone. This is different from closed source LLMs such as those from the ironically named OpenAI, which does not open up ChatGPT to independent scrutiny: ChatGPT is not open source. More widely, many argue that open source code and models are more reliable, better understood and safer because of this openness.

3 Choosing an LLM framework

So we’ve chosen our LLM: what next? That’s a great step, but we are far from done. An LLM can only do so much by itself; the real usefulness and power of LLMs comes from combining them with other elements. These often include:

  • Some kind of memory: for example, remembering the history of everything said in a chatbot conversation, so the LLM can use this context to help answer questions
  • A vector embedding database: While anyone can easily chat with ChatGPT, one of the things that starts to make LLMs useful for business applications is the ability to give the model a specific context, so it can answer specific questions, say about a particular document. Furthermore, LLMs can only consider a context of limited size, so not a large amount of text such as a whole book. A vector embedding database solves both of these problems together: it helps an LLM answer specific questions about content that could be as big as you like, by providing only the most relevant parts of that content to the model.
  • The ability to connect with third-party services: these might be a search engine, or an API such as Twitter’s or Google Drive’s, as alternative sources of text data.
  • Prompt management: Prompts (the instructions or questions we send to the model) are key to using LLMs, but we then need to manage and automate these prompts to build useful applications
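To make the memory idea concrete, here is a minimal plain-Python sketch. It is not LangChain's memory API. A hypothetical `ConversationBuffer` class keeps past turns, drops the oldest ones when the history gets too long, and prepends what remains to each new prompt. Size is measured in characters as a rough stand-in for tokens, which is an assumption for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationBuffer:
    """Hypothetical chat memory: stores (speaker, text) turns in order."""
    max_chars: int = 2000  # crude stand-in for the model's context limit
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))
        # Drop the oldest turns until the rendered history fits the limit.
        while len(self.render()) > self.max_chars and len(self.turns) > 1:
            self.turns.pop(0)

    def render(self) -> str:
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)

    def build_prompt(self, question: str) -> str:
        # Prepend the remembered history so the LLM sees earlier context.
        return f"{self.render()}\nHuman: {question}\nAI:"


memory = ConversationBuffer()
memory.add("Human", "My name is Ana.")
memory.add("AI", "Nice to meet you, Ana.")
print(memory.build_prompt("What is my name?"))
```

Trimming the oldest turns is only one policy. Another is to have the LLM summarise older turns, trading an extra model call for keeping more context.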

That’s just a few of the things needed; making an LLM useful often requires many more. Without a framework, this means writing a considerable amount of supporting code around the LLM to turn it into a useful business application, as I showed in a previous article.

But what if there was a ready-built framework that did most of this for you and made it much easier? That’s what an LLM framework does. This is also a very new concept, so there are not many of these frameworks yet, but probably one of the most popular is LangChain.

The key modules of LangChain are (from least to most complex):

  • Models: Supported model types and integrations.

  • Prompts: Prompt management, optimization, and serialization.

  • Memory: Memory refers to state that is persisted between calls of a chain/agent.

  • Indexes: Language models become much more powerful when combined with application-specific data - this module contains interfaces and integrations for loading, querying and updating external data.

  • Chains: Chains are structured sequences of calls (to an LLM or to a different utility).

  • Agents: An agent is a Chain in which an LLM, given a high-level directive and a set of tools, repeatedly decides an action, executes the action and observes the outcome until the high-level directive is complete.

  • Callbacks: Callbacks let you log and stream the intermediate steps of any chain, making it easy to observe, debug, and evaluate the internals of an application.
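The idea of a chain, a structured sequence of calls where each step's output feeds the next, can be illustrated in a few lines of plain Python. This is a conceptual sketch only, not LangChain's classes. `make_chain`, `prompt_step`, `fake_llm` and `parse_step` are hypothetical names, and `fake_llm` stands in for a real model call.

```python
from typing import Callable

# A "step" takes some text and returns new text.
Step = Callable[[str], str]


def make_chain(*steps: Step) -> Step:
    """Compose steps left to right: each step's output is the next step's input."""
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return run


def prompt_step(topic: str) -> str:
    # Prompt management: fill a fixed template with the user's input.
    return f"Write one sentence about {topic}."


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; it just echoes the prompt.
    return f"  [model reply to: {prompt}]  "


def parse_step(reply: str) -> str:
    # Post-process the raw model output (here: trim whitespace).
    return reply.strip()


chain = make_chain(prompt_step, fake_llm, parse_step)
print(chain("LangChain"))
```

Swapping `fake_llm` for a function that calls a real model leaves the rest of the chain unchanged. That separation is much of the appeal of a framework like LangChain.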

This earlier article introduces LangChain in more detail.

4 Choosing a Web application framework

Now we need to think about how users are going to interact with our LLM application: are we building a WhatsApp chatbot, for example? In my case, I just want to provide an easy way for anyone to try out chatting with or summarising a document, web page or YouTube video. For this, I want to make a simple web application.

Before becoming a Data Scientist, I was a Web Developer for many years. So I’m no stranger to building all kinds of custom websites from scratch, and can happily work with HTML, CSS, JavaScript, PHP, MySQL, Java and Linux all day. But the thing is, I’m no longer a web developer, I’m a Data Scientist, and while I can do web development, it’s a considerable job in itself. I would much rather spend my time learning and using the latest data science and AI techniques.

So what I need is a web application framework: something that will do most of the web development work for me to give me a good basic web page, and that lets me build my web application entirely in Python, my data science language of choice.

There are several Python web application frameworks out there, including Flask, Dash, Django and many others. However, one relatively new Python web application framework I’ve become aware of recently is Streamlit, which is, interestingly, a popular choice among people building LLM applications.

Some of the reasons Streamlit can be a good web application framework for LLM applications include:

  • Good support for LLM application libraries such as LangChain, HuggingFace etc.
  • Great support for LLM-style web components such as chat interfaces, inputs and responses

This is why I chose to use Streamlit, and I certainly found it a joy to use: it let me create my LLM applications very easily and quickly.

You can see my Streamlit code for these in this GitHub repo, and see my live LLM Streamlit applications here.

5 Choosing a Hosting Solution

Finally, we need to find a home for our LLM application: a hosting solution. If we were creating a high-usage, customer-facing commercial LLM application, we would probably want to consider a cloud hosting solution such as Databricks or AWS.

In particular, AWS seems to have done considerable work writing articles and developing solutions for LLM application use cases that also use LangChain, and would probably be my cloud hosting solution of choice for LLM applications.

However, my use case is just to create some demo LLM applications. Heroku was previously a popular service for demo applications, and I used it myself in projects a few years ago. However, Heroku has become far less attractive for demo applications since it removed its free tier last year.

Here again, the shiny new kid on the block, Streamlit, comes to the rescue with its Streamlit Community Cloud service, which allows you to host unlimited web applications for free! There are small caveats: the applications need to be deployed from files in a public GitHub repo, and the virtual machines hosting them have a reasonable but limited spec.

But this seems like a small price to pay for building unlimited free demo LLM applications. I found this service extremely easy to use, and plan to use this for hosting future LLM demo applications.

6 Key Python modules and functions

I’d like to end by explaining some of the key Python modules and functions I used to build these LLM applications. Some of these are very new, state-of-the-art Python tools that make building LLM applications quicker, easier, more robust and more powerful. The code for these applications can be found in this GitHub repo.

6.1 Streamlit functions

Streamlit, as you recall, is our web application framework. The Streamlit documentation shows how easy it is to use, for example by showing how you can create a classic ‘hello world’ one-page website with just one line of code, which I’d say is pretty neat.

The set_page_config() function was particularly easy to use in allowing me to customise the default web page menu, as was file_uploader() in generating a web interface for uploading documents.

6.2 LangChain functions

LangChain is of course our LLM application framework, and the heart of these applications.

A common use case for LLM applications, as with mine, is answering questions using a specific context such as a document. As mentioned earlier, LLMs can only consider a context of limited size, so not a large amount of text such as a whole book. A vector embedding database solves both of these problems together: it helps an LLM answer specific questions about content that could be as big as you like, by providing only the most relevant parts of that content to the model.
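Before text can go into a vector store, it is usually split into chunks small enough to fit the model's context, with some overlap between neighbours. LangChain provides its own text splitters for this. The function below is a hypothetical plain-Python stand-in that splits on characters, a simplifying assumption, to show the idea.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap.

    The overlap gives neighbouring chunks some shared text, so content cut
    at a chunk boundary is less likely to lose its surrounding context.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk starts after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

Real splitters usually prefer to cut at paragraph or sentence boundaries and measure size in tokens. The overlap trade-off is the same: more overlap preserves context but stores and embeds more text.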

More details on embeddings and vector databases for LLMs can be found in this earlier article.

LangChain offers many different options for creating vector stores, aka embedding databases. FAISS is a library for efficient similarity search over embedding vectors, created by Facebook AI Research, and LangChain provides a vector store wrapper around it. I found it very good for pulling out the relevant parts of the context (document, YouTube video transcript or web page text) to send to the LLM with either a question or a request for a summary. I used the similarity_search() method of LangChain’s FAISS vector store to retrieve the parts of the text most relevant to the query.
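Conceptually, a similarity search embeds the query, compares it with the stored chunk embeddings and returns the closest chunks. The sketch below shows this brute-force idea in plain Python. It is not FAISS, which does this far more efficiently over dense vectors from a neural embedding model. Here a toy word-count "embedding" is an assumption that keeps the example dependency-free. `embed`, `cosine` and `top_k_chunks` are hypothetical names, with `top_k_chunks` playing the role that similarity_search() plays in the apps.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real apps use a neural embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (brute force over every chunk)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:k]


chunks = [
    "The cat sat on the mat.",
    "Stock prices fell sharply today.",
    "A cat chased the mouse.",
]
print(top_k_chunks("Where did the cat sit?", chunks, k=2))
```

Only these top-k chunks are sent to the LLM, which is how the apps can handle texts far longer than the model's context window.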

For the queries to the language model to summarise text, I used load_summarize_chain(), which has a nice default prompt for summarising text that works well with many LLMs. Since an LLM can’t take a whole large text in one go, with the map_reduce chain type this chain summarises one chunk of text at a time. Once all the chunks are summarised, it runs the chain again on those summaries, until you end up with your final summary.
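The map-reduce summarisation idea can be sketched in plain Python. This is an illustration of the technique, not LangChain's implementation. `summarise` is a hypothetical stand-in for an LLM call with a summarisation prompt, and lengths are measured in characters rather than tokens for simplicity.

```python
from typing import Callable


def map_reduce_summary(
    chunks: list[str],
    summarise: Callable[[str], str],
    max_chars: int = 2000,
) -> str:
    """Summarise each chunk, then summarise the summaries until they fit.

    `summarise` stands in for a call to an LLM with a summarisation prompt.
    """
    summaries = [summarise(chunk) for chunk in chunks]  # "map" step
    combined = "\n".join(summaries)
    while len(combined) > max_chars and len(summaries) > 1:
        # "Reduce" step: summarise pairs of summaries so the text shrinks.
        summaries = [
            summarise("\n".join(summaries[i:i + 2]))
            for i in range(0, len(summaries), 2)
        ]
        combined = "\n".join(summaries)
    return summarise(combined)  # final "combine" step


# Hypothetical stub "LLM": keeps only the first word of whatever it is given.
def first_word(text: str) -> str:
    return text.split()[0]


print(map_reduce_summary(["alpha beta", "gamma delta", "epsilon zeta"], first_word, max_chars=10))
```

Each pass roughly halves the number of summaries, so the loop always terminates. The cost is extra LLM calls, which matters for latency with a shared, free hosted model.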

For the queries to the language model to answer questions about some text, I used load_qa_chain(). This builds a chain that passes the user’s question to the LLM along with the context retrieved from the FAISS vector store, and returns the LLM’s answer.
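In its simplest "stuff" form, question answering pastes the retrieved chunks into a prompt above the question. The sketch below shows that pattern in plain Python. The template wording is my own and not LangChain's default prompt. `build_qa_prompt` and `answer_question` are hypothetical names, and `llm` is any function that sends a prompt to a model and returns its text.

```python
from typing import Callable


def build_qa_prompt(question: str, context_chunks: list[str]) -> str:
    """Paste the retrieved chunks above the question (a 'stuff'-style prompt)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer_question(question: str, context_chunks: list[str], llm: Callable[[str], str]) -> str:
    # Send the assembled prompt to the model and tidy up its reply.
    return llm(build_qa_prompt(question, context_chunks)).strip()


print(build_qa_prompt("Who wrote the article?", ["The article was written by Pranath."]))
```

Telling the model to rely only on the supplied context, and to admit when the answer is not there, is a common way to reduce made-up answers. This helps especially with smaller open source models.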

7 Conclusion

In this article we have looked at key considerations to think about when building LLM applications. I described how I built and deployed 6 LLM applications with LangChain to summarise or chat with documents, web pages or YouTube videos, and covered other options available when building your own LLM applications.