Large Language Models for Text Transformation

In this article we will explore how to use Large Language Models for text transformation tasks such as language translation, spelling and grammar checking, tone adjustment, and format conversion.
natural-language-processing
deep-learning
openai
Author

Pranath Fernando

Published

May 5, 2023

1 Introduction

Large language models such as ChatGPT generate text responses based on a given prompt or input. A prompt lets users guide the model’s output by providing a specific context or topic for the response. This has many practical applications, such as generating creative writing prompts, assisting with content creation, and powering customer service chatbots.

For example, a prompt such as “Write a short story about a time traveler who goes back to the medieval period” could lead the language model to generate a variety of unique and creative responses. Prompts can also elicit more specific and relevant responses for tasks such as translation or summarization. In these cases, the prompt describes the desired output, such as the target language of the translation or the key points the summary should include. Overall, prompts provide a way to harness large language models for a wide range of practical applications.

However, creating effective prompts for large language models remains a significant challenge, as even prompts that seem similar can produce vastly different outputs.

In my previous article, we looked at how to infer sentiment and topics from product reviews and news articles.

In this article, we will look at how to use Large Language Models for text transformation tasks such as language translation, spelling and grammar checking, tone adjustment, and format conversion.

2 Setup

2.1 Load the API key and relevant Python libraries

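First we need to load some Python libraries and connect to the OpenAI API.

The OpenAI API library needs to be configured with your account’s secret key, which is available on the OpenAI website.

You can either set it as the OPENAI_API_KEY environment variable before using the library, for example in your shell before starting Jupyter: export OPENAI_API_KEY='sk-...' (running !export inside a notebook cell only affects a temporary subshell, so the Python kernel will not see the variable).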

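Or set openai.api_key to its value directly:

import openai
openai.api_key = "sk-..."

In this article, the key is loaded from a local .env file using the python-dotenv package:

import openai
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read the local .env file into the environment

openai.api_key = os.getenv('OPENAI_API_KEY')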
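If the key is missing, the error only surfaces at the first API call. A small guard can fail fast with a clearer message instead. This is a minimal sketch; require_api_key is a hypothetical helper, not part of the openai library:

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the openai library.
    """
    key = os.getenv(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; add it to your .env file or environment."
        )
    return key
```

With this in place, the last line of the setup could become openai.api_key = require_api_key().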

2.2 Helper function

We will use OpenAI’s gpt-3.5-turbo model and the chat completions endpoint. The code in this article uses the pre-1.0 interface of the openai Python library (openai.ChatCompletion), which was removed in version 1.0.

We’ll define a helper function, get_completion, that accepts a prompt and returns the model’s completion for it. This makes it easier to try prompts and examine the generated outputs:

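def get_completion(prompt, model="gpt-3.5-turbo", temperature=0):
    # Send the prompt as a single user message to the chat completions endpoint
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,  # lower values make the output less random
    )
    # Return only the text of the first choice
    return response.choices[0].message["content"]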
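API calls can fail transiently, for example because of rate limits or timeouts. One option is to retry with exponential backoff. The sketch below is a hypothetical generic helper, not something built into the openai library; it takes any zero-argument callable, so it can wrap get_completion with a lambda. In practice you would narrow retry_on to the rate-limit and timeout exceptions raised by your installed openai version.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")  # only reached when attempts < 1
```

Usage: response = with_retries(lambda: get_completion(prompt)).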

3 Text Transformation

Large language models are very good at transforming their input into a different format. Examples include translating text from one language into another, correcting the spelling and grammar of text that may not be fully grammatical, and converting between formats, such as taking HTML as input and producing JSON as output.
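The examples in this article share one pattern: an instruction, followed by the input text wrapped in triple-backtick delimiters so the model can tell the instruction apart from the data. A minimal sketch of that pattern as a reusable function; build_transform_prompt is a hypothetical name, not from the course:

```python
FENCE = "`" * 3  # triple backticks, used to delimit the input text


def build_transform_prompt(instruction: str, text: str) -> str:
    """Combine an instruction with input text wrapped in triple backticks."""
    # Remove triple backticks from the input so it cannot close the delimiter early
    cleaned = text.replace(FENCE, "")
    return f"{instruction.strip()}\n{FENCE}{cleaned}{FENCE}"
```

For example, get_completion(build_transform_prompt("Translate the following English text to Spanish:", "Hi, I would like to order a blender")) sends essentially the same prompt as the first translation example.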

4 Translation

Large language models are trained on large amounts of text from many sources, much of it from the internet, and that text is written in many different languages. This gives the model the ability to translate.

These models know many languages, at varying levels of proficiency. Let’s go through some examples of using this capability, starting with something simple: translating a piece of English text into Spanish.

prompt = f"""
Translate the following English text to Spanish: \
```Hi, I would like to order a blender```
"""
response = get_completion(prompt)
print(response)
Output

Hola, me gustaría ordenar una licuadora.

Next, we ask the model to identify the language of a piece of text, in this case French.

prompt = f"""
Tell me which language this is: 
```Combien coûte le lampadaire?```
"""
response = get_completion(prompt)
print(response)
Output

This is French.

The model can also perform several translations at once. In this example, we ask it to translate the same text into French, Spanish, and English pirate.

prompt = f"""
Translate the following text to French and Spanish
and English pirate: \
```I want to order a basketball```
"""
response = get_completion(prompt)
print(response)
Output

French pirate: Je veux commander un ballon de basket
Spanish pirate: Quiero pedir una pelota de baloncesto
English pirate: I want to order a basketball
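When the model returns several labeled translations in one reply, you may want them as separate values. A minimal sketch that turns “Label: text” lines into a dictionary, assuming each translation starts on its own line with a label followed by a colon. The model is not guaranteed to format its reply this way, and parse_labeled_lines is a hypothetical helper:

```python
def parse_labeled_lines(output: str) -> dict:
    """Parse lines like 'Spanish: Hola' into {'Spanish': 'Hola'}.

    Assumes one 'Label: text' pair per line; lines without a colon are skipped.
    """
    result = {}
    for line in output.splitlines():
        # Split on the first colon only, so colons inside the text survive
        label, sep, value = line.partition(":")
        if sep and label.strip() and value.strip():
            result[label.strip()] = value.strip()
    return result
```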

In some languages, the translation changes depending on the speaker’s relationship to the listener. You can explain this to the language model, and it will translate accordingly. In this example, we translate the text into Spanish in both its formal and informal forms.

prompt = f"""
Translate the following text to Spanish in both the \
formal and informal forms: 
'Would you like to order a pillow?'
"""
response = get_completion(prompt)
print(response)
Output

Formal: ¿Le gustaría ordenar una almohada?
Informal: ¿Te gustaría ordenar una almohada?
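Instead of parsing free text, you can ask the model to reply in JSON with fixed keys. A sketch, assuming the model follows the requested format; because that is not guaranteed, the parser returns None when the reply is not a JSON object with both keys. Both function names are hypothetical:

```python
import json
from typing import Optional

FENCE = "`" * 3  # triple backticks around the input text


def formal_informal_prompt(text: str, language: str) -> str:
    """Ask for both registers as a JSON object with fixed keys."""
    return (
        f"Translate the text delimited by triple backticks to {language} "
        "in both the formal and informal forms. Respond only with a JSON "
        'object with the keys "formal" and "informal".\n'
        f"{FENCE}{text}{FENCE}"
    )


def parse_formal_informal(reply: str) -> Optional[dict]:
    """Return {'formal': ..., 'informal': ...}, or None if the reply doesn't match."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not {"formal", "informal"}.issubset(data):
        return None
    return {"formal": data["formal"], "informal": data["informal"]}
```

Usage: parse_formal_informal(get_completion(formal_informal_prompt("Would you like to order a pillow?", "Spanish"))).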

5 Universal Translator

For the next example, imagine we are in charge of IT at a multinational e-commerce company. Users report their IT issues in many different languages, so we need a universal translator. We’ll paste in a list of user messages in various languages and loop through them. For each message, we first ask the model to identify its language, then ask it to translate the message into English and Korean.

user_messages = [
  "La performance du système est plus lente que d'habitude.",  # System performance is slower than normal         
  "Mi monitor tiene píxeles que no se iluminan.",              # My monitor has pixels that are not lighting
  "Il mio mouse non funziona",                                 # My mouse is not working
  "Mój klawisz Ctrl jest zepsuty",                             # My keyboard has a broken control key
  "我的屏幕在闪烁"                                               # My screen is flashing
] 
for issue in user_messages:
    prompt = f"Tell me what language this is: ```{issue}```"
    lang = get_completion(prompt)
    print(f"Original message ({lang}): {issue}")

    prompt = f"""
    Translate the following text to English \
    and Korean: ```{issue}```
    """
    response = get_completion(prompt)
    print(response, "\n")
Output

Original message (This is French.): La performance du système est plus lente que d’habitude.
English: The system performance is slower than usual.
Korean: 시스템 성능이 평소보다 느립니다.

Original message (This is Spanish.): Mi monitor tiene píxeles que no se iluminan.
English: My monitor has pixels that don’t light up.
Korean: 내 모니터에는 불이 켜지지 않는 픽셀이 있습니다.

Original message (This is Italian.): Il mio mouse non funziona
English: My mouse is not working.
Korean: 내 마우스가 작동하지 않습니다.

Original message (This is Polish.): Mój klawisz Ctrl jest zepsuty
English: My Ctrl key is broken.
Korean: 제 Ctrl 키가 고장 났어요.

Original message (This is Chinese (Simplified).): 我的屏幕在闪烁
English: My screen is flickering.
Korean: 내 화면이 깜빡입니다.

Notice that the detected language is reported as a full sentence, such as “This is French.” If you want just the language name, you could modify the prompt to something like “Tell me what language this is. Respond with only one word.” or add “Don’t use a sentence.” You could also ask for the answer in JSON format, which would likely discourage the model from replying with a complete sentence. With that, you have built a simple universal translator.
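Putting those suggestions together, here is a sketch of the universal translator as a reusable function. It asks for a one-word language name and, as a fallback, normalizes sentence replies like “This is French.” The completion function is passed in as a parameter so the loop can be tested offline with a stub; in the notebook you would pass get_completion. The function names and the normalization regex are assumptions for illustration, not from the course:

```python
import re
from typing import Callable, Dict, List

FENCE = "`" * 3  # triple backticks around each user message


def normalize_language(reply: str) -> str:
    """Reduce replies like 'This is French.' or 'French' to 'French'."""
    cleaned = reply.strip().rstrip(".")
    # Drop a leading 'This is' / 'The language is' if the model used a sentence
    return re.sub(r"^(this is|the language is)\s+", "", cleaned, flags=re.IGNORECASE)


def translate_messages(
    messages: List[str],
    complete: Callable[[str], str],
    targets: str = "English and Korean",
) -> List[Dict[str, str]]:
    """Detect each message's language and translate it into the target languages."""
    results = []
    for issue in messages:
        lang_prompt = (
            "Tell me what language this is. Respond with only one word: "
            f"{FENCE}{issue}{FENCE}"
        )
        language = normalize_language(complete(lang_prompt))
        translation = complete(
            f"Translate the following text to {targets}: {FENCE}{issue}{FENCE}"
        )
        results.append({"original": issue, "language": language, "translation": translation})
    return results
```

Usage: for r in translate_messages(user_messages, get_completion): print(r["language"], r["translation"]).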

6 Tone Transformation

Writing style varies with the audience: the way I would write an email to a colleague or professor is very different from the way I would text my younger brother. ChatGPT can help rewrite text in a different tone. Let’s look at an example that turns a casual message into a business letter.

prompt = f"""
Translate the following from slang to a business letter: 
'Dude, This is Joe, check out this spec on this standing lamp.'
"""
response = get_completion(prompt)
print(response)
Output

Dear Sir/Madam,

I am writing to bring to your attention a standing lamp that I believe may be of interest to you. Please find attached the specifications for your review.

Thank you for your time and consideration.

Sincerely,

Joe

7 Format Conversion

ChatGPT does a good job of converting data between formats such as JSON, HTML, XML, and Markdown. We describe the input and output formats in the prompt. In this example, we convert a Python dictionary of restaurant employees into an HTML table.

data_json = { "restaurant employees" :[ 
    {"name":"Shyam", "email":"shyamjaiswal@gmail.com"},
    {"name":"Bob", "email":"bob32@gmail.com"},
    {"name":"Jai", "email":"jai87@gmail.com"}
]}

prompt = f"""
Translate the following python dictionary from JSON to an HTML \
table with column headers and title: {data_json}
"""
response = get_completion(prompt)
print(response)
Output
<table>
  <caption>Restaurant Employees</caption>
  <thead>
    <tr>
      <th>Name</th>
      <th>Email</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Shyam</td>
      <td>shyamjaiswal@gmail.com</td>
    </tr>
    <tr>
      <td>Bob</td>
      <td>bob32@gmail.com</td>
    </tr>
    <tr>
      <td>Jai</td>
      <td>jai87@gmail.com</td>
    </tr>
  </tbody>
</table>
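For well-structured input like this, the same table can also be produced deterministically, which gives you a reference to check the model’s output against. A standard-library sketch; records_to_html_table is a hypothetical helper, and it assumes every record has the same keys as the first one:

```python
from html import escape
from typing import Dict, List


def records_to_html_table(title: str, rows: List[Dict[str, str]]) -> str:
    """Render a list of same-keyed dicts as an HTML table with a caption."""
    headers = list(rows[0].keys()) if rows else []
    lines = ["<table>", f"  <caption>{escape(title)}</caption>", "  <thead>", "    <tr>"]
    # Column headers come from the dict keys, title-cased ("name" -> "Name")
    lines += [f"      <th>{escape(h.title())}</th>" for h in headers]
    lines += ["    </tr>", "  </thead>", "  <tbody>"]
    for row in rows:
        lines.append("    <tr>")
        # Escape cell values so characters like < and & cannot break the markup
        lines += [f"      <td>{escape(str(row.get(h, '')))}</td>" for h in headers]
        lines.append("    </tr>")
    lines += ["  </tbody>", "</table>"]
    return "\n".join(lines)
```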

8 Spellcheck/Grammar Check

Next, we look at spelling and grammar checking. Below is a list of sentences containing typical grammatical or spelling errors, and we’ll use the language model to correct them.

We loop through the sentences and ask the model to proofread and correct each one.

Some of the techniques from earlier articles apply here as well. The prompt asks the model to proofread and correct the text, rewrite the corrected version, and simply reply “No errors found” if there is nothing to fix.

To signal to the LLM that you want it to proofread your text, instruct the model to ‘proofread’ or ‘proofread and correct’ it.

text = [ 
  "The girl with the black and white puppies have a ball.",  # The girl has a ball.
  "Yolanda has her notebook.", # ok
  "Its going to be a long day. Does the car need it’s oil changed?",  # Homonyms
  "Their goes my freedom. There going to bring they’re suitcases.",  # Homonyms
  "Your going to need you’re notebook.",  # Homonyms
  "That medicine effects my ability to sleep. Have you heard of the butterfly affect?", # Homonyms
  "This phrase is to cherck chatGPT for speling abilitty"  # spelling
]
for t in text:
    prompt = f"""Proofread and correct the following text
    and rewrite the corrected version. If you don't find
    any errors, just say "No errors found". Don't use 
    any punctuation around the text:
    ```{t}```"""
    response = get_completion(prompt)
    print(response)
Output

The girl with the black and white puppies has a ball.
No errors found.
It’s going to be a long day. Does the car need its oil changed?
Their goes my freedom. There going to bring they’re suitcases.

Corrected version: There goes my freedom. They’re going to bring their suitcases.
You’re going to need your notebook.
That medicine affects my ability to sleep. Have you heard of the butterfly effect?
This phrase is to check ChatGPT for spelling ability.
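The model does not always follow the requested format exactly: one reply above repeats the original text before giving a “Corrected version:”. A hedged sketch of post-processing that keeps the original when the model says “No errors found” and otherwise keeps only the text after the last “Corrected version:” label, if present. It handles only the two reply shapes seen in this article, and extract_correction is a hypothetical helper:

```python
import re


def extract_correction(original: str, reply: str) -> str:
    """Return the corrected text from a proofreading reply."""
    reply = reply.strip()
    # The prompt told the model to say "No errors found" when there is nothing to fix
    if reply.rstrip(".").lower() == "no errors found":
        return original
    # If the reply repeats the input and then labels the fix, keep only the fix
    matches = list(re.finditer(r"corrected version:", reply, flags=re.IGNORECASE))
    if matches:
        return reply[matches[-1].end():].strip()
    return reply
```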

text = f"""
Got this for my daughter for her birthday cuz she keeps taking \
mine from my room.  Yes, adults also like pandas too.  She takes \
it everywhere with her, and it's super soft and cute.  One of the \
ears is a bit lower than the other, and I don't think that was \
designed to be asymmetrical. It's a bit small for what I paid for it \
though. I think there might be other options that are bigger for \
the same price.  It arrived a day earlier than expected, so I got \
to play with it myself before I gave it to my daughter.
"""
prompt = f"proofread and correct this review: ```{text}```"
response = get_completion(prompt)
print(response)
Output

I got this for my daughter’s birthday because she keeps taking mine from my room. Yes, adults also like pandas too. She takes it everywhere with her, and it’s super soft and cute. However, one of the ears is a bit lower than the other, and I don’t think that was designed to be asymmetrical.

Additionally, it’s a bit small for what I paid for it. I think there might be other options that are bigger for the same price. On the positive side, it arrived a day earlier than expected, so I got to play with it myself before I gave it to my daughter.

We can also check exactly what the model changed compared with the original review. For this we use the Python library Redlines, which computes the differences between the original review text and the model’s output and displays them.

This shows which kinds of errors were fixed. Here the prompt only asked the model to proofread and correct the review, but you can also ask for more substantial changes, such as changes to the tone.

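# pip install redlines
from IPython.display import display, Markdown
from redlines import Redlines

# Compare the original review with the model's corrected version
diff = Redlines(text, response)
display(Markdown(diff.output_markdown))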
Output
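Redlines is a third-party package. If you only need a quick plain-text view of the changes, the standard library’s difflib can produce a similar word-level comparison. This is a swapped-in technique, not how Redlines works internally; the [-removed-] {+added+} markup is borrowed from git’s plain word-diff style, and word_diff is a hypothetical helper. It splits on whitespace, so line breaks are not preserved.

```python
import difflib


def word_diff(original: str, revised: str) -> str:
    """Mark word-level changes as [-removed-] and {+added+}."""
    a, b = original.split(), revised.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
        if op in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if op in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)
```

Usage: print(word_diff(text, response)).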

For this prompt, we ask the model to proofread and correct the same review, but also to make it more compelling, ensure it follows the APA style guide, and target an advanced reader. We also ask for the output in Markdown. The original review text is reused here.

prompt = f"""
proofread and correct this review. Make it more compelling. 
Ensure it follows APA style guide and targets an advanced reader. 
Output in markdown format.
Text: ```{text}```
"""
response = get_completion(prompt)
display(Markdown(response))
Output

Title: A Soft and Cute Panda Plush Toy for All Ages

Introduction: As a parent, finding the perfect gift for your child’s birthday can be a daunting task. However, I stumbled upon a soft and cute panda plush toy that not only made my daughter happy but also brought joy to me as an adult. In this review, I will share my experience with this product and provide an honest assessment of its features.

Product Description: The panda plush toy is made of high-quality materials that make it super soft and cuddly. Its cute design is perfect for children and adults alike, making it a versatile gift option. The toy is small enough to carry around, making it an ideal companion for your child on their adventures.

Pros: The panda plush toy is incredibly soft and cute, making it an excellent gift for children and adults. Its small size makes it easy to carry around, and its design is perfect for snuggling. The toy arrived a day earlier than expected, which was a pleasant surprise.

Cons: One of the ears is a bit lower than the other, which makes the toy asymmetrical. Additionally, the toy is a bit small for its price, and there might be other options that are bigger for the same price.

Conclusion: Overall, the panda plush toy is an excellent gift option for children and adults who love cute and cuddly toys. Despite its small size and asymmetrical design, the toy’s softness and cuteness make up for its shortcomings. I highly recommend this product to anyone looking for a versatile and adorable gift option.

9 Acknowledgements

I’d like to express my thanks to the wonderful ChatGPT Prompt Engineering for Developers Course by DeepLearning.ai and OpenAI, which I completed, and acknowledge the use of some images and other materials from the course in this article.
