Fine-tuning a Sentiment Analysis Model with Hugging Face

In this project we fine-tune a pre-trained model for sentiment analysis using Hugging Face
natural-language-processing
deep-learning
hugging-face
fine-tuning
Author

Pranath Fernando

Published

April 23, 2023

1 Introduction

In previous articles we have seen how to use transformer models for a wide range of natural language tasks, including machine translation, summarization, and question answering. Transformers have become the standard model for NLP, similar to convolutional models in computer vision.

In practice, you’ll rarely train a transformer model from scratch. Transformers tend to be very large, so training them fully takes time, money, and a lot of data. Instead, you’ll usually start with a pre-trained model and fine-tune it on a dataset for your specific needs, which has become the norm in this new but thriving area of AI.

Hugging Face (🤗) is the best resource for pre-trained transformers. Their open-source libraries simplify downloading and using transformer models like BERT, T5, and GPT-2, and you can use them alongside libraries such as fastai, TensorFlow, PyTorch, and Flax.

In this article we will look at how you can take a pre-trained text model and fine-tune it for a specific sentiment analysis use case.

2 Hugging Face Setup

As part of fine-tuning our model we will save it to the Hugging Face Hub so we can use it for inference later.

We will now log in to the Hugging Face Hub using my account, which will enable us to connect to the Hub later.

from huggingface_hub import notebook_login

notebook_login()
Token is valid.
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.cache/huggingface/token
Login successful
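If you are running this outside a notebook, a programmatic login is also possible. The sketch below is a minimal alternative, assuming your access token is stored in an HF_TOKEN environment variable (an assumption for illustration, not something used in this notebook).

import os

from huggingface_hub import login

# Authenticate this session with the Hub using a token from the environment
# (assumes you have set HF_TOKEN yourself).
login(token=os.environ["HF_TOKEN"])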

3 Download and Prepare Dataset

GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), is a collection of resources for training, evaluating, and analyzing natural language understanding systems, and it is commonly used to evaluate state-of-the-art NLP models.

It includes several datasets, among them the GLUE SST-2 dataset, the Stanford Sentiment Treebank, which consists of sentences from movie reviews and human annotations of their sentiment. Each sentence therefore has a (positive/negative) class label.

For our sentiment analysis use case, suppose we want a model that is specifically good at predicting the sentiment of movie reviews. By taking a pre-trained model from Hugging Face and fine-tuning it on the GLUE SST-2 movie review dataset, we can build this model much more quickly than training one from scratch.

Let’s download the GLUE SST-2 dataset and have a look.

from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding

raw_datasets = load_dataset("glue", "sst2")
raw_datasets["train"][0]
WARNING:datasets.builder:Found cached dataset glue (/root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
{'sentence': 'hide new secretions from the parental units ',
 'label': 0,
 'idx': 0}
raw_datasets["train"][2]
{'sentence': 'that loves its characters and communicates something rather beautiful about human nature ',
 'label': 1,
 'idx': 2}

So we can see a couple of examples, including a positive (1) and a negative (0) sentiment sentence.
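As an optional check (not part of the original notebook), we can also inspect the dataset splits and the names attached to the label column:

# Show the train/validation/test splits and their sizes.
print(raw_datasets)

# The label column is a ClassLabel; for SST-2 its names should be ['negative', 'positive'].
print(raw_datasets["train"].features["label"].names)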

To prepare the data for training, we need to convert it into tokens. Since the pre-trained model we will fine-tune is BERT-based, we will use the matching tokeniser so that the text is converted into exactly the tokens this model expects.

We will define a function that lets us map tokenisation over the dataset in batches, which is much faster. We will also pad the sentences in each batch to a standard length, i.e. the maximum sentence length per batch, known as dynamic padding, which again improves speed and efficiency.

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)


def tokenize_function(example):
    return tokenizer(example["sentence"], truncation=True)


tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-708b3297c12abe0a.arrow
WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-fe83053e0ec8e624.arrow
WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-32c4b33e8c95e68f.arrow
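As a quick sanity check on dynamic padding (an illustrative addition, not part of the original notebook), we can pass a few tokenized examples through the data collator and confirm the batch is only padded to the length of its longest sequence:

# Take a handful of tokenized examples and drop the raw-text columns the collator doesn't need.
samples = tokenized_datasets["train"][:4]
samples = {k: v for k, v in samples.items() if k not in ["idx", "sentence"]}
print([len(ids) for ids in samples["input_ids"]])  # varying lengths before padding

# The collator pads the input tensors to the longest sequence in this small batch only.
batch = data_collator(samples)
print({k: v.shape for k, v in batch.items()})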

We will create our sequence classification model from the same checkpoint used for our tokeniser. The pre-trained BERT encoder is kept, and a new two-class classification head is added on top, which is what the warnings below are telling us.

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
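To get a rough sense of why training such models from scratch is expensive (an illustrative aside), we can count the model's parameters; bert-base-uncased plus the classification head has roughly 110 million.

# Count the parameters of the classification model (roughly 110M for bert-base-uncased).
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} parameters")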

We will now define a function to compute metrics during training that is appropriate for the GLUE SST-2 task (which uses accuracy), though of course other metrics could be defined here.

import evaluate
import numpy as np

def compute_metrics(eval_preds):
    metric = evaluate.load("glue", "sst2")
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
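For illustration (using made-up logits and labels), the GLUE SST-2 metric reports accuracy, so the function behaves like this:

# Hypothetical logits for three examples; argmax gives predictions [0, 1, 1] against labels [0, 1, 0].
dummy_logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5]])
dummy_labels = np.array([0, 1, 0])
print(compute_metrics((dummy_logits, dummy_labels)))  # {'accuracy': 0.666...}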

We can also map the numeric class labels to more human-readable sentiment names, in both directions (label name to id and vice versa).

id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}

4 Fine-Tune Model

Now that our dataset is ready, we can fine-tune our sentiment analysis model.

We can configure various training parameters, including the number of training epochs. For speed we will train for just 1 epoch here; for a real use case we would of course train for many more epochs.

from transformers import Trainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sentiment-analysis-model",
    num_train_epochs=1,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=True,
)

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2, id2label=id2label, label2id=label2id)

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/content/sentiment-analysis-model is already a clone of https://huggingface.co/Pranath/sentiment-analysis-model. Make sure you pull the latest changes with `repo.git_pull()`.
WARNING:huggingface_hub.repository:/content/sentiment-analysis-model is already a clone of https://huggingface.co/Pranath/sentiment-analysis-model. Make sure you pull the latest changes with `repo.git_pull()`.

Let’s now train our model.

%time trainer.train()
/usr/local/lib/python3.9/dist-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
[8419/8419 12:24, Epoch 1/1]
Epoch Training Loss Validation Loss Accuracy
1 0.210900 0.321527 0.908257

CPU times: user 10min 55s, sys: 1min, total: 11min 56s
Wall time: 12min 24s
TrainOutput(global_step=8419, training_loss=0.27605093690813515, metrics={'train_runtime': 744.9435, 'train_samples_per_second': 90.408, 'train_steps_per_second': 11.302, 'total_flos': 1029664559600160.0, 'train_loss': 0.27605093690813515, 'epoch': 1.0})

So it takes around 12 minutes to train the model for 1 epoch of the data, using a GPU on Google Colab, where this was run.
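If we wanted to re-check the validation metrics after training (an optional step not shown in the original run), we could call the trainer's evaluate method, which reuses our compute_metrics function:

# Run a standalone evaluation pass over the validation set.
eval_results = trainer.evaluate()
print(eval_results)  # includes eval_loss and eval_accuracy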

As we want to use the model for inference later, we will now push it to my personal Hugging Face Hub account.

trainer.push_to_hub()
Several commits (2) will be pushed upstream.
WARNING:huggingface_hub.repository:Several commits (2) will be pushed upstream.
The progress bars may be unreliable.
WARNING:huggingface_hub.repository:The progress bars may be unreliable.
To https://huggingface.co/Pranath/sentiment-analysis-model
   46f8829..ec11b25  main -> main

WARNING:huggingface_hub.repository:To https://huggingface.co/Pranath/sentiment-analysis-model
   46f8829..ec11b25  main -> main

To https://huggingface.co/Pranath/sentiment-analysis-model
   ec11b25..edfe735  main -> main

WARNING:huggingface_hub.repository:To https://huggingface.co/Pranath/sentiment-analysis-model
   ec11b25..edfe735  main -> main
'https://huggingface.co/Pranath/sentiment-analysis-model/commit/ec11b25d11ffa2843a04bed233f070276c1f4c96'

5 Model Inference

Now that we have fine-tuned our model and saved it to my Hub account, it’s easy to use it to make predictions on text.

The Hugging Face pipeline module will download the model along with everything needed to pass it some text and get back a prediction of its sentiment.

from transformers import pipeline

text = "This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three."

classifier = pipeline("sentiment-analysis", model="pranath/sentiment-analysis-model")
classifier(text)
[{'label': 'POSITIVE', 'score': 0.9972186088562012}]
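The same pipeline can also score a list of reviews in a single call; the texts below are made-up examples (output not shown):

# Score several reviews at once; the pipeline returns one {'label', 'score'} dict per review.
reviews = [
    "A tedious, overlong mess with no redeeming qualities.",
    "Charming performances and a script that actually has something to say.",
]
classifier(reviews)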
