Fastai Application Architectures

In this article we will look at how to build custom applications in the fastai library, by looking at how current fastai image model applications are actually built.

Pranath Fernando


June 12, 2021

1 Introduction

The fastai deep learning library (as of 2021) is a layered API that has 4 levels of abstraction.

  • Application layer
  • High level API
  • Mid level API
  • Low level API

2 Fastai Image Model Applications

2.1 cnn_learner

When using this application, the first parameter we need to give it is an architecture to use as the body of the network. Usually this will be a ResNet architecture with pre-trained weights, which are downloaded automatically for you.

Next, the final layer of the pre-trained model is cut off; in fact, everything from the final pooling layer onwards is cut as well. For each architecture, fastai keeps a dictionary of information called model_meta that identifies these points within the layers. Here, for example, is the entry for ResNet50:

{'cut': -2,
 'split': <function>,
 'stats': ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])}
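To make the cut index concrete, here is a plain-Python sketch that applies a cut of -2 to the names of the top-level child modules of a torchvision ResNet. It only slices a list of names; it is not fastai's create_body, and the list of names is written out by hand as an assumption about the torchvision layout.

```python
# Names of the top-level child modules of a torchvision ResNet, in order.
# (Written out by hand for illustration; a real model could be inspected
# with something like list(model.named_children()).)
RESNET_CHILDREN = ["conv1", "bn1", "relu", "maxpool",
                   "layer1", "layer2", "layer3", "layer4",
                   "avgpool", "fc"]

def split_at_cut(children, cut):
    """Split a list of layers at `cut`, returning (kept_body, removed_layers)."""
    return children[:cut], children[cut:]

body, removed = split_at_cut(RESNET_CHILDREN, cut=-2)
print(body[-1])   # layer4: the last block kept for transfer learning
print(removed)    # ['avgpool', 'fc']: the ImageNet-specific pooling and classifier
```

A negative cut counts back from the end, so the same value works whatever the depth of the ResNet variant, as long as it ends with a pooling layer and a classifier.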

Key parts of the network are:

  • Head - The part of the network specialised for a particular task; for a CNN, usually the part after the adaptive average pooling layer
  • Body - Everything that is not the head, including the stem
  • Stem - The first layers of the network

If we take all the layers before the cut point of -2, we get the body of the model that fastai keeps for transfer learning. We can then add a new head.


fastai builds this new head with its create_head function, which lets us choose how many extra layers to add at the end, as well as how much dropout and what kind of pooling to use. By default fastai adds 2 linear layers rather than just one, as the fastai developers found this helps transfer learning work more quickly and easily than a single extra layer.
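As a rough picture of what such a head contains, the sketch below lists the layers of a two-linear-layer head as strings. It follows my understanding of the default create_head layout in fastai v2 (concat pooling, flatten, then batch norm, dropout and a linear layer per block, with a 512-unit hidden layer and half the dropout before the first linear layer). describe_head is a hypothetical helper, and the exact layout, including whether the pooled feature count is doubled inside or outside create_head, has changed between fastai versions.

```python
def describe_head(nf, n_out, hidden=512, ps=0.5, concat_pool=True):
    """List (as strings) the layers of a two-linear-layer head, following my
    reading of fastai v2's default create_head; hypothetical helper."""
    # Concat pooling stacks average- and max-pooled features, doubling nf
    n_in = nf * 2 if concat_pool else nf
    sizes = [n_in, hidden, n_out]      # 2 linear layers: n_in -> hidden -> n_out
    drops = [ps / 2, ps]               # lighter dropout before the first linear layer
    layers = ["AdaptiveConcatPool2d" if concat_pool else "AdaptiveAvgPool2d",
              "Flatten"]
    for i, (ni, no) in enumerate(zip(sizes[:-1], sizes[1:])):
        layers += [f"BatchNorm1d({ni})", f"Dropout(p={drops[i]})", f"Linear({ni}, {no})"]
        if i < len(sizes) - 2:         # ReLU between linear layers, none at the end
            layers.append("ReLU")
    return layers

for layer in describe_head(512, 10):
    print(layer)
```

The hidden size and dropout values are the knobs the text refers to; the final linear layer has no activation because its outputs are passed straight to the loss function.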

2.2 unet_learner

This architecture is most often used for image segmentation tasks.

We start off building this in the same way as cnn_learner, chopping off the old head. For image segmentation, however, we need a very different type of head, one that ends up generating an image for segmentation.

One way to do this is to add layers that increase the grid size in a CNN, for example by duplicating each pixel to make an image twice as big; this is known as nearest neighbour interpolation. Another approach uses a stride of one half, which is known as a transposed convolution. However, neither of these approaches works well on its own in practice.
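Both upsampling ideas are easy to write out on small inputs. The sketch below, in plain Python with hypothetical helper names, enlarges a 2D grid by nearest neighbour interpolation and implements a 1D transposed convolution with stride 2, where each input value adds a scaled copy of the kernel to the output so that the output ends up roughly twice as long as the input. Real layers (for example PyTorch's ConvTranspose2d) work on multi-channel 2D tensors with learned kernels and padding options.

```python
def nearest_neighbour_upsample(grid, scale=2):
    """Enlarge a 2D grid by repeating every pixel `scale` times in each direction."""
    out = []
    for row in grid:
        wide = [p for p in row for _ in range(scale)]   # repeat along the row
        out.extend(list(wide) for _ in range(scale))    # repeat the whole row
    return out

def transposed_conv1d(x, w, stride=2):
    """1D transposed convolution (no padding): each input value adds a scaled
    copy of the kernel to the output, `stride` positions after the previous one."""
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i * stride + k] += xi * wk
    return out

print(nearest_neighbour_upsample([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(transposed_conv1d([1, 2], [1, 1, 1]))
# [1.0, 1.0, 3.0, 2.0, 2.0]
```

Nearest neighbour interpolation has no parameters, while the transposed convolution learns its kernel; either way, the larger output is built only from the small downsampled input.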

The key problem is that there is simply not enough information in these downsampled activations alone to recreate something with the original image quality needed for segmentation. It's a big ask, and perhaps not a realistic one.

The solution to this problem is, once again, our friend the skip connection; but rather than skipping over a single layer, these connections reach right across to the opposite side of the architecture.

Here, the left half of the model is a CNN and the right half is made up of transposed convolutional layers, with the extra skip connections shown in gray. These help the U-Net do a much better job of generating the type of images we want for segmentation. One challenge with U-Nets is that the exact architecture depends on the image size; however, fastai has a DynamicUnet class that automatically generates the correct architecture based on the data and image sizes given.
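The dependence on image size comes from the long skip connections: each upsampled activation must have the same grid size as the encoder activation it is concatenated with. The plain-Python sketch below does that bookkeeping for an assumed encoder with five stride-2 stages and ResNet34-like channel counts (64, 64, 128, 256, 512). The helper names, the rule of halving channels when upsampling, and leaving out the final upsampling back to the input resolution are all illustrative assumptions, not DynamicUnet's actual logic.

```python
def encoder_shapes(img_size, stage_channels):
    """(channels, grid size) after each encoder stage, assuming each halves the grid."""
    shapes, size = [], img_size
    for ch in stage_channels:
        size //= 2                      # stride-2 downsampling
        shapes.append((ch, size))
    return shapes

def decoder_shapes(enc_shapes):
    """Walk back up the 'U': upsample x2, then concatenate the matching
    encoder activation (the long skip connection) along the channels."""
    ch, size = enc_shapes[-1]           # bottom of the U
    out = []
    for skip_ch, skip_size in reversed(enc_shapes[:-1]):
        size *= 2                       # e.g. a stride-2 transposed convolution
        if size != skip_size:
            raise ValueError(f"size mismatch: upsampled to {size} "
                             f"but skip connection is {skip_size}")
        ch = ch // 2 + skip_ch          # assumed: upsampling halves channels
        out.append((ch, size))
    return out

STAGES = [64, 64, 128, 256, 512]       # assumed ResNet34-like channel counts
print(decoder_shapes(encoder_shapes(224, STAGES)))
# [(512, 14), (384, 28), (256, 56), (192, 112)]
try:
    decoder_shapes(encoder_shapes(200, STAGES))
except ValueError as err:
    print(err)                          # size mismatch: upsampled to 24 but skip connection is 25
```

A decoder wired up by hand for 224-pixel inputs would break at 200 pixels, because integer division loses a row on the way down; building the architecture from the actual data and image size avoids having to handle such cases by hand.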

2.3 A Siamese Network

Let's now try to create a custom model. In an earlier article we looked at creating a Siamese network model; let's recap the details of that model.

For the Siamese task we will use a pre-trained model, pass 2 images through it, concatenate the results, and then send them to a custom head that returns 2 predictions.

We can define the overall architecture of the model like this:

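from fastai.vision.all import *

class SiameseModel(Module):
    def __init__(self, encoder, head):
        self.encoder,self.head = encoder,head
    def forward(self, x1, x2):
        # Pass both images through the same (shared) encoder, then join
        # their feature maps along the channel dimension
        ftrs = torch.cat([self.encoder(x1), self.encoder(x2)], dim=1)
        return self.head(ftrs)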
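The key idea in forward is that both images go through the same encoder (shared weights) and the head only sees their concatenated features. A dependency-free toy version makes this explicit; toy_encoder and toy_head are hypothetical stand-ins for the ResNet body and the fastai head, chosen only so the example runs without PyTorch.

```python
def toy_encoder(image):
    """Hypothetical encoder: summarise an image (rows of pixel values)
    as two features, its mean and its maximum pixel value."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]

def toy_head(features):
    """Hypothetical head: 2 scores from the concatenated features,
    [different, same]; the further apart the images, the higher 'different'."""
    mean1, max1, mean2, max2 = features
    distance = abs(mean1 - mean2) + abs(max1 - max2)
    return [distance, -distance]

def siamese_forward(x1, x2, encoder=toy_encoder, head=toy_head):
    # One shared encoder processes both images (same weights for each) ...
    ftrs = encoder(x1) + encoder(x2)    # ... and list + list concatenates the features
    return head(ftrs)

img_a = [[0.1, 0.2], [0.3, 0.4]]
img_b = [[0.9, 0.8], [0.7, 0.6]]
print(siamese_forward(img_a, img_b))   # the 'different' score is higher than 'same'
```

In the real model the concatenation happens along the channel dimension of the two feature maps, which is why the head has to accept twice the encoder's 512 features.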

We can create a body/encoder by taking a pre-trained model and cutting it; we just need to specify where to cut. For a ResNet, the cut position is -2.

encoder = create_body(resnet34, cut=-2)
Downloading: "" to /root/.cache/torch/hub/checkpoints/resnet34-333f7ec4.pth

We can then create a head. Inspecting the encoder/body shows that its last layer has 512 features, so this head takes 512*2 features, since we will have 2 images.

head = create_head(512*2, 2, ps=0.5)

We can now build our model from our constructed head and body.

model = SiameseModel(encoder, head)

Before we can use a Learner to train the model, we need to define 2 more things. First, a loss function. We can use cross-entropy here, but because our targets are booleans we need to convert them to integers, otherwise PyTorch will throw an error.

Second, we need a custom splitter that tells the fastai library how to split the model into parameter groups, which is what lets us train only the head of the model during transfer learning. Here we want 2 parameter groups, one for the encoder/body and one for the head, so let's define a splitter as well.

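from fastai.vision.all import *

def loss_func(out, targ):
    # CrossEntropyLoss expects integer class labels, so cast the boolean targets
    return nn.CrossEntropyLoss()(out, targ.long())

def siamese_splitter(model):
    # Two parameter groups: the pre-trained encoder (body) and the new head
    return [params(model.encoder), params(model.head)]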

We can now define a learner using our data, model, loss function, splitter and a metric. As we are defining the learner manually here, we also have to call freeze manually, to ensure that only the last parameter group, i.e. the head, is trained.

learn = Learner(dls, model, loss_func=loss_func, 
                splitter=siamese_splitter, metrics=accuracy)
learn.freeze()
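Freezing marks every parameter group except the last as not trainable. The sketch below models that with a dictionary of trainable flags per group name. My understanding is that fastai's Learner.freeze() amounts to freeze_to(-1) and unfreeze() to freeze_to(0), but this plain-Python freeze_to is only a hypothetical illustration: the real method sets requires_grad on the parameters themselves, and by default fastai keeps batch norm layers trainable even in frozen groups, which is ignored here.

```python
def freeze_to(group_names, n):
    """Return {group: trainable?} after freezing every group before index n.
    Negative n counts from the end, like a list slice."""
    n_frozen = len(group_names[:n])
    return {name: i >= n_frozen for i, name in enumerate(group_names)}

groups = ["encoder", "head"]           # the two groups from siamese_splitter
print(freeze_to(groups, -1))           # like freeze(): {'encoder': False, 'head': True}
print(freeze_to(groups, 0))            # like unfreeze(): {'encoder': True, 'head': True}
```

This is why the splitter matters: freezing works at the level of parameter groups, so without a split into body and head there would be nothing to leave frozen.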

Let’s now train our model.

learn.fit_one_cycle(4, 3e-3)
epoch train_loss valid_loss accuracy time
0 0.523447 0.334643 0.861299 03:03
1 0.373501 0.231564 0.913396 03:02
2 0.299143 0.209658 0.920162 03:02
3 0.251663 0.188553 0.928281 03:03

This has trained only our head. Let's now unfreeze the whole model to make it all trainable, and use discriminative learning rates, which give a lower learning rate to the body and a higher one to the head.

learn.unfreeze()
learn.fit_one_cycle(4, slice(1e-6,1e-4))
epoch train_loss valid_loss accuracy time
0 0.235140 0.188717 0.924222 04:15
1 0.233328 0.179823 0.932341 04:12
2 0.210744 0.172465 0.928958 04:12
3 0.224448 0.176144 0.930311 04:14
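The slice(1e-6, 1e-4) passed to fit_one_cycle gives each parameter group its own learning rate, from the smallest for the first group (the body) to the largest for the last (the head). My understanding is that fastai spaces these values geometrically between the two ends of the slice (via its even_mults helper); the sketch below reproduces that idea in plain Python, with spread_lrs as a hypothetical name.

```python
def spread_lrs(lr_min, lr_max, n_groups):
    """Geometrically spaced learning rates: lr_min for the first parameter
    group (the body) up to lr_max for the last one (the head)."""
    if n_groups == 1:
        return [lr_max]                 # simplification for a single group
    step = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * step ** i for i in range(n_groups)]

body_lr, head_lr = spread_lrs(1e-6, 1e-4, n_groups=2)   # roughly 1e-6 and 1e-4
print(spread_lrs(1e-6, 1e-4, n_groups=3))                # roughly 1e-6, 1e-5, 1e-4
```

With our 2 groups, the pre-trained encoder is updated 100 times more gently than the new head, which suits weights that are already good and should only be fine-tuned.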

3 Points to consider with architectures

There are a few points to consider when training models in practice. If you are running out of memory or time, then training a smaller model could be a good approach. If you are not training long enough to actually overfit, then you are probably not taking advantage of the capacity of your model.

So you should first try to get your model to the point where it overfits.

When faced with a model that overfits, many people start with the wrong remedy, i.e. using a smaller model or more regularisation. Using a smaller model should be one of the last steps you try, as it reduces the capacity of your model to actually learn what is needed.

A better approach is to use more data, for example by adding more labels to the data or by using data augmentation; Mixup can be useful for this. Only once you are using much more data and are still overfitting should you consider more generalisable architectures; for example, adding batch norm could help here.
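Mixup creates new training examples by taking a weighted average of two inputs and the same weighted average of their one-hot targets, with the weight drawn from a Beta(alpha, alpha) distribution. The sketch below shows that basic recipe on flat lists of numbers; mixup is a hypothetical helper, alpha=0.4 is an arbitrary choice, and fastai's own MixUp callback works on whole batches of tensors and may differ in its details.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two examples and their one-hot targets with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)                   # mixing weight in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

rng = random.Random(0)
x, y, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], rng=rng)
print(lam, x, y)   # y is a soft label that still sums to 1
```

Because each mixed example lies between two real ones, the model rarely sees exactly the same input twice, which makes it harder to memorise the training set.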

If it's still not working after these steps, you could use regularisation, such as adding dropout to the last layers, but also throughout the model. Only after these have failed should you consider using a smaller model.
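Dropout itself is simple to state: during training each activation is zeroed with probability p and, in the "inverted" form that PyTorch's nn.Dropout uses, the surviving activations are scaled by 1/(1-p) so the expected value is unchanged; at inference time it does nothing. A plain-Python sketch, with dropout as a hypothetical helper name:

```python
import random

def dropout(xs, p, training=True, rng=random):
    """Inverted dropout: zero each activation with probability p during training
    and scale the survivors by 1/(1-p); return the input unchanged otherwise."""
    if not training or p == 0:
        return list(xs)
    keep = 1 - p
    # When p == 1, keep == 0 and the condition is never true, so no division happens
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

acts = [1.0, 2.0, 3.0, 4.0]
print(dropout(acts, 0.5, rng=random.Random(0)))   # each value is 0.0 or doubled
print(dropout(acts, 0.5, training=False))         # unchanged at inference time
```

Because a different random subset of activations is dropped on every batch, no single activation can be relied on, which pushes the model towards more robust features.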

4 Conclusion

In this article we have looked at how to build custom fastai application architectures, using image model examples.