Lesson 5 - FastAI
from fastai.vision.all import *
from fastbook import * #provides plot_function, used further down
path = untar_data(URLs.PETS)
path.ls()
(path/"images").ls()
fname = (path/"images").ls()[0]
fname
re.findall(r'(.+)_\d+.jpg$', fname.name)
Nice, now we can pass these as the labels.
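As a quick sanity check (a small sketch of my own, not part of the original notebook), we can build the same labelling function that the DataBlock below uses and call it on fname directly:
labeller = using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name') #same get_y as in the DataBlock below
labeller(fname) #should return the breed parsed out of the filename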
pets = DataBlock(blocks = (ImageBlock, CategoryBlock),
get_items=get_image_files,
splitter=RandomSplitter(seed=42), #split randomly
get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'), #getting labels using regex
item_tfms=Resize(460),
batch_tfms=aug_transforms(size=224, min_scale=0.75)) #augmentation on data
dls = pets.dataloaders(path/"images")
dls.show_batch(nrows = 1, ncols=3)
dls.show_batch(nrows=1, ncols = 3, unique = True)
learn = cnn_learner(dls, resnet18, metrics=accuracy) #Notice we didn't choose a loss, FastAI picks one for us
learn.fit_one_cycle(2)
learn.loss_func #fastAI chose CrossEntropyLoss as the loss func
x,y = dls.one_batch()
y #values refer to vocab list
dls.vocab
preds,_ = learn.get_preds(dl=[(x,y)])
preds
len(preds[0]),preds[0].sum() #37 predictions and they all add up to 1
How did we manage to make all the predictions add up to 1? Softmax!
plot_function(torch.sigmoid, min=-4,max=4)
acts = torch.randn((6,2))*2 #Getting preds
acts
acts.sigmoid() #using sigmoid to squish the preds (notice that although the values are between 0 and 1, they don't add up to 1)
sm_acts = torch.softmax(acts, dim=1)
sm_acts #notice that now they add up to 1
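To see what torch.softmax is doing, here is a minimal sketch (using only the acts tensor defined above) that recomputes it by hand: exponentiate every activation, then divide each row by its sum.
exp_acts = torch.exp(acts) #e^x for every activation
manual_sm = exp_acts / exp_acts.sum(dim=1, keepdim=True) #normalise each row so it sums to 1
manual_sm, manual_sm.sum(dim=1) #matches sm_acts, and every row sums to 1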
targ = tensor([0,1,0,1,1,0])
sm_acts
idx = range(6)
sm_acts[idx, targ]
from IPython.display import HTML
df = pd.DataFrame(sm_acts, columns=["3","7"])
df['targ'] = targ
df['idx'] = idx
df['loss'] = sm_acts[range(6), targ]
t = df.style.hide_index()
#To have html code compatible with our script
html = t._repr_html_().split('</style>')[1]
html = re.sub(r'<table id="([^"]+)"\s*>', r'<table >', html)
display(HTML(html))
-sm_acts[idx, targ]
F.nll_loss(sm_acts, targ, reduction='none') #nll_loss just picks the target column and negates it; it does not apply the log itself
plot_function(torch.log, min=0,max=4)
loss_func = nn.CrossEntropyLoss()
loss_func(acts, targ)
F.cross_entropy(acts, targ)
nn.CrossEntropyLoss(reduction='none')(acts, targ) #Shows individual losses before taking the mean
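To make the connection between these pieces explicit, here is a small sketch (assuming acts, sm_acts, idx and targ as defined above) showing that taking the log of the softmax activations and picking out the target column reproduces what nn.CrossEntropyLoss/F.cross_entropy compute:
log_sm_acts = torch.log_softmax(acts, dim=1) #log of the softmax activations
manual_ce = -log_sm_acts[idx, targ] #negative log likelihood of the target class
manual_ce, F.cross_entropy(acts, targ, reduction='none') #the two results should match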
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
interp.most_confused(min_val=5)
interp.plot_top_losses(5, nrows=1)
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(1, base_lr=0.1)
#Current err is 40% - Very bad
learn = cnn_learner(dls, resnet34, metrics=error_rate)
lr_min,lr_steep = learn.lr_find() #Finding the best lr
print(f"Minimum/10: {lr_min:.2e}, steepest point: {lr_steep:.2e}")
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(2, base_lr=3e-3) #pick something in between
Error rate dropped down to 7%
fine_tune()
Below is what the fine_tune function does
learn.fine_tune??
- freeze() - Freezes the model first
- fit_one_cycle(1) - Runs 1 epoch to tune the final layers (model still frozen, remember)
- base_lr /= 2 - Halves the lr
- unfreeze() - Now all parameters can be stepped
- self.fit_one_cycle - Now we fit the model for the given number of epochs (the epochs argument)
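Putting those steps together, here is a rough sketch of fine_tune (simplified for illustration; the real source, shown by learn.fine_tune?? above, also handles pct_start, div and other 1cycle arguments):
def fine_tune_sketch(learn, epochs, base_lr=2e-3, freeze_epochs=1):
    #Simplified re-implementation, for illustration only
    learn.freeze() #train only the head first
    learn.fit_one_cycle(freeze_epochs, base_lr) #1 epoch on the frozen model by default
    base_lr /= 2 #lower the lr before training the whole model
    learn.unfreeze() #now every parameter can be stepped
    learn.fit_one_cycle(epochs, slice(base_lr/100, base_lr)) #discriminative lrs for the remaining epochs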
learn = cnn_learner(dls, resnet34, metrics=error_rate) #cnn_learner freezes model for us
learn.fit_one_cycle(3, 3e-3)
learn.unfreeze()
learn.lr_find()
learn.fit_one_cycle(3, lr_max=1e-5) #Update learning rate again and train some more
Didn't improve a lot but we can go even further! See below:
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fit_one_cycle(3, 3e-3)
learn.unfreeze()
learn.fit_one_cycle(6, lr_max=slice(1e-6,1e-4)) #can do better using a slice
learn.recorder.plot_loss()
from fastai.callback.fp16 import *
learn = cnn_learner(dls, resnet50, metrics=error_rate).to_fp16() #half as many bits (Half precision floating pts)
learn.fine_tune(6, freeze_epochs=3) #First 3 epochs train the final layers, next 6 epochs train all parameters
- Why do we first resize to a large size on the CPU, and then to a smaller size on the GPU?
This process is known as presizing. Data augmentation often degrades an image, so we start from a larger image and only crop down to the final size after augmenting, which minimizes the loss of quality.
- If you are not familiar with regular expressions, find a regular expression tutorial, and some problem sets, and complete them. Have a look on the book's website for suggestions.
- What are the two ways in which data is most commonly provided, for most deep learning datasets?
Individual files containing the data (e.g. images)
Tabular data (e.g. CSV files)
- Look up the documentation for L and try using a few of the new methods that it adds.
L is a custom list class by fastai. It is designed to be a replacement for list in Python.
- Look up the documentation for the Python pathlib module and try using a few methods of the Path class.
- Give two examples of ways that image transformations can degrade the quality of the data.
Rotating an image leaves corner regions of the new bounds empty, which will not teach the model anything.
Many rotation and zooming operations require interpolation, which leaves a lower quality image.
- What method does fastai provide to view the data in a DataLoaders?
DataLoaders.show_batch()
- What method does fastai provide to help you debug a DataBlock?
DataBlock.summary()
- Should you hold off on training a model until you have thoroughly cleaned your data?
No. It is better to train a model first and then use plot_top_losses to have the model help you clean the data.
- What are the two pieces that are combined into cross-entropy loss in PyTorch?
Softmax function and negative log likelihood loss
- What are the two properties of activations that softmax ensures? Why is this important?
All values add up to 1, and small differences between the output activations are amplified. This pushes the model to pick one class more confidently, which is what we want for single-label classification.
- When might you want your activations to not have these two properties?
I guess when more than one label can apply to a single item (multi-label classification).
- Calculate the exp and softmax columns of <> yourself (i.e., in a spreadsheet, with a calculator, or in a notebook).
- Why can't we use torch.where to create a loss function for datasets where our label can have more than two categories?
torch.where can only select between two possibilities.
- What is the value of log(-2)? Why?
Undefined. Log is the inverse of exp, and exp only produces positive values, so log is only defined for positive inputs.
- What are two good rules of thumb for picking a learning rate from the learning rate finder?
Minimum/10
A subjective choice based on the plot (e.g. the last point where the loss was still clearly decreasing)
- What two steps does the fine_tune method do?
Freezes the body and trains the head for 1 epoch
Unfreezes and trains the whole model for the given number of epochs
- In Jupyter Notebook, how do you get the source code for a method or function?
Append ?? to the name (e.g. learn.fine_tune??)
- What are discriminative learning rates?
The trick of using different learning rates for different layers of the model: the early layers get a lower lr and the later layers a higher lr (see the sketch after this questionnaire).
- How is a Python slice object interpreted when passed as a learning rate to fastai?
The first value is the lr for the earliest layer, the last value is the lr for the final layer, and the layers in between get lrs that are multiplicatively equidistant across that range.
- Why is early stopping a poor choice when using 1cycle training?
The training may not have had time to reach the lower learning rate values at the end of the cycle, which is where the best results often come from.
- What is the difference between resnet50 and resnet101?
The number of layers
- What does to_fp16 do?
Lowers the precision of the floating point numbers (half precision) so you can speed up training.
- Find the paper by Leslie Smith that introduced the learning rate finder, and read it.
- See if you can improve the accuracy of the classifier in this chapter. What's the best accuracy you can achieve? Look on the forums and the book's website to see what other students have achieved with this dataset, and how they did it.
Rather than doing this lesson's dataset, I decided to do MNIST, for which I had an accuracy of 61%. After the LR improvement it jumped to 87%.
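As referenced in the discriminative learning rates answer above, here is a tiny illustration (using numpy rather than fastai's internal code) of how a slice such as slice(1e-6, 1e-4) is spread over the parameter groups: the values are multiplicatively equidistant between the two endpoints.
lrs = np.geomspace(1e-6, 1e-4, num=3) #hypothetical spread across 3 parameter groups
lrs #array([1.e-06, 1.e-05, 1.e-04]) - earliest layers get the lowest lr, the head the highest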