Lesson 5 - FastAI
from fastai.vision.all import *
from fastbook import * #provides plot_function, used further down
path = untar_data(URLs.PETS)
path.ls()
(path/"images").ls()
fname = (path/"images").ls()[0]
fname
re.findall(r'(.+)_\d+.jpg$', fname.name)
Nice, now we can pass these as the labels.
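As a quick sanity check (a small sketch of my own, not part of the original notebook), we can build the same labelling function that the DataBlock below uses and call it on fname directly:
labeller = using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name') #same get_y as in the DataBlock below
labeller(fname) #should return the breed parsed out of the filename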
pets = DataBlock(blocks = (ImageBlock, CategoryBlock),
get_items=get_image_files,
splitter=RandomSplitter(seed=42), #split randomly
get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'), #getting labels using regex
item_tfms=Resize(460),
batch_tfms=aug_transforms(size=224, min_scale=0.75)) #augmentation on data
dls = pets.dataloaders(path/"images")
dls.show_batch(nrows = 1, ncols=3)
dls.show_batch(nrows=1, ncols = 3, unique = True)
learn = cnn_learner(dls, resnet18, metrics=accuracy) #Notice we didn't choose a loss, FastAI picks one for us
learn.fit_one_cycle(2)
learn.loss_func #fastAI chose CrossEntropyLoss as the loss func
x,y = dls.one_batch()
y #values refer to vocab list
dls.vocab
preds,_ = learn.get_preds(dl=[(x,y)])
preds
len(preds[0]),preds[0].sum() #37 predictions and they all add up to 1
How did we manage to make all the predictions add up to 1? Softmax!
plot_function(torch.sigmoid, min=-4,max=4)
acts = torch.randn((6,2))*2 #Getting preds
acts
acts.sigmoid() #using sigmoid to squish the preds (notice that although the values are between 0 and 1, they don't add up to 1)
sm_acts = torch.softmax(acts, dim=1)
sm_acts #notice that now they add up to 1
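To see what torch.softmax is doing, here is a minimal sketch (using only the acts tensor defined above) that recomputes it by hand: exponentiate every activation, then divide each row by its sum.
exp_acts = torch.exp(acts) #e^x for every activation
manual_sm = exp_acts / exp_acts.sum(dim=1, keepdim=True) #normalise each row so it sums to 1
manual_sm, manual_sm.sum(dim=1) #matches sm_acts, and every row sums to 1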
targ = tensor([0,1,0,1,1,0])
sm_acts
idx = range(6)
sm_acts[idx, targ]
from IPython.display import HTML
df = pd.DataFrame(sm_acts, columns=["3","7"])
df['targ'] = targ
df['idx'] = idx
df['loss'] = sm_acts[range(6), targ]
t = df.style.hide_index()
#To have html code compatible with our script
html = t._repr_html_().split('</style>')[1]
html = re.sub(r'<table id="([^"]+)"\s*>', r'<table >', html)
display(HTML(html))
-sm_acts[idx, targ]
F.nll_loss(sm_acts, targ, reduction='none') #nll_loss just picks the target column and negates it; it does not apply the log itself
plot_function(torch.log, min=0,max=4)
loss_func = nn.CrossEntropyLoss()
loss_func(acts, targ)
F.cross_entropy(acts, targ)
nn.CrossEntropyLoss(reduction='none')(acts, targ) #Shows individual losses before taking the mean
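To make the connection between these pieces explicit, here is a small sketch (assuming acts, sm_acts, idx and targ as defined above) showing that taking the log of the softmax activations and picking out the target column reproduces what nn.CrossEntropyLoss/F.cross_entropy compute:
log_sm_acts = torch.log_softmax(acts, dim=1) #log of the softmax activations
manual_ce = -log_sm_acts[idx, targ] #negative log likelihood of the target class
manual_ce, F.cross_entropy(acts, targ, reduction='none') #the two results should match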
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
interp.most_confused(min_val=5)
interp.plot_top_losses(5, nrows=1)
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(1, base_lr=0.1)
#Current err is 40% - Very bad
learn = cnn_learner(dls, resnet34, metrics=error_rate)
lr_min,lr_steep = learn.lr_find() #Finding the best lr
print(f"Minimum/10: {lr_min:.2e}, steepest point: {lr_steep:.2e}")
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(2, base_lr=3e-3) #pick something in between
Error rate dropped down to 7%
fine_tune()
Below is what the fine_tune function does
learn.fine_tune??
- freeze() - Freezes the model first
- fit_one_cycle(1) - Runs 1 epoch to tune the final layers (model still frozen, remember)
- base_lr /= 2 - Halves the lr
- unfreeze() - Now all parameters can be stepped
- self.fit_one_cycle - Now we fit the model for the given number of epochs (the epochs argument)
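Putting those steps together, here is a rough sketch of fine_tune (simplified for illustration; the real source, shown by learn.fine_tune?? above, also handles pct_start, div and other 1cycle arguments):
def fine_tune_sketch(learn, epochs, base_lr=2e-3, freeze_epochs=1):
    #Simplified re-implementation, for illustration only
    learn.freeze() #train only the head first
    learn.fit_one_cycle(freeze_epochs, base_lr) #1 epoch on the frozen model by default
    base_lr /= 2 #lower the lr before training the whole model
    learn.unfreeze() #now every parameter can be stepped
    learn.fit_one_cycle(epochs, slice(base_lr/100, base_lr)) #discriminative lrs for the remaining epochs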
learn = cnn_learner(dls, resnet34, metrics=error_rate) #cnn_learner freezes model for us
learn.fit_one_cycle(3, 3e-3)
learn.unfreeze()
learn.lr_find()
learn.fit_one_cycle(3, lr_max=1e-5) #Update learning rate again and train some more
Didn't improve a lot but we can go even further! See below:
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fit_one_cycle(3, 3e-3)
learn.unfreeze()
learn.fit_one_cycle(6, lr_max=slice(1e-6,1e-4)) #can do better using a slice
learn.recorder.plot_loss()
from fastai.callback.fp16 import *
learn = cnn_learner(dls, resnet50, metrics=error_rate).to_fp16() #half as many bits (Half precision floating pts)
learn.fine_tune(6, freeze_epochs=3) #First 3 epochs train the final layers, next 6 epochs train all parameters
- Why do we first resize to a large size on the CPU, and then to a smaller size on the GPU?
This process is known as presizing. Data augmentation often degrades an image, so we start from a larger image and only crop down to the final size after augmenting, which minimizes the loss of quality.
- If you are not familiar with regular expressions, find a regular expression tutorial, and some problem sets, and complete them. Have a look on the book's website for suggestions.
- What are the two ways in which data is most commonly provided, for most deep learning datasets?
Individual files containing the data (e.g. images)
Tabular data (e.g. CSV files)
- Look up the documentation for L and try using a few of the new methods that it adds.
L is a custom list class by fastai. It is designed to be a replacement for list in Python.
- Look up the documentation for the Python pathlib module and try using a few methods of the Path class.
- Give two examples of ways that image transformations can degrade the quality of the data.
Rotating an image leaves corner regions of the new bounds empty, which will not teach the model anything.
Many rotation and zooming operations require interpolation, which leaves a lower quality image.
- What method does fastai provide to view the data in a DataLoaders?
DataLoaders.show_batch()
- What method does fastai provide to help you debug a DataBlock?
DataBlock.summary()
- Should you hold off on training a model until you have thoroughly cleaned your data?
No. It is better to train a model first and then use plot_top_losses to have the model help you clean the data.
- What are the two pieces that are combined into cross-entropy loss in PyTorch?
Softmax function and negative log likelihood loss
- What are the two properties of activations that softmax ensures? Why is this important?
All values add up to 1, and small differences between the output activations are amplified. This pushes the model to pick one class more confidently, which is what we want for single-label classification.
- When might you want your activations to not have these two properties?
I guess when more than one label can apply to a single item (multi-label classification).
- Calculate the exp and softmax columns of <> yourself (i.e., in a spreadsheet, with a calculator, or in a notebook).
- Why can't we use torch.where to create a loss function for datasets where our label can have more than two categories?
torch.where can only select between two possibilities.
- What is the value of log(-2)? Why?
Undefined. Log is the inverse of exp, and exp only produces positive values, so log is only defined for positive inputs.
- What are two good rules of thumb for picking a learning rate from the learning rate finder?
Minimum/10
A subjective choice based on the plot (e.g. the last point where the loss was still clearly decreasing)
- What two steps does the fine_tune method do?
Freezes the body and trains the head for 1 epoch
Unfreezes and trains the whole model for the given number of epochs
- In Jupyter Notebook, how do you get the source code for a method or function?
Append ?? to the name (e.g. learn.fine_tune??)
- What are discriminative learning rates?
The trick of using different learning rates for different layers of the model: the early layers get a lower lr and the later layers a higher lr (see the sketch after this questionnaire).
- How is a Python slice object interpreted when passed as a learning rate to fastai?
The first value is the lr for the earliest layer, the last value is the lr for the final layer, and the layers in between get lrs that are multiplicatively equidistant across that range.
- Why is early stopping a poor choice when using 1cycle training?
The training may not have had time to reach the lower learning rate values at the end of the cycle, which is where the best results often come from.
- What is the difference between resnet50 and resnet101?
The number of layers
- What does to_fp16 do?
Lowers the precision of the floating point numbers (half precision) so you can speed up training.
- Find the paper by Leslie Smith that introduced the learning rate finder, and read it.
- See if you can improve the accuracy of the classifier in this chapter. What's the best accuracy you can achieve? Look on the forums and the book's website to see what other students have achieved with this dataset, and how they did it.
Rather than doing this lesson's dataset, I decided to do MNIST, for which I had an accuracy of 61%. After the LR improvement it jumped to 87%.
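As referenced in the discriminative learning rates answer above, here is a tiny illustration (using numpy rather than fastai's internal code) of how a slice such as slice(1e-6, 1e-4) is spread over the parameter groups: the values are multiplicatively equidistant between the two endpoints.
lrs = np.geomspace(1e-6, 1e-4, num=3) #hypothetical spread across 3 parameter groups
lrs #array([1.e-06, 1.e-05, 1.e-04]) - earliest layers get the lowest lr, the head the highest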