Multi-Label Classification

So far we have been classifying images that each carry a single label, but pictures often contain more than one object. For this reason, we will now cover multi-label classification. Our goal is to create a model that can identify every class of object present in an image.

The Data

path = untar_data(URLs.PASCAL_2007)

Path.BASE_PATH = path
path.ls()

(#8) [Path('train'),Path('test.json'),Path('segmentation'),Path('train.json'),Path('valid.json'),Path('test.csv'),Path('train.csv'),Path('test')]

df = pd.read_csv(path/'train.csv')
df.head()

Sidebar: Pandas (pd) and DataFrames

df.iloc[:,0] #column 0

0       000005.jpg
1       000007.jpg
2       000009.jpg
3       000012.jpg
4       000016.jpg
           ...    
5006    009954.jpg
5007    009955.jpg
5008    009958.jpg
5009    009959.jpg
5010    009961.jpg
Name: fname, Length: 5011, dtype: object

df.iloc[0,:] #row 0
# Trailing :s are always optional (in numpy, pytorch, pandas, etc.),
#   so this is equivalent:
df.iloc[0]

fname       000005.jpg
labels           chair
is_valid          True
Name: 0, dtype: object

df['fname']

0       000005.jpg
1       000007.jpg
2       000009.jpg
3       000012.jpg
4       000016.jpg
           ...    
5006    009954.jpg
5007    009955.jpg
5008    009958.jpg
5009    009959.jpg
5010    009961.jpg
Name: fname, Length: 5011, dtype: object

tmp_df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
tmp_df

tmp_df['c'] = tmp_df['a']+tmp_df['b']
tmp_df

Exploring Dataloader and Datasets

a = list(enumerate(string.ascii_lowercase))
a[0], len(a)

((0, 'a'), 26)

dl_a = DataLoader(a, batch_size=8, shuffle=True)
first(dl_a)

(tensor([17, 18, 10, 22,  8, 14, 20, 15]),
 ('r', 's', 'k', 'w', 'i', 'o', 'u', 'p'))
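To make the batching step concrete, here is a minimal pure-Python sketch of what a data loader does conceptually: shuffle the items, cut them into groups of `batch_size`, and transpose each group so that the indices and the letters end up in separate tuples. The `simple_batches` helper is hypothetical and only mimics the behavior; it is not how fastai's `DataLoader` is implemented (the real one also converts the numbers into a tensor).

```python
import random
import string

def simple_batches(items, batch_size, shuffle=False, seed=None):
    "Yield batches from `items`, transposing each batch of tuples into a tuple of columns."
    items = list(items)
    if shuffle:
        random.Random(seed).shuffle(items)
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # zip(*batch) turns [(0, 'a'), (1, 'b')] into [(0, 1), ('a', 'b')]
        yield tuple(zip(*batch))

a = list(enumerate(string.ascii_lowercase))
batches = list(simple_batches(a, batch_size=8, shuffle=True, seed=0))
idxs, letters = batches[0]
print(idxs, letters)
```

With 26 items and a batch size of 8, the last batch holds only the two leftover items, and every index still travels with its own letter after shuffling.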

a = list((string.ascii_lowercase))
dss = Datasets(a)
dss[0]

('a',)

End Sidebar

Constructing a DataBlock

You may have noticed that we often use a DataBlock when processing data. Let's build one from scratch, one step at a time.

dblock = DataBlock()

dsets = dblock.datasets(df) #This creates a train and valid set

len(dsets.train),len(dsets.valid)

(4009, 1002)

x,y = dsets.train[0]
x,y

(fname       004719.jpg
 labels         bus car
 is_valid          True
 Name: 2369, dtype: object,
 fname       004719.jpg
 labels         bus car
 is_valid          True
 Name: 2369, dtype: object)

Notice that x and y are identical! With no other instructions, the DataBlock returns the whole row for both, so we need to tell it how to build the independent and dependent variables.

Must create our own independent and dependent var

x['fname'] #This is our independent var

'004719.jpg'

y['labels'] #This is our dependent var

'bus car'

dblock = DataBlock(get_x = lambda r: r['fname'], 
                   get_y = lambda r: r['labels'])
dsets = dblock.datasets(df)
dsets.train[0]

#Awesome it works, just one more thing, we need the path

('004849.jpg', 'train')

def get_x(r): return r['fname']
def get_y(r): return r['labels']

dblock = DataBlock(
    get_x = get_x, 
    get_y = get_y)

dsets = dblock.datasets(df)
dsets.train[0]

('004069.jpg', 'bird')

This is good, but we still need the full path to each image, and we need to split the labels into a list.

def get_x(r): return path/'train'/r['fname'] #Need path
def get_y(r): return r['labels'].split(' ') #Split into each index

dblock = DataBlock(
    get_x = get_x, 
    get_y = get_y)

dsets = dblock.datasets(df)
dsets.train[0]

#looks good

(Path('train/003973.jpg'), ['bus', 'car', 'person'])

Looks good

Let's add the respective blocks (ImageBlock, MultiCategoryBlock)

To actually open the images and convert them to tensors, we need to use blocks. Our independent variable is an image, so we use ImageBlock. Our dependent variable can contain more than one category, so we use MultiCategoryBlock.

def get_x(r): return path/'train'/r['fname'] #Need path
def get_y(r): return r['labels'].split(' ') #Split into each index

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock), #Adding blocks
                   get_x = get_x, 
                   get_y = get_y)

dsets = dblock.datasets(df)
x,y = dsets.train[0]

Viewing data

x

y

TensorMultiCategory([0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

dsets.vocab

['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

dsets.train[0][1] # Also known as y (Stored above)

TensorMultiCategory([0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

idxs = torch.where(dsets.train[0][1]==1)[0]
dsets.vocab[idxs]

(#1) ['bird']
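The encoding that MultiCategoryBlock produces can be reproduced by hand. The sketch below is a plain-Python illustration, not fastai's implementation, and the two helper names are made up for this example. It turns a list of labels into a vector of 0s and 1s with one slot per vocab entry, then decodes it again.

```python
def encode_multi_hot(labels, vocab):
    "Return a list of 0.0/1.0 flags, one per vocab entry, marking which labels are present."
    present = set(labels)
    return [1.0 if name in present else 0.0 for name in vocab]

def decode_multi_hot(encoded, vocab):
    "Inverse of `encode_multi_hot`: return the vocab entries whose flag is 1."
    return [name for name, flag in zip(vocab, encoded) if flag == 1.0]

# Vocab copied from the `dsets.vocab` output above
vocab = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
         'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
         'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

encoded = encode_multi_hot('bus car person'.split(' '), vocab)
print(encoded)
print(decode_multi_hot(encoded, vocab))
```

Unlike a single-label one-hot vector, this one can contain several 1s, or none at all for an image with no known objects.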

Now let's split the data into training and validation sets

def splitter(df):
    train = df.index[~df['is_valid']].tolist() # ~ means NOT
    valid = df.index[df['is_valid']].tolist()
    return train,valid

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x=get_x, 
                   get_y=get_y)

dsets = dblock.datasets(df)
dsets.train[0]

(PILImage mode=RGB size=500x333,
 TensorMultiCategory([0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))
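A splitter only has to return two lists of row indices. As a small illustration of the same logic without pandas, the sketch below applies it to the first five `is_valid` values from train.csv. The `split_by_flag` helper is a hypothetical stand-in for `splitter`.

```python
# The first five `is_valid` values from train.csv, as shown by df.head()
is_valid = [True, True, True, False, True]

def split_by_flag(flags):
    "Return (train_indices, valid_indices) given one boolean per row."
    train = [i for i, v in enumerate(flags) if not v]
    valid = [i for i, v in enumerate(flags) if v]
    return train, valid

train_idx, valid_idx = split_by_flag(is_valid)
print(train_idx, valid_idx)
```

Every row lands in exactly one of the two lists, which is what makes the split reproducible from run to run, unlike a random split.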

Finally, let's switch the datasets out for dataloaders

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x=get_x, 
                   get_y=get_y,
                   item_tfms = RandomResizedCrop(128, min_scale=0.35)) #Must make all images same size

dls = dblock.dataloaders(df) #Switching to dataloaders

dls.show_batch(nrows=1, ncols=3)

Loss function for multi-label (Binary Cross-Entropy)

The loss function, as you may have expected, has changed once again. Because each image can carry several labels, every class is treated as its own yes/no prediction, so we need to use binary cross-entropy.

learn = cnn_learner(dls, resnet18)

x,y = to_cpu(dls.train.one_batch())
activs = learn.model(x) #Can view activations on a single batch
activs.shape #64 imgs with 20 actv each

torch.Size([64, 20])

activs[0]

TensorImage([ 2.1040,  3.5137, -1.8048,  4.0140, -2.7668,  5.4573,  0.4210, -2.2953,  2.8784, -0.6523,  1.9214, -2.0250, -0.2120, -1.9046, -1.1947, -0.1867, -0.9368,  0.4383, -1.1167, -0.8710],
       grad_fn=<AliasBackward>)

def binary_cross_entropy(inputs, targets):
    inputs = inputs.sigmoid()
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()

loss_func = nn.BCEWithLogitsLoss() #Same thing as the func above
loss = loss_func(activs, y)
loss

TensorImage(1.0545, grad_fn=<AliasBackward>)
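To see what the loss computes element by element, here is a dependency-free sketch of the same formula in plain Python. The function name `binary_cross_entropy_py` and the toy numbers are made up for illustration; in practice you would keep using `nn.BCEWithLogitsLoss`.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def binary_cross_entropy_py(logits, targets):
    "Mean of -log(p) where the target is 1 and -log(1-p) where it is 0, with p = sigmoid(logit)."
    losses = []
    for logit, target in zip(logits, targets):
        p = sigmoid(logit)
        losses.append(-math.log(p if target == 1 else 1 - p))
    return sum(losses) / len(losses)

# Toy activations and targets for one image with four classes (made-up values)
logits  = [2.0, -1.0, 0.0, 3.0]
targets = [1,    0,   0,   0]

print(binary_cross_entropy_py(logits, targets))
```

The last class contributes by far the largest term: the model is confident (activation 3.0) about a class that is absent, and a confident wrong answer is penalized much more than an unsure one.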

Accuracy

One change compared to the last chapter is the metric we use: because this is a multilabel problem, we can't use the accuracy function. Why is that? Well, accuracy was comparing our outputs to our targets like so:

def accuracy(inp, targ, axis=-1):
    "Compute accuracy with `targ` when `pred` is bs * n_classes"
    pred = inp.argmax(dim=axis)
    return (pred == targ).float().mean()

The class predicted was the one with the highest activation (this is what argmax does). Here it doesn't work because we could have more than one prediction on a single image. After applying the sigmoid to our activations (to make them between 0 and 1), we need to decide which ones are 0s and which ones are 1s by picking a threshold. Each value above the threshold will be considered as a 1, and each value lower than the threshold will be considered a 0:

def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):
    "Compute accuracy when `inp` and `targ` are the same size."
    if sigmoid: inp = inp.sigmoid()
    return ((inp>thresh)==targ.bool()).float().mean()
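A tiny hand-checkable case helps here. The sketch below re-implements the thresholding idea in plain Python on made-up activations. `accuracy_multi_py` is a hypothetical name and is not part of fastai.

```python
import math

def accuracy_multi_py(logits, targets, thresh=0.5):
    "Fraction of (image, class) slots where thresholded sigmoid(logit) matches the 0/1 target."
    correct = total = 0
    for row_logits, row_targets in zip(logits, targets):
        for logit, target in zip(row_logits, row_targets):
            pred = 1 / (1 + math.exp(-logit)) > thresh
            correct += (pred == bool(target))
            total += 1
    return correct / total

# Two made-up images, three classes each
logits  = [[ 2.0, 1.5, -3.0],   # image 0: classes 0 and 1 both look present
           [-2.0, 0.2, -1.0]]   # image 1: only class 1 is weakly active
targets = [[1, 1, 0],
           [0, 0, 0]]

print(accuracy_multi_py(logits, targets, thresh=0.5))
print(accuracy_multi_py(logits, targets, thresh=0.9))
```

An argmax-based metric could report at most one class for image 0, and it would have to report some class for image 1 even though that image contains none of them. Thresholding each class separately handles both cases.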

Sidebar: Partial example

def say_hello(name, say_what="Hello"): return f"{say_what} {name}."

say_hello('Jeremy'),say_hello('Jeremy', 'Ahoy!')

('Hello Jeremy.', 'Ahoy! Jeremy.')

f = partial(say_hello, say_what="Bonjour")

f("Jeremy"),f("Sylvain")

('Bonjour Jeremy.', 'Bonjour Sylvain.')

End Sidebar

Training

learn = cnn_learner(dls, resnet50, metrics=partial(accuracy_multi, thresh=0.2))
learn.fine_tune(3, base_lr=3e-3, freeze_epochs=4)

95% accuracy with a threshold of 0.2. Let's try some other thresholds.

learn.metrics = partial(accuracy_multi, thresh=0.1) #Try different thresh
learn.validate()

(#2) [0.1038823127746582,0.9292030334472656]

learn.metrics = partial(accuracy_multi, thresh=0.99)
learn.validate()

(#2) [0.1038823127746582,0.9444024562835693]

preds,targs = learn.get_preds()

accuracy_multi(preds, targs, thresh=0.9, sigmoid=False)

TensorImage(0.9584)

xs = torch.linspace(0.05,0.95,29)
accs = [accuracy_multi(preds, targs, thresh=i, sigmoid=False) for i in xs]
plt.plot(xs,accs);
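The sweep above can also be used to pick a threshold programmatically instead of reading it off the plot. The following is a minimal sketch on made-up probabilities, where `accuracy_at` is a hypothetical helper that mirrors `accuracy_multi` with `sigmoid=False`.

```python
def accuracy_at(probs, targets, thresh):
    "Accuracy over all (image, class) slots for probabilities that are already in [0, 1]."
    pairs = [(p > thresh) == bool(t)
             for row_p, row_t in zip(probs, targets)
             for p, t in zip(row_p, row_t)]
    return sum(pairs) / len(pairs)

# Made-up probabilities for three images and two classes
probs   = [[0.90, 0.30], [0.60, 0.05], [0.20, 0.70]]
targets = [[1,    0   ], [1,    0   ], [0,    1   ]]

thresholds = [0.1, 0.25, 0.5, 0.75, 0.95]
accs = [accuracy_at(probs, targets, t) for t in thresholds]
best_acc, best_thresh = max(zip(accs, thresholds))
print(list(zip(thresholds, accs)))
print(best_thresh, best_acc)
```

Accuracy rises and then falls as the threshold moves from too permissive to too strict. When the curve is smooth like this, choosing the threshold on the validation set is less likely to latch onto an unrepresentative outlier.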

Next model - Regression

Now that we have looked at a multi-label model, let's take a look at a regression model!

Assemble the Data

path = untar_data(URLs.BIWI_HEAD_POSE)

path.ls().sorted()

(#50) [Path('01'),Path('01.obj'),Path('02'),Path('02.obj'),Path('03'),Path('03.obj'),Path('04'),Path('04.obj'),Path('05'),Path('05.obj')...]

(path/'01').ls().sorted()

(#1000) [Path('01/depth.cal'),Path('01/frame_00003_pose.txt'),Path('01/frame_00003_rgb.jpg'),Path('01/frame_00004_pose.txt'),Path('01/frame_00004_rgb.jpg'),Path('01/frame_00005_pose.txt'),Path('01/frame_00005_rgb.jpg'),Path('01/frame_00006_pose.txt'),Path('01/frame_00006_rgb.jpg'),Path('01/frame_00007_pose.txt')...]

img_files = get_image_files(path) #jpg's

im = PILImage.create(img_files[0])
im.shape

(480, 640)

im.to_thumb(160)

def img2pose(x): 
    return Path(f'{str(x)[:-7]}pose.txt') 
img2pose(img_files[0])

Path('06/frame_00554_pose.txt')
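The `[:-7]` slice works because the suffix `rgb.jpg` is exactly seven characters long. The sketch below shows that mapping on one of the file names listed above, next to a slightly more defensive variant. `img2pose_explicit` is a hypothetical alternative, not something the lesson uses.

```python
from pathlib import Path

def img2pose(x):
    # 'rgb.jpg' is 7 characters long, so [:-7] strips it and leaves 'frame_00003_'
    return Path(f'{str(x)[:-7]}pose.txt')

def img2pose_explicit(x):
    "Same mapping, but it checks the suffix instead of counting characters."
    x = Path(x)
    suffix = 'rgb.jpg'
    if not x.name.endswith(suffix):
        raise ValueError(f'expected a *_{suffix} file, got {x.name}')
    return x.with_name(x.name[:-len(suffix)] + 'pose.txt')

img = Path('01/frame_00003_rgb.jpg')
print(img2pose(img), img2pose_explicit(img))
```

The explicit version fails loudly on a file such as `depth.cal`, whereas the slicing version would quietly produce a path that does not exist.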

cal = np.genfromtxt(path/'01'/'rgb.cal', skip_footer=6)
#Given func to get center of face
def get_ctr(f):
    ctr = np.genfromtxt(img2pose(f), skip_header=3)
    c1 = ctr[0] * cal[0][0]/ctr[2] + cal[0][2]
    c2 = ctr[1] * cal[1][1]/ctr[2] + cal[1][2]
    return tensor([c1,c2])

get_ctr(img_files[0])

tensor([330.9554, 308.3703])
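The arithmetic inside `get_ctr` has the shape of a pinhole-camera projection: a 3D position is divided by its depth, scaled by a focal length, and shifted by the image's principal point. The sketch below spells that out, assuming `ctr` holds an (x, y, z) position and `cal` is a 3x3 calibration matrix. The calibration numbers are hypothetical placeholders, not the values stored in `rgb.cal`.

```python
def project_to_pixels(point_3d, cal):
    "Project a 3D point (x, y, z) to 2D pixel coordinates using a 3x3 calibration matrix."
    x, y, z = point_3d
    fx, cx = cal[0][0], cal[0][2]   # focal length and principal point along x
    fy, cy = cal[1][1], cal[1][2]   # focal length and principal point along y
    return (x * fx / z + cx, y * fy / z + cy)

# Hypothetical calibration matrix and head positions, for illustration only
cal = [[500.0,   0.0, 320.0],
       [  0.0, 500.0, 240.0],
       [  0.0,   0.0,   1.0]]

on_axis  = project_to_pixels((0.0, 0.0, 1000.0), cal)      # a point straight ahead of the camera
off_axis = project_to_pixels((100.0, -50.0, 1000.0), cal)
print(on_axis, off_axis)
```

A point straight ahead of the camera lands on the principal point, and moving it sideways shifts the pixel position in proportion to the focal length divided by the depth.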

Required datablock

biwi = DataBlock(
    blocks=(ImageBlock, PointBlock),
    get_items=get_image_files,
    get_y=get_ctr,
    splitter=FuncSplitter(lambda o: o.parent.name=='13'), #Just grabbing person number 13 as our valid set
    batch_tfms=[*aug_transforms(size=(240,320)), 
                Normalize.from_stats(*imagenet_stats)]
)

dls = biwi.dataloaders(path)
dls.show_batch(max_n=9, figsize=(8,6))

xb,yb = dls.one_batch()
xb.shape,yb.shape

(torch.Size([64, 3, 240, 320]), torch.Size([64, 1, 2]))

yb[0]

TensorPoint([[0.1999, 0.0785]], device='cuda:0')
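The target is no longer in pixels: the coordinates have been rescaled so that they fall between -1 and 1. The sketch below assumes a simple linear mapping, `p * 2 / size - 1`, which is my understanding of how the rescaling works rather than a copy of fastai's code. Both helper names are hypothetical.

```python
def scale_point(pt, size):
    "Map pixel coordinates (x, y) in an image of `size` (width, height) to the range [-1, 1]."
    return tuple(p * 2 / s - 1 for p, s in zip(pt, size))

def unscale_point(pt, size):
    "Inverse of `scale_point`: map [-1, 1] coordinates back to pixels."
    return tuple((p + 1) * s / 2 for p, s in zip(pt, size))

size = (320, 240)                          # width, height after the batch transforms
corner = scale_point((0, 0), size)         # pixel (0, 0)
center = scale_point((160, 120), size)     # the middle of the image
pixels = unscale_point((0.1999, 0.0785), size)  # the yb[0] value shown above
print(corner, center, pixels)
```

Under this assumption the point in `yb[0]` sits slightly right of and below the image center, which is where you would expect a face to be in these frames.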

Training a Model

learn = cnn_learner(dls, resnet18, y_range=(-1,1)) # y_range limits the model's outputs to the range of valid coordinates
# Point coordinates are rescaled to [-1, 1]: -1 is the far left (or top) edge, 1 is the far right (or bottom) edge

def sigmoid_range(x, lo, hi): return torch.sigmoid(x) * (hi-lo) + lo
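A few sample values show how this squashing behaves. The sketch below is a plain-Python copy of the same formula, named `sigmoid_range_py` here only to avoid clashing with the PyTorch version, so it runs without any tensors.

```python
import math

def sigmoid_range_py(x, lo, hi):
    "Squash any real number x into the open interval (lo, hi)."
    return 1 / (1 + math.exp(-x)) * (hi - lo) + lo

for x in (-4, -1, 0, 1, 4):
    print(x, round(sigmoid_range_py(x, -1, 1), 4))
```

An activation of 0 maps to the middle of the range, and large positive or negative activations approach `hi` and `lo` without ever reaching them. With `lo=-1` and `hi=1` the curve is the same as `tanh(x/2)`.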

plot_function(partial(sigmoid_range,lo=-1,hi=1), min=-4, max=4)

dls.loss_func

FlattenedLoss of MSELoss()

learn.lr_find()

SuggestedLRs(lr_min=0.005754399299621582, lr_steep=1.5848931980144698e-06)

lr = 1e-2
learn.fine_tune(3, lr)

math.sqrt(0.0001) #error

0.01
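To get a feel for what an error of 0.01 means, it can be converted back to pixels. This is only a rough estimate: it assumes the coordinates span [-1, 1] across the 320x240 images produced by the batch transforms, and it ignores that the mean squared error averages the x and y errors together.

```python
import math

valid_mse = 0.0001            # approximate final valid_loss, as used above
rmse = math.sqrt(valid_mse)   # error in the rescaled [-1, 1] coordinate units

# Assumption: coordinates span [-1, 1] across the image, so one unit is half the side length
width, height = 320, 240      # image size after the batch transforms
print(rmse, rmse * width / 2, rmse * height / 2)
```

Under these assumptions the typical error is on the order of one or two pixels, which is consistent with how close the predictions look in `show_results`.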

learn.show_results(ds_idx=1, nrows=3, figsize=(6,8))

Conclusion

Today you learned how to create both a multi-label model and a regression model. You may have noticed that building these models follows similar approaches with minor differences. This is good: the more similar the approach, the easier it is to learn these distinct models.

Questionnaire

How could multi-label classification improve the usability of the bear classifier?
It would be able to classify more bears in the image. Also, it would allow for the classification of no bears present.
How do we encode the dependent variable in a multi-label classification problem?
It is one-hot encoded: the vector is the same length as the vocab, with a 1 where a class is present and a 0 where it is absent.
How do you access the rows and columns of a DataFrame as if it was a matrix?
df.iloc[0, 1] returns row 0, column 1. df.iloc[0, :] returns all of row 0 and df.iloc[:, 0] returns all of column 0.
How do you get a column by name from a DataFrame?
df['colName']
What is the difference between a Dataset and DataLoader?
A Dataset is a collection that returns a tuple of (x, y) for a single item.
A DataLoader wraps a Dataset and returns mini-batches of x and y.
What does a Datasets object normally contain?
Training and validation set
What does a DataLoaders object normally contain?
Training dataloader and validation dataloader
What does lambda do in Python?
Lambda is an anonymous function that can be created on the spot. Do note that they are not serializable, however.
What are the methods to customize how the independent and dependent variables are created with the data block API?
get_x
get_y
Why is softmax not an appropriate output activation function when using a one hot encoded target?
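Softmax makes the outputs sum to 1, which pushes the model to pick a single class, but here an image may have several labels or none.
Why is nll_loss not an appropriate loss function when using a one-hot-encoded target?
Similar to softmax, nll_loss works with a single target class per item, so it only fits problems where exactly one class is correct.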
What is the difference between nn.BCELoss and nn.BCEWithLogitsLoss?
BCELoss assumes you did sigmoid prior
BCEWithLogitsLoss does sigmoid
Why can't we use regular accuracy in a multi-label problem?
Regular accuracy assumes that only 1 class is correct. However, in multi-label problems there can be multiple labels, so a threshold is assigned.
When is it okay to tune a hyperparameter on the validation set?
When the relationship between the hyperparameter and the metric being observed is a smooth curve, so that we are not just picking an unrepresentative outlier.
How is y_range implemented in fastai? (See if you can implement it yourself and test it without peeking!)
```
def sigmoid_range(x,lo, hi):
     return x.sigmoid() * (hi-lo) + lo
```
What is a regression problem? What loss function should you use for such a problem?
The dependent variable is continuous. The loss function most often used is mean squared error.
What do you need to do to make sure the fastai library applies the same data augmentation to your inputs images and your target point coordinates?
You must use the correct block for the targets: PointBlock.

Further Research

Read a tutorial about Pandas DataFrames and experiment with a few methods that look interesting to you. See the book's website for recommended tutorials.
Retrain the bear classifier using multi-label classification. See if you can make it work effectively with images that don't contain any bears, including showing that information in the web application. Try an image with two different kinds of bears. Check whether the accuracy on the single-label dataset is impacted using multi-label classification.
Completed, see here [https://usama280.github.io/PasteBlogs/]

Training output for the multi-label model, frozen epochs (learn.fine_tune with freeze_epochs=4):

epoch	train_loss	valid_loss	accuracy_multi	time
0	0.950922	0.694729	0.243845	00:31
1	0.830578	0.568214	0.294123	00:26
2	0.609650	0.201799	0.818108	00:26
3	0.364655	0.126312	0.939741	00:26

Training output for the multi-label model, unfrozen epochs:

epoch	train_loss	valid_loss	accuracy_multi	time
0	0.135176	0.116987	0.942052	00:32
1	0.115532	0.106409	0.954303	00:32
2	0.096683	0.103882	0.950458	00:32

Training output for the regression model (final epochs):

epoch	train_loss	valid_loss	time
0	0.008197	0.002675	04:41
1	0.003693	0.000182	04:40
2	0.001383	0.000099	04:41

Lesson 6 - FastAI

Output of df.head():

	fname	labels	is_valid
0	000005.jpg	chair	True
1	000007.jpg	car	True
2	000009.jpg	horse person	True
3	000012.jpg	car	False
4	000016.jpg	bicycle	True