Multi-Label Classification

So far we have been classifying images that contain a single object, but pictures often have more than one object present in them. For this reason, we will now cover multi-label classification. Our goal is to create a model that can identify every object class present in an image.

The Data

path = untar_data(URLs.PASCAL_2007)

Path.BASE_PATH = path
path.ls()
(#8) [Path('train'),Path('test.json'),Path('segmentation'),Path('train.json'),Path('valid.json'),Path('test.csv'),Path('train.csv'),Path('test')]
df = pd.read_csv(path/'train.csv')
df.head()
fname labels is_valid
0 000005.jpg chair True
1 000007.jpg car True
2 000009.jpg horse person True
3 000012.jpg car False
4 000016.jpg bicycle True

Sidebar: Pandas (pd) and DataFrames

df.iloc[:,0] #column 0 
0       000005.jpg
1       000007.jpg
2       000009.jpg
3       000012.jpg
4       000016.jpg
           ...    
5006    009954.jpg
5007    009955.jpg
5008    009958.jpg
5009    009959.jpg
5010    009961.jpg
Name: fname, Length: 5011, dtype: object
df.iloc[0,:] #row 0
# Trailing :s are always optional (in numpy, pytorch, pandas, etc.),
#   so this is equivalent:
df.iloc[0]
fname       000005.jpg
labels           chair
is_valid          True
Name: 0, dtype: object
df['fname']
0       000005.jpg
1       000007.jpg
2       000009.jpg
3       000012.jpg
4       000016.jpg
           ...    
5006    009954.jpg
5007    009955.jpg
5008    009958.jpg
5009    009959.jpg
5010    009961.jpg
Name: fname, Length: 5011, dtype: object
tmp_df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
tmp_df
a b
0 1 3
1 2 4
tmp_df['c'] = tmp_df['a']+tmp_df['b']
tmp_df
a b c
0 1 3 4
1 2 4 6
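
One more DataFrame idiom worth knowing, since we will rely on it later when writing our splitter: a boolean column can be used directly as a row filter, and ~ negates it. A minimal sketch using the df loaded above:

df[df['is_valid']].head()   #Rows where is_valid is True (our validation images)
df[~df['is_valid']].head()  #~ negates the mask, so rows where is_valid is False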

Exploring DataLoader and Datasets

a = list(enumerate(string.ascii_lowercase))
a[0], len(a)
((0, 'a'), 26)
dl_a = DataLoader(a, batch_size=8, shuffle=True)
first(dl_a)
(tensor([17, 18, 10, 22,  8, 14, 20, 15]),
 ('r', 's', 'k', 'w', 'i', 'o', 'u', 'p'))
a = list(string.ascii_lowercase)
dss = Datasets(a)
dss[0]
('a',)
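
Datasets can also be given transform pipelines that build the independent and dependent variables from each raw item, one list of transforms per element of the returned tuple. A small sketch, assuming the fastai Datasets API used above; the two lambdas are purely illustrative:

dss = Datasets(a, [[lambda o: o+'_x'], [lambda o: o+'_y']])
dss[0]
#('a_x', 'a_y')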

End Sidebar

Constructing a DataBlock

You may have noticed that we often use a DataBlock when processing data. This time, why don't we build one up from scratch?

dblock = DataBlock()
dsets = dblock.datasets(df) #This creates a train and valid set
len(dsets.train),len(dsets.valid)
(4009, 1002)
x,y = dsets.train[0]
x,y
(fname       004719.jpg
 labels         bus car
 is_valid          True
 Name: 2369, dtype: object,
 fname       004719.jpg
 labels         bus car
 is_valid          True
 Name: 2369, dtype: object)

Notice that x and y are identical! By default the DataBlock just returns the raw row twice, so we need to define our own independent and dependent variables.

Creating our own independent and dependent variables

x['fname'] #This is our independent var
'004719.jpg'
y['labels'] #This is our dependent var
'bus car'
dblock = DataBlock(get_x = lambda r: r['fname'], 
                   get_y = lambda r: r['labels'])
dsets = dblock.datasets(df)
dsets.train[0]
('004849.jpg', 'train')

#It works! One catch: lambdas aren't serializable (a problem when exporting the model), so let's switch to named functions.
def get_x(r): return r['fname']
def get_y(r): return r['labels']

dblock = DataBlock(
    get_x = get_x, 
    get_y = get_y)

dsets = dblock.datasets(df)
dsets.train[0]
('004069.jpg', 'bird')

This is good, but we still need to prepend the full path to each filename, and the labels need to be split on spaces into a list of categories.

def get_x(r): return path/'train'/r['fname'] #Prepend the image folder path
def get_y(r): return r['labels'].split(' ') #Split the space-separated labels into a list

dblock = DataBlock(
    get_x = get_x, 
    get_y = get_y)

dsets = dblock.datasets(df)
dsets.train[0]

#looks good
(Path('train/003973.jpg'), ['bus', 'car', 'person'])

Looks good

Let's add the respective blocks (ImageBlock, MultiCategoryBlock)

To actually open the images and convert them to tensors, we need to specify blocks. Our independent variable is obviously the images, so we use ImageBlock. Our dependent variable can contain more than one category, so we use MultiCategoryBlock.

def get_x(r): return path/'train'/r['fname'] #Prepend the image folder path
def get_y(r): return r['labels'].split(' ') #Split the space-separated labels into a list

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock), #Adding blocks
                   get_x = get_x, 
                   get_y = get_y)

dsets = dblock.datasets(df)
x,y = dsets.train[0]

Viewing data

x #A PILImage (the image itself renders in the notebook output)
y
TensorMultiCategory([0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
dsets.vocab
['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
dsets.train[0][1] # Also known as y (Stored above)
TensorMultiCategory([0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
idxs = torch.where(dsets.train[0][1]==1)[0]
dsets.vocab[idxs]
(#1) ['bird']
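
Under the hood, MultiCategoryBlock is one-hot encoding the label list against the vocab. Here is a rough reimplementation of that idea (a sketch for intuition, not fastai's actual code):

def one_hot_encode(labels, vocab):
    #1.0 wherever the vocab entry appears in the label list, 0.0 elsewhere
    return torch.tensor([1.0 if v in labels else 0.0 for v in vocab])

one_hot_encode(['bird'], dsets.vocab) #Matches dsets.train[0][1] shown above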

Now let's split the data into training and validation sets using the is_valid column

def splitter(df):
    train = df.index[~df['is_valid']].tolist() # ~ means NOT
    valid = df.index[df['is_valid']].tolist()
    return train,valid
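
A quick way to convince ourselves the splitter behaves as expected (a small check, not part of the original notebook):

train_idx, valid_idx = splitter(df)
len(train_idx), len(valid_idx), len(df) #The two splits come from complementary masks, so their lengths should sum to the row count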

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x=get_x, 
                   get_y=get_y)

dsets = dblock.datasets(df)
dsets.train[0]
(PILImage mode=RGB size=500x333,
 TensorMultiCategory([0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))

Finally, let's switch the Datasets out for DataLoaders

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x=get_x, 
                   get_y=get_y,
                   item_tfms = RandomResizedCrop(128, min_scale=0.35)) #Must make all images same size

dls = dblock.dataloaders(df) #Switching to dataloaders
dls.show_batch(nrows=1, ncols=3)

Loss function for multi-label (Binary Cross-Entropy)

The loss function, as you may have expected, has changed once again. Because each image can have any number of labels, we treat every category as its own yes/no prediction and use binary cross-entropy.

learn = cnn_learner(dls, resnet18)
x,y = to_cpu(dls.train.one_batch())
activs = learn.model(x) #Can view activations on a single batch
activs.shape #64 imgs with 20 actv each
torch.Size([64, 20])
activs[0]
TensorImage([ 2.1040,  3.5137, -1.8048,  4.0140, -2.7668,  5.4573,  0.4210, -2.2953,  2.8784, -0.6523,  1.9214, -2.0250, -0.2120, -1.9046, -1.1947, -0.1867, -0.9368,  0.4383, -1.1167, -0.8710],
       grad_fn=<AliasBackward>)
def binary_cross_entropy(inputs, targets):
    inputs = inputs.sigmoid()
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()
loss_func = nn.BCEWithLogitsLoss() #Equivalent to the function above, but applies the sigmoid internally in a numerically stable way
loss = loss_func(activs, y)
loss
TensorImage(1.0545, grad_fn=<AliasBackward>)
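
We can sanity-check that the hand-written version and nn.BCEWithLogitsLoss agree, using some random activations and 0/1 targets (a quick sketch, not part of the training code):

inp  = torch.randn(64, 20)
targ = torch.randint(0, 2, (64, 20)).float()
binary_cross_entropy(inp, targ), nn.BCEWithLogitsLoss()(inp, targ)
#The two values should match up to floating-point error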

Accuracy

One change compared to the last chapter is the metric we use: because this is a multilabel problem, we can't use the accuracy function. Why is that? Well, accuracy was comparing our outputs to our targets like so:

def accuracy(inp, targ, axis=-1):
    "Compute accuracy with `targ` when `pred` is bs * n_classes"
    pred = inp.argmax(dim=axis)
    return (pred == targ).float().mean()

The class predicted was the one with the highest activation (this is what argmax does). Here it doesn't work because we could have more than one prediction on a single image. After applying the sigmoid to our activations (to make them between 0 and 1), we need to decide which ones are 0s and which ones are 1s by picking a threshold. Each value above the threshold will be considered as a 1, and each value lower than the threshold will be considered a 0:

def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):
    "Compute accuracy when `inp` and `targ` are the same size."
    if sigmoid: inp = inp.sigmoid()
    return ((inp>thresh)==targ.bool()).float().mean()
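
A tiny worked example of the thresholding, with made-up numbers that are already between 0 and 1 (so sigmoid=False):

preds = tensor([[0.9, 0.4, 0.2],
                [0.1, 0.8, 0.6]])
targs = tensor([[1, 0, 0],
                [0, 1, 1]])
accuracy_multi(preds, targs, thresh=0.5, sigmoid=False) #All 6 entries agree with the targets, so accuracy is 1.0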

Sidebar: Partial example

Showcasing how partial works in Python

def say_hello(name, say_what="Hello"): return f"{say_what} {name}."

say_hello('Jeremy'),say_hello('Jeremy', 'Ahoy!')
('Hello Jeremy.', 'Ahoy! Jeremy.')
f = partial(say_hello, say_what="Bonjour")

f("Jeremy"),f("Sylvain")
('Bonjour Jeremy.', 'Bonjour Sylvain.')

End Sidebar

Training

learn = cnn_learner(dls, resnet50, metrics=partial(accuracy_multi, thresh=0.2))
learn.fine_tune(3, base_lr=3e-3, freeze_epochs=4)
epoch train_loss valid_loss accuracy_multi time
0 0.950922 0.694729 0.243845 00:31
1 0.830578 0.568214 0.294123 00:26
2 0.609650 0.201799 0.818108 00:26
3 0.364655 0.126312 0.939741 00:26
epoch train_loss valid_loss accuracy_multi time
0 0.135176 0.116987 0.942052 00:32
1 0.115532 0.106409 0.954303 00:32
2 0.096683 0.103882 0.950458 00:32

95% accuracy with a threshold of 0.2. Let's try some other thresholds.

learn.metrics = partial(accuracy_multi, thresh=0.1) #Try different thresh
learn.validate()
(#2) [0.1038823127746582,0.9292030334472656]
learn.metrics = partial(accuracy_multi, thresh=0.99)
learn.validate()
(#2) [0.1038823127746582,0.9444024562835693]
preds,targs = learn.get_preds()
accuracy_multi(preds, targs, thresh=0.9, sigmoid=False)
TensorImage(0.9584)
xs = torch.linspace(0.05,0.95,29)
accs = [accuracy_multi(preds, targs, thresh=i, sigmoid=False) for i in xs]
plt.plot(xs,accs);
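
Rather than reading the best threshold off the plot by eye, we can pick it programmatically. A small sketch, assuming accs is the list of accuracy tensors computed above:

best_thresh = xs[torch.stack(accs).argmax()] #Threshold with the highest validation accuracy
best_thresh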

Next model - Regression

Now that we have built a multi-label model, let's take a look at a regression model, where the dependent variable is continuous: here, the 2D coordinates of the centre of a person's head.

Assemble the Data

path = untar_data(URLs.BIWI_HEAD_POSE)
path.ls().sorted()
(#50) [Path('01'),Path('01.obj'),Path('02'),Path('02.obj'),Path('03'),Path('03.obj'),Path('04'),Path('04.obj'),Path('05'),Path('05.obj')...]
(path/'01').ls().sorted()
(#1000) [Path('01/depth.cal'),Path('01/frame_00003_pose.txt'),Path('01/frame_00003_rgb.jpg'),Path('01/frame_00004_pose.txt'),Path('01/frame_00004_rgb.jpg'),Path('01/frame_00005_pose.txt'),Path('01/frame_00005_rgb.jpg'),Path('01/frame_00006_pose.txt'),Path('01/frame_00006_rgb.jpg'),Path('01/frame_00007_pose.txt')...]
img_files = get_image_files(path) #jpg's

im = PILImage.create(img_files[0])
im.shape
(480, 640)
im.to_thumb(160)
def img2pose(x): 
    return Path(f'{str(x)[:-7]}pose.txt') 
img2pose(img_files[0])
Path('06/frame_00554_pose.txt')
cal = np.genfromtxt(path/'01'/'rgb.cal', skip_footer=6)
#Function (based on the dataset's documentation) to get the centre of the head in pixel coordinates
def get_ctr(f):
    ctr = np.genfromtxt(img2pose(f), skip_header=3)
    c1 = ctr[0] * cal[0][0]/ctr[2] + cal[0][2]
    c2 = ctr[1] * cal[1][1]/ctr[2] + cal[1][2]
    return tensor([c1,c2])
get_ctr(img_files[0])
tensor([330.9554, 308.3703])
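
get_ctr is essentially the standard pinhole-camera projection: the 3D head centre (X, Y, Z) from the pose file is mapped to pixel coordinates as u = fx*X/Z + cx and v = fy*Y/Z + cy, with the focal lengths and principal point taken from the rgb.cal calibration file. A standalone sketch of that idea (project_point is a hypothetical helper, not part of the dataset code):

def project_point(xyz, fx, fy, cx, cy):
    #Pinhole projection of a 3D point (in camera coordinates) onto the image plane
    X, Y, Z = xyz
    return tensor([X*fx/Z + cx, Y*fy/Z + cy])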

The required DataBlock

biwi = DataBlock(
    blocks=(ImageBlock, PointBlock),
    get_items=get_image_files,
    get_y=get_ctr,
    splitter=FuncSplitter(lambda o: o.parent.name=='13'), #Just grabbing person number 13 as our valid set
    batch_tfms=[*aug_transforms(size=(240,320)), 
                Normalize.from_stats(*imagenet_stats)]
)
dls = biwi.dataloaders(path)
dls.show_batch(max_n=9, figsize=(8,6))
xb,yb = dls.one_batch()
xb.shape,yb.shape
(torch.Size([64, 3, 240, 320]), torch.Size([64, 1, 2]))
yb[0]
TensorPoint([[0.1999, 0.0785]], device='cuda:0')

Training a Model

learn = cnn_learner(dls, resnet18, y_range=(-1,1)) #y_range constrains the model's outputs to the valid coordinate range
                                                   #(fastai rescales point coordinates to lie between -1 and +1)
def sigmoid_range(x, lo, hi): return torch.sigmoid(x) * (hi-lo) + lo 
plot_function(partial(sigmoid_range,lo=-1,hi=1), min=-4, max=4)
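A quick check (not from the original notebook) that sigmoid_range squashes any activation into the requested range, with 0 mapping to the midpoint:

sigmoid_range(torch.tensor([-10., 0., 10.]), -1, 1)
#Roughly tensor([-1., 0., 1.]): extreme activations saturate at the ends of the range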
dls.loss_func 
FlattenedLoss of MSELoss()
learn.lr_find()
SuggestedLRs(lr_min=0.005754399299621582, lr_steep=1.5848931980144698e-06)
lr = 1e-2
learn.fine_tune(3, lr)
epoch train_loss valid_loss time
0 0.049630 0.005448 03:34
epoch train_loss valid_loss time
0 0.008197 0.002675 04:41
1 0.003693 0.000182 04:40
2 0.001383 0.000099 04:41
math.sqrt(0.0001) #The loss is MSE, so its square root is the average coordinate error (~0.01 in the -1 to +1 range)
0.01
learn.show_results(ds_idx=1, nrows=3, figsize=(6,8))

Conclusion

Today you learned how to create both a multi-label model and a regression model. You may have noticed that when creating these models we take very similar approaches with only minor differences. This is good news: the more similar the approaches, the easier it is to learn each of these distinct model types.

Questionnaire

  1. How could multi-label classification improve the usability of the bear classifier?
    It would be able to handle images containing more than one bear, and also images that contain no bears at all.
  2. How do we encode the dependent variable in a multi-label classification problem?
    It is one-hot encoded: the vector is the same length as the vocab, with a 1 at each position whose class is present and a 0 elsewhere.
  3. How do you access the rows and columns of a DataFrame as if it was a matrix?
    df.iloc[0,1] - row 0, column 1
  4. How do you get a column by name from a DataFrame?
    df['colName']
  5. What is the difference between a Dataset and DataLoader?
    A Dataset returns a tuple of (x, y) for a single item.
    A DataLoader wraps a Dataset and returns mini-batches of x and y.
  6. What does a Datasets object normally contain?
    Training and validation set
  7. What does a DataLoaders object normally contain?
    Training dataloader and validation dataloader
  8. What does lambda do in Python?
    Lambda is an anonymous function that can be created on the spot. Do note that they are not serializable, however.
  9. What are the methods to customize how the independent and dependent variables are created with the data block API?
    get_x
    get_y
  10. Why is softmax not an appropriate output activation function when using a one hot encoded target?
    Softmax pushes the model towards picking a single class (its outputs sum to 1), but with one-hot-encoded targets an image may have several classes present, or none.
  11. Why is nll_loss not an appropriate loss function when using a one-hot-encoded target?
    Similar to softmax: nll_loss looks at only a single activation per item, so it only makes sense when exactly one class is correct.
  12. What is the difference between nn.BCELoss and nn.BCEWithLogitsLoss?
    BCELoss assumes you did sigmoid prior
    BCEWithLogitsLoss does sigmoid
  13. Why can't we use regular accuracy in a multi-label problem?
    Regular accuracy takes the argmax, assuming exactly one class is correct. In a multi-label problem there can be several (or zero) correct labels, so we compare each activation to a threshold instead.
  14. When is it okay to tune a hyperparameter on the validation set?
    When the relationship between the hyperparameter and the metric is smooth, as it is for the threshold here, so we are not just picking a noisy outlier.
  15. How is y_range implemented in fastai? (See if you can implement it yourself and test it without peeking!)
    def sigmoid_range(x,lo, hi):
         return x.sigmoid() * (hi-lo) + lo
    
  16. What is a regression problem? What loss function should you use for such a problem?
    A problem where the dependent variable is continuous. The loss function most often used is mean squared error (MSE).
  17. What do you need to do to make sure the fastai library applies the same data augmentation to your inputs images and your target point coordinates?
    You must use the correct block type, PointBlock, so fastai knows the labels are coordinates and applies the same augmentations to the points as to the images.

Further Research

  1. Read a tutorial about Pandas DataFrames and experiment with a few methods that look interesting to you. See the book's website for recommended tutorials.
  2. Retrain the bear classifier using multi-label classification. See if you can make it work effectively with images that don't contain any bears, including showing that information in the web application. Try an image with two different kinds of bears. Check whether the accuracy on the single-label dataset is impacted using multi-label classification.
    Completed, see here [https://usama280.github.io/PasteBlogs/]