Tutorial 2: Classical Machine Learning Fundamentals

Laura E. Boucheron, Electrical & Computer Engineering, NMSU

October 2020

Copyright (C) 2020 Laura E. Boucheron

This information is free; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This work is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this work in a file COPYING.TXT; if not, see https://www.gnu.org/licenses/.

Overview

In this tutorial, we present a brief overview of classical machine learning concepts as applicable to image classification applications. Completion of this tutorial should give participants the basic background and terminology necessary for an understanding of the basics of classical machine learning as applied to image classification. In this tutorial, we will develop a classical machine learning algorithm capable of discriminating between objects present in an image.

This tutorial contains 5 sections:

Section 0: Preliminaries
Section 1: Working with the CalTech101 Dataset
Section 2: Feature Extraction
Section 3: Setting up a Feature Matrix and Label Vector
Section 4: Classification

There are subsections with the heading "Your turn:" throughout this tutorial in which you will be asked to apply what you have learned.

Section 0: Preliminaries

Section 0.1: A Note on Jupyter Notebooks

There are two main types of cells in this notebook: code and markdown (text). You can add a new cell with the plus sign in the menu bar above and you can change the type of cell with the dropdown menu in the menu bar above. As you complete this tutorial, you may wish to add additional code cells to try out your own code and markdown cells to add your own comments or notes.

Markdown cells can be augmented with a number of text formatting features, including embedded $\LaTeX$, monotype specification of code syntax, bold font, and italic font. There are many other features of markdown cells--see the Jupyter documentation for more information.

You can edit a cell by double clicking on it. If you double click on this cell, you can see how to implement the various formatting referenced above. Code cells can be run and markdown cells can be formatted using Shift+Enter or by selecting the Run button in the toolbar above.

Once you have completed (all or part of) this notebook, you can share your results with colleagues by sending them the .ipynb file. Your colleagues can then open the file and will see your markdown and code cells as well as any results that were printed or displayed at the time you saved the notebook. If you prefer to send a notebook without results displayed (like this notebook appeared when you downloaded it), you can select "Restart & Clear Output" from the Kernel menu above. You can also export this notebook in a non-executable form, e.g., .pdf, through the File > Save As menu.

Section 0.2 Downloading Images

In this tutorial, we will use the CalTech101 dataset, which is a standard dataset used for image classification. You can find important information about this dataset at (http://www.vision.caltech.edu/Image_Datasets/Caltech101/). From that webpage, download the dataset itself (http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz) (126 MB) and also the annotations (http://www.vision.caltech.edu/Image_Datasets/Caltech101/Annotations.tar) (13 MB) which will allow us to focus our feature extraction on only the objects in the images.

Extract the image dataset and the annotations in your working directory. The images will extract to a 101_ObjectCategories/ directory, under which there are 102 directories named according to the object contained in the images (e.g., accordion/ or pizza/), under which are files named image_XXXX.jpg, where XXXX is a four digit number. The annotations will extract to an Annotations/ directory, under which there are 101 directories named (for the most part) the same as the categories in 101_ObjectCategories/, under which are files named annotation_XXXX.mat, where XXXX is a four digit number. There are also 5 other files in the Annotations/ directory. In order to make subsequent code run more easily:

Section 0.3a Import Necessary Libraries (For users using a local machine)

Here, at the top of the code, we import all the libraries necessary for this tutorial. We will introduce the functionality of any new libraries throughout the tutorial, but include all import statements here as standard coding practice. We include a brief comment after each library here to indicate its main purpose within this tutorial.

It would be best to run this next cell before the workshop starts to make sure you have all the necessary packages installed on your machine.
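A minimal sketch of what that import cell might contain is below; the exact list in the original notebook may differ slightly (in particular, which library is used to read image files from disk is an assumption here -- matplotlib's plt.imread is used throughout these sketches).

import glob                      # collecting lists of directory and file names
import numpy as np               # numerical arrays, feature vectors and matrices
import matplotlib.pyplot as plt  # reading, displaying, and annotating images
import scipy.io as spio          # loading the Matlab-format (.mat) annotation files
import skimage.color             # color space conversions (gray2rgb, rgb2hsv, rgb2gray)
import skimage.draw              # rasterizing the annotation polygon into a binary mask
import skimage.measure           # regionprops for region (shape) features
import skimage.feature           # gray-level co-occurrence matrix (GLCM) texture features
import sklearn.svm               # the support vector machine classifier
import sklearn.metrics           # confusion matrix and accuracy metrics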

Section 0.3b Build the Conda Environment (For users using the ARS HPC Ceres with JupyterLab)

Open a terminal from inside JupyterLab (File > New > Terminal) and type the following commands

source activate
wget https://kerriegeil.github.io/NMSU-USDA-ARS-AI-Workshops/aiworkshop.yml
conda env create --prefix /project/your_project_name/envs/aiworkshop -f aiworkshop.yml

This will build the environment in one of your project directories. It may take 5 minutes to build the Conda environment.

See https://kerriegeil.github.io/NMSU-USDA-ARS-AI-Workshops/setup/ for more information.

When the environment finishes building, select this environment as your kernel in your Jupyter Notebook (click top right corner where you see Python 3, select your new kernel from the dropdown menu, click select)

You will want to do this BEFORE the workshop starts.

Section 1: Working with the CalTech101 Dataset

Section 1.1: Exploring the Images

In the previous tutorial, we were working with only two images. There are more than 8000 images in 101 different directories in the CalTech101 dataset. We thus need to develop ways to efficiently loop over larger image datasets and access the images without hard coding the image filenames.

Here, we will use the glob library to store directory names and filenames in a list. You can store the directory names of the CalTech101 dataset in a list with categories=sorted(glob.glob('101_ObjectCategories/*')). This list now gives you a means to loop over the 101 different categories of objects in that categories[k] is the k-th category name as a string (including the string 101_ObjectCategories/ prepended to the category name). A few other notes:

Using this list categories, we can read in the first image (image_0001.jpg) from each of the 101 categories and display that image in one location of an $11\times10$ subplot. We can also title each of those locations of the subplot with the category name. We note that this code is not robust in the sense that we are relying on the existence of a specific filename format under each of the directories in 101_ObjectCategories. We will work with more robust means to traverse the files that exist in each directory in later portions of this tutorial.
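A sketch of such a loop is below. The figure size, the use of plt.imread, and the way the category name is stripped from the path are assumptions for illustration.

import os
import glob
import matplotlib.pyplot as plt

categories = sorted(glob.glob('101_ObjectCategories/*'))  # directory names as strings

plt.figure(figsize=(20, 22))
for k, cat in enumerate(categories):
    im = plt.imread(cat + '/image_0001.jpg')   # relies on this filename existing in every directory
    plt.subplot(11, 10, k + 1)
    plt.imshow(im, cmap='gray')                # cmap is ignored for RGB images
    plt.title(os.path.basename(cat), fontsize=8)  # strip the '101_ObjectCategories/' prefix
    plt.axis('off')
plt.show()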

Section 1.2 Exploring the Annotations

Section 1.2.1 Plotting the annotations as a boundary over the image

The annotations are stored in Matlab's .mat format, which the scipy.io library in python can load. Above, we have imported scipy.io as spio. The image annotations can be read in with the spio.loadmat function, e.g., ann=spio.loadmat('filename.mat'). The spio.loadmat function returns a dictionary with variable names as keys. In the CalTech101 annotations, dictionary entry ann['box_coord'] is a $1\times4$ vector of bounding box coordinates and ann['obj_contour'] is a $2\times K$ vector of pixel locations which outline the contour of the object, where $K$ will be different for different annotations.

As an example, we read in Annotations/emu/annotation_0001.mat and display box_coord and obj_contour. The object contour points obj_contour are (for reasons unbeknownst to us) offset by the upper left box_coord coordinates.

As a further example, we read in the corresponding image 101_ObjectCategories/emu/image_0001.jpg and display it. On top of that image, we plot the annotation outline with a basic plot command plt.plot(ann['obj_contour'][0,:]+ann['box_coord'][0,2]-1,ann['obj_contour'][1,:]+ann['box_coord'][0,0]-1,'w'). A few notes:
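The following sketch puts these two steps together; the assumed ordering of the box_coord entries ([top row, bottom row, left column, right column]) is consistent with the offsets used in the plot command above.

import scipy.io as spio
import matplotlib.pyplot as plt

ann = spio.loadmat('Annotations/emu/annotation_0001.mat')
print(ann['box_coord'])    # 1x4 bounding box (assumed order: top, bottom, left, right)
print(ann['obj_contour'])  # 2xK contour points, offset by the upper-left box_coord corner

im = plt.imread('101_ObjectCategories/emu/image_0001.jpg')
plt.figure()
plt.imshow(im)             # sets up image coordinates: origin at the top left
plt.plot(ann['obj_contour'][0, :] + ann['box_coord'][0, 2] - 1,
         ann['obj_contour'][1, :] + ann['box_coord'][0, 0] - 1, 'w')
plt.axis('off')
plt.show()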

Section 1.2.2 Some common coordinate issues that may be encountered with annotations

We noted above that since we first displayed the image using plt.imshow, the axes for the figure are assumed to have the origin in the top left. The plt.plot command will use the plotting coordinate conventions of x-axis, y-axis, but will follow the origin set up by the image visualization. We further explore this issue by using the same plotting command plt.plot(ann['obj_contour'][0,:]+ann['box_coord'][0,2]-1,ann['obj_contour'][1,:]+ann['box_coord'][0,0]-1,'r') as above, but without first visualizing the image. This means that the plt.plot command is expected to use the plotting coordinate conventions of x-axis, y-axis and have the origin in the bottom left.

Reversing coordinates

A very common mistake when plotting (x,y) coordinates on top of images is accidentally reversing the order of the coordinates. Given the rotated coordinate system used for images, this causes a characteristic "rotation" of the expected result. If we accidentally plotted the annotation in row, column order, we would obtain something like the following.

Section 1.2.3 Computing a binary object mask from the annotation data

You can use the object contour outline to define a binary image mask with r,c = skimage.draw.polygon(ann['obj_contour'][1,:]+ann['box_coord'][0,0]-1,ann['obj_contour'][0,:]+ann['box_coord'][0,2]-1,(M,N)); A=np.zeros((M,N)); A[r,c]=1; (note that the object contour indices are swapped here versus the plot command used above due to the difference in coordinate systems of image versus plot), where M, N are the dimensions of the image.
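As a self-contained sketch of that procedure for the emu example used above:

import numpy as np
import scipy.io as spio
import skimage.draw
import matplotlib.pyplot as plt

im = plt.imread('101_ObjectCategories/emu/image_0001.jpg')
ann = spio.loadmat('Annotations/emu/annotation_0001.mat')
M, N = im.shape[0], im.shape[1]            # image dimensions (works for gray or RGB)

# polygon() expects (row, col) coordinates, hence the swap relative to plt.plot above
r, c = skimage.draw.polygon(ann['obj_contour'][1, :] + ann['box_coord'][0, 0] - 1,
                            ann['obj_contour'][0, :] + ann['box_coord'][0, 2] - 1,
                            (M, N))
A = np.zeros((M, N))                       # note the tuple: np.zeros((M, N)), not np.zeros(M, N)
A[r, c] = 1

plt.imshow(A, cmap='gray')
plt.show()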

Your turn:

Using what you have learned about using lists to loop over categories, load the first annotation (annotation_0001.mat) from each of the 101 categories, use the corresponding obj_contour to define an object mask, and display that mask in one location of an $11\times10$ subplot. Title each of those locations of the subplot with the category name. You might find it handy to read in the image corresponding to the annotation in order to easily get the dimensions. The visualizations from the previous part can be used here to spot-check the correctness of the annotations.

Section 2: Feature Extraction

In this section we will define several functions designed to extract different categories of features from images. These functions span several common categories of features, but are by no means a comprehensive list. These feature extraction methods are illustrations of so-called "hand-designed" features: features specifically implemented because they are expected to be helpful for discriminating between different image categories.

Section 2.1 Color Features

In this section, we will extract a set of features designed to characterize the colors present in an image. We use the annotation mask as defined above to focus our attention on features only within the object of interest rather than features of the entire image.

Section 2.1.1 Defining color statistics

Here we create a function f,fnames=extract_color_features_rgb(im,mask) with inputs im, the image from which to extract features, and the binary annotation mask, mask. Outputs will be a length-15 feature vector f describing statistics of the colors within the image object and a length-15 list fnames with the feature names. We extract statistics from the red, green, and blue channels of the image. From each channel, we compute the mean, standard deviation, median, min, and max value of pixels within the object mask. We order the features by channel first in the order given above and by statistic second in the order given above (i.e., the first and second features will be mean and standard deviation of the red channel). We assign brief, descriptive strings for each feature and store those in fnames (e.g., 'R_mean' and 'R_std' as names for the first two features). Note that we also need to take care of the situation in which the image is a grayscale image (i.e., has only one channel) by using the skimage.color.gray2rgb function to convert it to an RGB image.
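A sketch of what such a function might look like is below; it follows the description above but is not necessarily the original implementation.

import numpy as np
import skimage.color

def extract_color_features_rgb(im, mask):
    """Mean, std, median, min, max of the R, G, B channels within the object mask."""
    if im.ndim == 2:                         # grayscale image: replicate to three channels
        im = skimage.color.gray2rgb(im)
    f = []
    fnames = []
    for ch, name in zip(range(3), ['R', 'G', 'B']):
        pixels = im[:, :, ch][mask > 0]      # only pixels inside the annotation mask
        f.extend([pixels.mean(), pixels.std(), np.median(pixels),
                  pixels.min(), pixels.max()])
        fnames.extend([name + '_mean', name + '_std', name + '_median',
                       name + '_min', name + '_max'])
    return np.asarray(f), fnames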

Section 2.1.2 Extracting color statistics

Using 101_ObjectCategories/emu/image_0001.jpg as the input image im and Annotations/emu/annotation_0001.mat as the annotation mask mask, we use the extract_color_features_rgb function and print out the f vector and the fnames list. These features may not mean much to us as printed, but such a printed output can be used as a sanity check.

Your turn:

Create a feature extraction function f,fnames=extract_color_features_hsv(im,mask) with inputs im, the image from which to extract features, and the binary annotation mask, mask. Outputs will be a length-15 feature vector f describing statistics of the colors in HSV space within the image object and a length-15 list fnames with the feature names. Extract statistics from the hue, saturation, and value channels of the image. From each channel, compute the mean, standard deviation, median, min, and max value of pixels within the object mask. In order to convert between the RGB and HSV color space, use the command skimage.color.rgb2hsv. Order the features by channel first in the order given above and by statistic second in the order given above (i.e., the first and second features will be mean and standard deviation of the hue channel). Assign brief, descriptive strings for each feature and store those in fnames (e.g., 'H_mean', and 'H_std' as names for the first two features).

Section 2.2 Region features

In this section, we will extract a set of features designed to characterize the size and shape of an image object. We use the annotation mask as defined above to define the object of interest.

Section 2.2.1: Defining region features

We will use the skimage.measure.regionprops function to compute a list of region-based features in the extract_region_features function below. We will not use all of the features available in skimage.measure.regionprops because some of them may not be useful in our image classification situation. For example, the centroid or orientation of the object could bias the classifier toward translation or rotation dependence. In all subsequent discussion, the term "region" denotes the annotated region in an image. The 19 features extracted below measure characteristics of the region, including:

Section 2.2.2: Extracting region features

Using 101_ObjectCategories/emu/image_0001.jpg as the input image im and Annotations/emu/annotation_0001.mat as the annotation mask mask, we use the extract_region_features_try1 function and print out the f vector and the fnames list. Depending on your version of python, you may get a deprecation warning when running the following code. That deprecation warning is related to the issue that you will explore in the next Your turn: block.

Your turn:

We are designing functions that can extract a vector of features from image regions. What issue do you note with the feature vector that is returned by extract_region_features_try1?

The feature vector is of type "object", indicating that it is not a simple feature vector. The vector of Hu moments has not been appended to the feature vector as individual elements.

Your turn:

Here is a modification to the region feature extraction code called simply extract_region_features. Use this function to compare and contrast the output to the output from extract_region_features_try1.

Now we notice that the feature vector is a true vector since we have individually appended each of the 7 Hu moments. Since there were only 7, we explicitly typed out all seven vector elements and feature names, but note that we could use iteration for longer feature vectors.
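A sketch of what extract_region_features might look like is below. The exact set of 19 region properties is an assumption for illustration (12 scalar properties plus the 7 Hu moments), and iteration is used here rather than typing out the Hu moments explicitly.

import numpy as np
import skimage.measure

def extract_region_features(im, mask):
    """Region (shape) features of the object mask, with each Hu moment appended
    individually so that f is a plain numeric vector rather than dtype=object."""
    props = skimage.measure.regionprops(mask.astype(int))[0]   # single labeled region
    scalar_names = ['area', 'bbox_area', 'convex_area', 'eccentricity',
                    'equivalent_diameter', 'euler_number', 'extent', 'filled_area',
                    'major_axis_length', 'minor_axis_length', 'perimeter', 'solidity']
    f = [getattr(props, name) for name in scalar_names]
    fnames = list(scalar_names)
    for k, hu in enumerate(props.moments_hu):   # 7 Hu moments appended one at a time
        f.append(hu)
        fnames.append('hu_moment_' + str(k))
    return np.asarray(f), fnames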

Section 2.3: Texture features

In this section, we will extract a set of features designed to characterize the textures of intensities in an image. Texture measures characterize the spatial distribution of intensities in an image. If we think of a grayscale image as a surface where the lighter regions are raised higher than the darker regions, the distribution of those intensities would manifest as different textures if you were to run your finger across the image. Again, we use the annotation mask as defined above to focus our attention on features only within the object of interest rather than features of the entire image.

Section 2.3.1: Defining texture features

We create a function f,fnames=extract_texture_features(im,mask) with inputs im, the image from which to extract features, and the binary annotation mask, mask. This function makes use of the gray-level co-occurrence matrix (GLCM) which is a common method to extract texture features from an image. The outputs are a length-48 feature vector f of co-occurrence matrix features within the image object and a length-48 list fnames with the feature names.
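A sketch of such a function is below. The choices of two distances and four angles (which, with the six GLCM properties, give 6 x 2 x 4 = 48 features) and the crude masking strategy (zeroing background pixels) are assumptions; the original notebook may differ. Note also that in skimage versions before 0.19 the functions are spelled greycomatrix and greycoprops.

import numpy as np
import skimage.color
import skimage.feature

def extract_texture_features(im, mask):
    """GLCM texture features within the object mask (48 features)."""
    if im.ndim == 3:
        im = skimage.color.rgb2gray(im)
    # rgb2gray returns floats in [0,1]; the GLCM needs an integer image
    im = (im * 255).astype(np.uint8) if im.max() <= 1.0 else im.astype(np.uint8)
    im = im * (mask > 0)                        # crude masking: background pixels set to 0
    distances = [1, 2]
    angles = [0, np.pi/4, np.pi/2, 3*np.pi/4]
    glcm = skimage.feature.graycomatrix(im, distances, angles,
                                        levels=256, symmetric=True, normed=True)
    f, fnames = [], []
    for prop in ['contrast', 'dissimilarity', 'homogeneity', 'energy',
                 'correlation', 'ASM']:
        vals = skimage.feature.graycoprops(glcm, prop)   # shape (len(distances), len(angles))
        for i, d in enumerate(distances):
            for j in range(len(angles)):
                f.append(vals[i, j])
                fnames.append(prop + '_d' + str(d) + '_a' + str(j))
    return np.asarray(f), fnames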

Section 2.3.2: Extracting texture features

Using 101_ObjectCategories/emu/image_0001.jpg as the input image im and Annotations/emu/annotation_0001.mat as the annotation mask mask, we use the extract_texture_features function and print out the f vector and the fnames list.

Section 3: Setting up a Feature Matrix and Label Vector

Now that we have defined functions that compute several different categories of features from an image object, we need to aggregate those features into a feature matrix. This feature matrix will be $N\times M$ where $N$ is the total number of images that we use as input and $M$ is the total number of features that we extract from each of the $N$ images. If we use all features from above we have a total of 97 features for each image (97 = 15 RGB features + 15 HSV features + 19 region features + 48 texture features). This feature matrix is used as input to the classification algorithm to describe the image objects.

The classification algorithm, however, also needs to be told what the label of each image is so that it can learn to discriminate the different objects. The label vector will be an $N\times 1$ vector. Note that the number of rows $N$ in the feature matrix must correspond to the length $N$ of the label vector and there must be a one-to-one correspondence, i.e., the first row of the feature matrix must correspond to the first element in the label vector. This label vector provides the identity (label) of each image. There are different means to define labels for machine learning algorithms. This example will be specific to the sklearn package in python, but will be similar in flavor to the format required by other frameworks. We will learn a different formulation of the label vector for deep learning in Tutorial 3.

Section 3.1: Setting up a matrix to discriminate between flamingos and emus

In this part, we use what we learned from Section 1 above about looping over the directory structure of the CalTech101 dataset. We will loop over multiple images, extract features, and build a feature matrix and label vector. We write this code so that the user can specify the categories of interest as a list of strings. Those strings are used to navigate into the directories of images from which to extract features. Feature vectors f_rgb, f_hsv, f_region, and f_texture are extracted from each image and stacked in an $N\times97$ feature matrix, where $N$ is the total number of images, and 97 is the feature vector dimensionality. At the same time, we create a corresponding $N\times1$ label vector (actually a list in python).

While we loop over all images in the specified categories, we split the data into a training set consisting of 90% of the data and a test set consisting of the remaining 10%. We call the two feature matrices X_train and X_test and the two label vectors, y_train and y_test, consistent with common notation in machine learning. In this case, the label vectors y_train and y_test are actually lists of the class strings (e.g., 'emu').

Here, as an example, we specify the 'emu' and 'flamingo' directories, compute X_train, X_test, y_train, and y_test.
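A sketch of what that loop might look like is below. It assumes the feature-extraction functions from Section 2 are defined (including your extract_color_features_hsv from the earlier Your turn), and it uses a simple deterministic 90/10 split (every 10th image goes to the test set); the original notebook's splitting strategy may differ.

import glob
import numpy as np
import scipy.io as spio
import skimage.draw
import matplotlib.pyplot as plt

def build_feature_matrix(category_names):
    """Extract all features from every image in the given categories and
    split into 90% training and 10% test sets."""
    X, y = [], []
    for cat in category_names:
        impaths = sorted(glob.glob('101_ObjectCategories/' + cat + '/*.jpg'))
        for impath in impaths:
            im = plt.imread(impath)
            annpath = (impath.replace('101_ObjectCategories', 'Annotations')
                             .replace('image', 'annotation')
                             .replace('.jpg', '.mat'))
            ann = spio.loadmat(annpath)
            M, N = im.shape[0], im.shape[1]
            r, c = skimage.draw.polygon(ann['obj_contour'][1, :] + ann['box_coord'][0, 0] - 1,
                                        ann['obj_contour'][0, :] + ann['box_coord'][0, 2] - 1,
                                        (M, N))
            mask = np.zeros((M, N)); mask[r, c] = 1
            f_rgb, _ = extract_color_features_rgb(im, mask)
            f_hsv, _ = extract_color_features_hsv(im, mask)
            f_region, _ = extract_region_features(im, mask)
            f_texture, _ = extract_texture_features(im, mask)
            X.append(np.concatenate((f_rgb, f_hsv, f_region, f_texture)))  # length-97 row
            y.append(cat)                                                  # label string
    X = np.vstack(X)
    test_idx = np.arange(len(y)) % 10 == 9       # every 10th sample goes to the test set
    X_train, X_test = X[~test_idx], X[test_idx]
    y_train = [lab for lab, t in zip(y, test_idx) if not t]
    y_test = [lab for lab, t in zip(y, test_idx) if t]
    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = build_feature_matrix(['emu', 'flamingo'])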

Your turn:

Explore the dimensionalities and values of X_train, X_test, y_train, and y_test.

Section 3.2: Normalizing the feature matrices

Some of the features have a larger range than others. We don’t want those features to have undue influence on the classification. We will thus normalize the feature matrices to have range [0,1]. There will be two slightly different procedures for normalizing X_train and X_test.

To normalize X_train, from each column we subtract the minimum of the column and divide by the maximum of the column. Additionally, we save the maximum values for each column in a $1\times97$ vector mx and the minimum values for each column in a $1\times97$ vector mn.

To normalize X_test, from each column we subtract the corresponding minimum from mn and divide by the corresponding maximum from mx. This procedure treats the test data exactly the same as the training data.
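A minimal sketch of that normalization is below, assuming the maximum is taken after the minimum has been subtracted (so the training columns span exactly [0,1]); the original notebook may divide by the raw column maximum instead.

import numpy as np

# Normalize the training matrix column-wise to [0, 1], saving the per-column shift and scale
mn = X_train.min(axis=0)              # length-97 vector of column minimums
mx = (X_train - mn).max(axis=0)       # length-97 vector of column maximums (after the shift)
mx[mx == 0] = 1                       # guard against constant columns
Xn_train = (X_train - mn) / mx

# The test matrix gets the *same* shift and scale computed from the training data
Xn_test = (X_test - mn) / mx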

Your turn:

For the same X_train, X_test as in Section 3.1, compute the normalized matrices Xn_train, Xn_test. Explore the dimensionalities and values of Xn_train and Xn_test and compare to what you found above for X_train and X_test.

Section 4: Classification

In this section we will use the support vector machine (SVM) classifier from sklearn as an example of how you can use the training data in X_train and y_train to train a classifier. We will also use other supporting functions from sklearn to assess the performance of the SVM on the test data X_test. The basic setup of the training and testing process for the SVM will be easily transferred to application of other common classifiers available in sklearn.

We will also explore modifications to the training process to explore some of the discriminative capabilities of the features we have extracted. Finally, you will explore other standard classifiers available in sklearn.

Section 4.1: Training the SVM Classifier

The commands here assume that we will be training a binary (two-class) classifier svm.SVC. We first declare the SVM which is the step where we can configure various parameters of the SVM. Next, we fit the SVM to the data. You will notice that the fitting routine prints out a bunch of information about the classifier that was trained. That information gives us some idea about the different configuration parameters available in the SVM classifier.
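A sketch of those two steps is below; the particular keyword arguments shown for svm.SVC are only for illustration (the defaults also work).

import sklearn.svm

clf = sklearn.svm.SVC(kernel='rbf', gamma='scale')  # declare the SVM and configure its parameters
clf.fit(Xn_train, y_train)                          # fit the SVM to the (normalized) training data
print(clf)                                          # echoes the classifier configuration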

Section 4.2: Testing the SVM Classifier

Now that we have trained the classifier by showing it the training data, we will test the classifier by predicting the labels for the test data. We call the predicted labels y_test_hat, where the _hat is a nod to the typical mathematical notation for an estimate. Once we have the predicted class labels y_test_hat, we compare them to the known class labels in y_test. Here, we use two metrics to help us interpret the performance: the confusion matrix and the accuracy. There are many other metrics available; see the documentation for sklearn at https://scikit-learn.org/stable/user_guide.html.

The confusion matrix is an $L\times L$ matrix where $L$ is the number of classes. The $(i,j)$-th entry is a count of the number of times an actual class $i$ is predicted to be class $j$. Thus, a perfect prediction will have a diagonal confusion matrix. We also send in the list of category names to specify the order in which the classes appear in the confusion matrix.

We compute the overall classification accuracy from the confusion matrix by dividing the sum of the diagonal of C (the number of correct classifications) by the total sum of C (the total number of test samples).
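A sketch of the prediction and evaluation steps is below, using the two-class category list from Section 3.1 to fix the class order.

import numpy as np
import sklearn.metrics

y_test_hat = clf.predict(Xn_test)                  # predicted labels for the test set

# confusion matrix with a fixed class order
C = sklearn.metrics.confusion_matrix(y_test, y_test_hat, labels=['emu', 'flamingo'])
print(C)

# overall accuracy: correct classifications (diagonal) divided by total test samples
acc = np.trace(C) / np.sum(C)
print(acc)                                         # equivalent to sklearn.metrics.accuracy_score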

Your turn:

What do this confusion matrix and accuracy tell you about the performance of the SVM classifier?

Section 4.3 Training a multi-class classifier

We can use the same svm.SVC classifier for a multi-class (more than two classes) classification problem. Many, but not all, classifiers can be applied to both binary and multi-class problems.

Your turn:

Use what you learned above to create a three-class classifier using input from the CalTech101 dataset. The basic two-class code is copied into the cell below for ease of editing.

Section 4.4 Exploring discriminative capabilities of different features

We can train an SVM using only a subset of the features that we have defined. This is essentially an exploration of the discriminatory potential of different individual features or sets of features via ablation. In the code below, we re-compute the feature matrices and label vectors for the 'emu' versus 'flamingo' problem. Since we will be using subsets of features, we extract all features here and will use slicing to send a subset of features to the SVM classifier.
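A sketch of training on a feature subset is below; the column layout assumes the features are concatenated in the order described in Section 3 (RGB columns 0:15, HSV 15:30, region 30:49, texture 49:97).

import numpy as np
import sklearn.svm
import sklearn.metrics

color_cols = slice(0, 30)                          # RGB + HSV color features only

clf_color = sklearn.svm.SVC()
clf_color.fit(Xn_train[:, color_cols], y_train)
y_test_hat_color = clf_color.predict(Xn_test[:, color_cols])
C = sklearn.metrics.confusion_matrix(y_test, y_test_hat_color, labels=['emu', 'flamingo'])
print(C)
print(np.trace(C) / np.sum(C))                     # accuracy using color features alone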

Your turn:

Choose two or more categories from the CalTech101 dataset that you think might be more or less amenable to discrimination using certain feature subsets. Using those categories, explore the discriminative capabilities of different feature subsets. The basic code for using color features only for the 'emu' versus 'flamingo' classification problem is copied into the cell below for ease of editing.

Section 4.5 Other Classifiers

There are many other classifiers available in the sklearn package, see https://scikit-learn.org/stable/user_guide.html for documentation.

Your turn:

Explore the capabilities of other classifiers. If you don't know where to start, some commonly referenced classifiers in the literature are
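Whatever classifier you choose, it follows the same declare / fit / predict pattern as the SVM above. A sketch using two commonly used sklearn classifiers (chosen here for illustration, not necessarily the ones listed in the original notebook):

import sklearn.neighbors
import sklearn.ensemble
import sklearn.metrics

# k-nearest neighbors
knn = sklearn.neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(Xn_train, y_train)
print(sklearn.metrics.accuracy_score(y_test, knn.predict(Xn_test)))

# random forest
rf = sklearn.ensemble.RandomForestClassifier(n_estimators=100)
rf.fit(Xn_train, y_train)
print(sklearn.metrics.accuracy_score(y_test, rf.predict(Xn_test)))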