Tutorial 1: Image Processing Fundamentals

Laura E. Boucheron, Electrical & Computer Engineering, NMSU

October 2020

Copyright (C) 2020 Laura E. Boucheron

This information is free; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This work is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this work in a file COPYING.TXT; if not, see https://www.gnu.org/licenses/.

Overview

In this tutorial, we present a brief overview of image processing concepts necessary to understand machine learning and deep learning. Completion of this tutorial should give participants the background and terminology needed to understand the basics of image processing and the common image manipulations used in machine learning and deep learning.

This tutorial contains 5 sections:

Section 0: Preliminaries
Section 1: Working with Grayscale Images
Section 2: Working with Color Images
Section 3: Transforming Images
Section 4: Filtering Images

There are subsections with the heading Your turn: throughout this tutorial in which you will be asked to apply what you have learned.

Section 0: Preliminaries

Section 0.1 A Note on Jupyter Notebooks

There are two main types of cells in this notebook: code and markdown (text). You can add a new cell with the plus sign in the menu bar above and you can change the type of cell with the dropdown menu in the menu bar above. As you complete this tutorial, you may wish to add additional code cells to try out your own code and markdown cells to add your own comments or notes.

Markdown cells can be augmented with a number of text formatting features, including embedded $\LaTeX$, monotype specification of code syntax, bold font, and italic font. There are many other features of markdown cells--see the Jupyter documentation for more information.

You can edit a cell by double clicking on it. If you double click on this cell, you can see how to implement the various formatting referenced above. Code cells can be run and markdown cells can be formatted using Shift+Enter or by selecting the Run button in the toolbar above.

Once you have completed all or part of this notebook, you can share your results with colleagues by sending them the .ipynb file. Your colleagues can then open the file and will see your markdown and code cells as well as any results that were printed or displayed at the time you saved the notebook. If you prefer to send a notebook without results displayed (like this notebook appeared when you downloaded it), you can select "Restart & Clear Output" from the Kernel menu above. You can also export this notebook in a non-executable form, e.g., .pdf, through the File > Download As or File > Export Notebook As menu.

Section 0.2 Downloading Images

First, we need to download images to work with in this tutorial. Download cameraman.png and peppers.png from the DL workshop website (https://kerriegeil.github.io/NMSU-USDA-ARS-AI-Workshops/) and save them to the same directory as this notebook. Both of these images are common example images used in image processing and are often included as part of the distribution of image processing toolboxes.

Section 0.3a Import Necessary Libraries (For users using a local machine)

First, we import necessary libraries:

It would be best to run this next cell before the workshop starts to make sure you have all the necessary packages installed on your machine.
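A sketch of what such an import cell might look like, based on the functions used later in this tutorial (the exact cell in the workshop materials may differ):

import numpy as np                # array manipulation
import matplotlib.pyplot as plt   # displaying images
import imageio                    # reading images from disk
import skimage.color              # color <-> grayscale conversion
import skimage.transform          # resizing images
from scipy import ndimage         # image filtering (convolution)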

Section 0.3b Build the Conda Environment (For users using the ARS HPC Ceres with JupyterLab)

Open a terminal from inside JupyterLab (File > New > Terminal) and type the following commands

source activate
wget https://kerriegeil.github.io/NMSU-USDA-ARS-AI-Workshops/aiworkshop.yml
conda env create --prefix /project/your_project_name/envs/aiworkshop -f aiworkshop.yml

This will build the environment in one of your project directories. It may take 5 minutes to build the Conda environment.

See https://kerriegeil.github.io/NMSU-USDA-ARS-AI-Workshops/setup/ for more information.

When the environment finishes building, select this environment as your kernel in your Jupyter notebook (click the kernel name, e.g., Python 3, in the top right corner, select your new kernel from the dropdown menu, and click Select).

You will want to do this BEFORE the workshop starts.

Section 1: Working with Grayscale Images

1.1 Reading in the image

We can read in the images using the imageio.imread command. We explicitly cast the image as an np array as this will give us access to some helpful characteristics of the image. We begin with the grayscale cameraman.png image.
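For example, a minimal sketch of such a cell, assuming cameraman.png has been saved in the same directory as this notebook and the imports above have been run:

# read cameraman.png and cast it as a numpy array
I_camera = np.asarray(imageio.imread('cameraman.png'))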

1.2 Displaying the image

Let's display this image. We use the matplotlib imshow command.
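A minimal sketch, assuming I_camera was read in as above:

plt.figure()                       # open a new figure
plt.imshow(I_camera, cmap='gray')  # display with a grayscale colormap
plt.show()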

A note about coordinate conventions

By default, axis labels are included which demarcate pixel counts. You may notice that the origin of an image is interpreted as the upper left corner and not the lower left corner as you might have expected. This is a consequence of the fact that we use standard linear algebra style indexing for images where pixel $(n,m)$ is indexed in row, column order. For those of you who might be particularly concerned, this coordinate system still describes a right-handed system.

This coordinate system can cause issues later on if you accidentally swap indices. You might think you are looking in the upper right but are actually looking in the lower left. You might think you are traversing left to right when you are actually traversing top to bottom.

1.3 Changing display parameters

There are various choices in display that you can make, including the figure size, a descriptive title, whether to display axis labels, and the colormap.
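A sketch of a few of these options (the specific figure size and title here are arbitrary example choices, not the values from the original notebook):

plt.figure(figsize=(6, 6))         # figure size in inches
plt.imshow(I_camera, cmap='gray')  # specify the colormap
plt.title('cameraman.png')         # descriptive title
plt.axis('off')                    # suppress axis labels
plt.show()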

Your turn:

Choose a figure size so that the image fills the width of your notebook and provide a descriptive title to your image. You may also choose to label your axes or not, per your preference. For what it's worth, image processing people don't tend to display axis labels.

1.4 Printing Image Characteristics

We can check on important characteristics of I_camera using the %whos IPython magic command. Note--within some environments, including Jupyter notebooks, you can drop the %, although it's probably best practice to get used to including it.

1.4.1 Using the %whos command
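In a code cell, the magic command by itself lists the variables currently defined in the session, along with their types and sizes:

%whos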

A note on common image variable types

We see that I_camera is an ndarray of size $256\times256$ pixels and of variable type uint8 (unsigned 8-bit integer). Remember that computers store data natively in binary (base-2) format. The uint8 variable type means we have 8 bits (the '8' in uint8) to represent a range of non-negative, i.e., unsigned (the 'u' in uint8) integers (the 'int' in uint8). It is very common that image pixels are represented as uint8 variables, which also indicates that the pixels are within the range $[0,255]$ (since the smallest value representable with 8 bits is 0 and the largest is $2^8-1=255$).

Since there is only one color channel, i.e., I_camera is a 2D array $\in\mathbb{R}^{N\times M}$ rather than a 3D array $\in\mathbb{R}^{N\times M\times C}$ (more on that later), we also know that this is a grayscale image.

1.4.2 Printing the max and min values of an image

We can check for the actual maximum and minimum values of the image.
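A minimal sketch using the ndarray max and min methods:

print('max of I_camera:', I_camera.max())
print('min of I_camera:', I_camera.min())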

A note on image intensity conventions

We note that this I_camera image spans the range $[7,253]$. In grayscale images, it is common interpretation that darker pixels have smaller intensity values and lighter pixels have larger intensity values.

1.4.3 Printing a portion of the image

It is also important to remember that the computer "sees" only an array of values. To reinforce this, we can "look" at what the computer "sees" in a portion of the image.
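For example (the window indices here are just an illustration; the original notebook may print a different portion of the image):

# print a small 10x10 window of raw pixel values
print(I_camera[110:120, 110:120])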

Your turn:

What does this printout tell us about the structure in that part of the image?

There is a "stripe" of light-valued pixels (large intensity values) oriented at approximately 45 degrees through this portion of the image. On either side of that bright stripe, the image is very dark.

1.4.4 Visualizing a portion of an image

We could use plt.imshow to display that small portion of the image.
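A sketch, using the same hypothetical window as above:

plt.figure()
plt.imshow(I_camera[110:120, 110:120], cmap='gray')  # display only the small window
plt.show()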

Your turn:

Does this display of the image verify your interpretation from the printout of the pixel values?

We note that the display of that portion of the image is consistent with our interpretation based solely on the pixel values.

1.4.5 Another visualization of a portion of an image

Here, we maintain the display of the whole image, and plot a yellow box around the area that we've been discussing. This can be a helpful visualization since it maintains the context of the box.
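One way to sketch this overlay (again using the hypothetical window from above; note that plt.plot expects x, i.e., column, coordinates first):

plt.figure()
plt.imshow(I_camera, cmap='gray')
cols = [110, 120, 120, 110, 110]            # box corners, column coordinates
rows = [110, 110, 120, 120, 110]            # box corners, row coordinates
plt.plot(cols, rows, 'y-', linewidth=2)     # yellow box around the window
plt.show()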

Your turn:

What happens if you plot the image using imshow but "forget" to specify the colormap as gray?

A note on colormaps

You should have found that the grayscale image now appears colored. How can that be if the image is a single-channel, i.e., grayscale, image? In this case, matplotlib is applying its default colormap to the intensities. In this default colormap, dark pixels appear dark blue, medium intensity pixels appear green or blue, and light pixels appear yellow. (Your computer may use a different default colormap, in which case the colors noted above may not be correct.)

You can choose any number of colormaps (see https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html for a comprehensive list and examples).

There are also many other options for plt.imshow; see help(plt.imshow) for more details.

Section 2: Working with Color Images

2.1 Reading in and displaying the image

Now, we turn to the color peppers.png image. We use the same command to read in the image and the same basic commands to visualize the image. The only difference here is that we do not specify a colormap; for a 3-channel image, imshow displays the RGB values directly.
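A minimal sketch, assuming peppers.png is in the same directory as this notebook:

# read peppers.png and display it; no colormap is needed for an RGB image
I_pepper = np.asarray(imageio.imread('peppers.png'))
plt.figure()
plt.imshow(I_pepper)
plt.show()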

2.2 Printing image characteristics

We can check on important characteristics of I_pepper.

2.2.1 The %whos command

A note on color channel conventions

We see that I_pepper is an ndarray of size $384\times512\times 3$ pixels and of variable type uint8 (unsigned 8-bit integer). We thus have a 3-channel image where the three channels are assumed to be a red (R), green (G), and blue (B) channel, i.e., an RGB image. By convention, the first channel is assumed to be R, the second G, and the third B.

Again, we note that image pixels are represented as uint8 variables. In this case, however, each pixel is associated with 3 uint8 values, resulting in $2^8\cdot2^8\cdot2^8=2^{24}=16,777,216$ unique colors. Colors which have equal contribution from R, G, and B are grayscale.

2.2.2 Max and min values

We can check for the actual maximum and minimum values of the image or of the R, G, or B channels.
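A sketch of how this might look, indexing the channels in R, G, B order:

# channel 0 is R, channel 1 is G, channel 2 is B
print('R range:', I_pepper[:, :, 0].min(), I_pepper[:, :, 0].max())
print('G range:', I_pepper[:, :, 1].min(), I_pepper[:, :, 1].max())
print('B range:', I_pepper[:, :, 2].min(), I_pepper[:, :, 2].max())
print('overall:', I_pepper.min(), I_pepper.max())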

A note on intensity conventions in color images

We note that this I_pepper image spans the range $[5,255]$ in R, $[1,255]$ in G, and $[0,255]$ in B. We also note that when we didn't specify a color channel, python returned the max and min across the three color channels.

Extending the interpretation of a single channel image in which darker pixels have smaller intensity values and lighter pixels have larger intensity values, a color is defined as the contribution of R, G, and B, where larger intensities in those channels correspond to larger contribution from those colors.

2.2.3 Printing a portion of the image

Since we have three color channels in this color image, we print out each of the color channels separately.
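For example (the window indices here are hypothetical; the original notebook may print a different portion of the image):

# print a small 10x10 window from each color channel separately
print('R channel:'); print(I_pepper[180:190, 180:190, 0])
print('G channel:'); print(I_pepper[180:190, 180:190, 1])
print('B channel:'); print(I_pepper[180:190, 180:190, 2])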

Your turn:

What does this printout tell us about the structure in that part of the image? It can be a bit harder to interpret this sort of printout for a color image since we must keep track of multiple color channels simultaneously. There are other color spaces in which color interpretation is easier (e.g., HSV), but that is outside the scope of this tutorial.

There appear to be two basic regions of different characteristics: one in the upper left triangle of the window and one in the lower right. This is most obvious in the R and G channels, where we see a transition from small values in the upper left transitioning to larger values in the lower right. We also see a smaller effect in the B channel transitioning from larger values in the upper left to smaller values in the lower right.

In the upper left triangle, it appears that the image is a dark grayish purple since the values in all three channels are relatively small (a dark region), with the B (and to a lesser extent R) contributions slightly larger than G, pushing the hue toward a grayish purple.

In the lower right triangle, it appears that the image is greenish since the R and especially G values are larger there while the B values are smaller, and the dominant G contribution reads as green.

Your turn:

Visualize where in the image we are looking by overlaying a box on the image visualization.

We find that our conclusions regarding the appearance of the image in the window are validated: we have a dark purple region in the upper left corner, transitioning to the green of the pepper in the lower right.

Section 3: Transforming Images

We will find that many deep learning methods are very particular about the size of input images. This particularity about size extends across all three dimensions--the two spatial dimensions and the color dimension. As such, it is useful to learn a couple of common methods to rescale images in all three dimensions. Here, we will learn how to convert between RGB and grayscale, how to crop images, and how to resize images.

3.1 Color to Grayscale

We can convert a color image to a grayscale image using a standard command included in Scikit-Image. We can use the skimage.color.rgb2gray function to convert the RGB image I_pepper to a grayscale image. The skimage.color.rgb2gray function applies a weighted averaging of the three color channels to yield a grayscale image. As a note, there is no single accepted weighting to convert between a color and grayscale image, so your results using skimage may differ from results using other libraries or programming languages.
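A minimal sketch, assuming I_pepper was read in above and skimage.color has been imported:

# weighted average of the R, G, B channels -> single-channel float image
I_pepper_gray = skimage.color.rgb2gray(I_pepper)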

Your turn:

What are the dimensions of I_pepper_gray? How many channels does it have? What is the variable type? What are the max and min values?

The image I_pepper_gray is $384\times512$ pixels (the same spatial dimensions as I_pepper) and has one color channel. It is of variable type float64, with values now falling within the range $[0,1]$.

A note about float-valued images

You will probably have noticed that the variable I_pepper_gray is now a float-valued array, and that the range is now within $[0,1]$. This is another common range for images. Some functions, e.g., functions that write out to standard image formats, may expect uint8 variables. You can always cast back to uint8 as needed, e.g., I_pepper_gray_uint8=(I_pepper_gray*255).astype('uint8').

A common issue in image processing is a mismatch between the expected and actual variable type and/or intensity range. If a function is expecting a float in the range $[0,1]$ and gets instead a uint8 in the range $[0,255]$, unexpected things can happen. A non-exhaustive list of some of the issues you might encounter: a displayed or saved image may appear washed out, all white, or all black if a function assumes the wrong intensity range; arithmetic on uint8 values can silently overflow (wrap around); and casting a float image in $[0,1]$ to uint8 without first rescaling will truncate nearly all of the intensity information.

Your turn:

Display this new grayscale image I_pepper_gray.

3.2 Grayscale to Color

We can similarly convert a grayscale image to a color image using a standard command included in Scikit-Image. It is important to note that this conversion is really just creation of an image with a third dimension. Each of the color channels will be identical since we cannot infer color from solely a grayscale image.
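A minimal sketch using skimage.color.gray2rgb, assuming I_camera from Section 1.1:

# stack the grayscale image into three identical color channels
I_camera_rgb = skimage.color.gray2rgb(I_camera)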

Your turn:

What are the dimensions of I_camera_rgb? How many channels does it have? What is the variable type? What are the max and min values?

The image I_camera_rgb is $256\times256$ pixels (the same spatial dimensions as I_camera) and has three color channels. It is of variable type uint8.

Your turn:

We expect that the three color channels in this I_camera_rgb image are identical. Print out a small portion of the image to verify this to yourself.

Your turn:

Display this new RGB image I_camera_rgb.

A note about why we might convert a grayscale image to a "color" image

We note, unsurprisingly, that I_camera_rgb still appears as a grayscale image; it just happens to have 3 identical color channels. We are now using three times the space to represent this image, but the fact that it has 3 color channels instead of 1 will be key when we begin studying deep learning networks.

3.3 Cropping

Suppose that we have a network that expects a $256\times256$ image as input, i.e., the dimensionality of the cameraman.png image. If we want to input peppers.png we have two problems: it has three color channels and it is of spatial dimension $384\times512$. We know that we can convert the RGB image to a grayscale image. Now we have to figure out how to rescale the spatial dimensions.

If we crop the image, we choose some $256\times256$ block of pixels to retain. For example, if we keep the upper left corner of the image, we get an image like the following.
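A minimal sketch of cropping via array slicing, assuming I_pepper_gray from Section 3.1:

# keep the upper left 256x256 pixels of the grayscale peppers image
I_pepper_crop = I_pepper_gray[0:256, 0:256]
plt.figure()
plt.imshow(I_pepper_crop, cmap='gray')
plt.show()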

Cropping removes parts of the image

We note, unsurprisingly, that we have completely removed parts of the pepper image.

3.4 Resizing

What if the peppers.png image had fewer than 256 pixels in one of its dimensions? What if we are unhappy with the loss of information associated with cropping? Here we can use image interpolation from the Scikit-Image transform library. We can use the skimage.transform.resize function to resize the image. In the following syntax, we are asking the function to resize I_pepper_gray to a size of $256\times256$ pixels.
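A minimal sketch, assuming I_pepper_gray from Section 3.1 and skimage.transform imported:

# interpolate the grayscale peppers image to 256x256 pixels
I_pepper_resize = skimage.transform.resize(I_pepper_gray, (256, 256))
plt.figure()
plt.imshow(I_pepper_resize, cmap='gray')
plt.show()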

We note that there are many options to the resize command, including specification of what form of interpolation to use, whether to anti-alias filter, and different means of specifying the scale of the output. See help(skimage.transform.resize) for more information. The syntax used here assumes defaults for all parameters (a good starting point) and provides the expected scale of the output image in an easy to understand tuple that consists of the spatial dimensions in pixels.

Resizing can distort the aspect ratio

Here we note that we have distorted the aspect ratio of the original peppers.png image. In some applications this may not matter and in others it might matter a great deal. In general, depending on the application, you may want to consider a combination of resizing and cropping.

3.5 Combining Cropping and Resizing

Your turn:

Combine cropping and resizing to yield a $256\times256$ pixel grayscale peppers image that you think retains the majority of the original "intent" of the image. Note--there is no "right" answer here...

Your turn:

How would you reconfigure the cameraman image to be the $384\times512\times3$ size of peppers? Would you find this an easier conversion to make or a more difficult one? Note--there is no "right" answer here either...

Section 4: Filtering Images

We will find that a key element of convolutional neural networks (CNNs) is a convolutional layer. It is thus critical that we understand the basics of image convolution and how to interpret those results.

Convolution is the means to filter an image in the spatial domain. This requires the definition of a filter kernel. The filter kernel is a 2D or 3D array of filter coefficients, generally much smaller in spatial extent than the image.

4.1 Low Pass (Smoothing) Filters

Many commonly used image filters are defined in scipy.ndimage. Here, we explore how to explicitly define a filter kernel and convolve that kernel with an image. This will prepare us better to interpret the convolutional layers in CNNs. We will use the ndimage.filters.convolve function here (available simply as ndimage.convolve in newer versions of SciPy).

4.1.1 Define the filter kernels

We define two filters h1 and h2. These are very simple lowpass (smoothing) filters where all the coefficients are equal in value and are normalized such that their sum is 1. It is common practice to use odd-sized filters, because there is an ambiguity in determining the "center" of an even-sized filter.
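A sketch of two such averaging kernels (the kernel sizes here, $3\times3$ and $7\times7$, are example choices; the original notebook may use different sizes):

h1 = np.ones((3, 3)) / 9.0     # 3x3 averaging filter, coefficients sum to 1
h2 = np.ones((7, 7)) / 49.0    # 7x7 averaging filter, larger spatial extent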

4.1.2 Convolving the filter kernels with an image

We compute the filtered output by convolving the image I_camera with each of the filter kernels using ndimage.filters.convolve. We then visualize the filtered images. We cast the image I_camera as a float to avoid integer arithmetic in the convolution operations.
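A sketch, assuming I_camera from Section 1.1 and the kernels h1 and h2 defined above:

# cast to float to avoid integer arithmetic during the convolution
I_camera_float = I_camera.astype(float)
I_camera_h1 = ndimage.convolve(I_camera_float, h1)
I_camera_h2 = ndimage.convolve(I_camera_float, h2)
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1); plt.imshow(I_camera_h1, cmap='gray'); plt.title('filtered with h1')
plt.subplot(1, 2, 2); plt.imshow(I_camera_h2, cmap='gray'); plt.title('filtered with h2')
plt.show()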

Your turn:

What effect has each of the filters h1 and h2 had on the image?

Both filters have blurred the image, with h2 having a more pronounced effect (larger blurring) than h1.

4.2 High Pass (Edge Enhancing) Filters

4.2.1 Define the filter kernels

We define two filters h3 and h4. These are very simple highpass (edge enhancing) filters called the Sobel filters.
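A sketch of the two Sobel kernels; with the sign convention chosen here, h3 responds to horizontally oriented edges and h4 to vertically oriented edges (other sign conventions are also common, so the original notebook's kernels may differ in sign):

h3 = np.array([[ 1,  2,  1],
               [ 0,  0,  0],
               [-1, -2, -1]], dtype=float)   # responds to horizontal edges
h4 = np.array([[ 1,  0, -1],
               [ 2,  0, -2],
               [ 1,  0, -1]], dtype=float)   # responds to vertical edges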

4.2.2 Convolving the filter kernels with an image

We compute the filtered output by convolving the image I_camera with each of the filter kernels. We again cast the image I_camera as a float to avoid integer arithmetic in the convolution operations.
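A sketch, assuming I_camera and the kernels h3 and h4 defined above:

# cast to float so the filtered output can take on negative values
I_camera_float = I_camera.astype(float)
I_camera_h3 = ndimage.convolve(I_camera_float, h3)
I_camera_h4 = ndimage.convolve(I_camera_float, h4)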

A note on filtered images that have negative values

It is common that filtered images may end up with intensity values outside of the original range. In this case, the image I_camera was in the range $[0,255]$. If we look at the range of the filtered images, we find that the filtered images now span a much larger range:
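For example, we can print the ranges of the filtered images (the variable names follow the sketch above):

print('h3 output range:', I_camera_h3.min(), I_camera_h3.max())
print('h4 output range:', I_camera_h4.min(), I_camera_h4.max())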

The Sobel filters are designed to approximate the first derivative of the image. As such, we might expect that the derivative (think slope) will potentially be positive or negative and could span a different absolute range than the original $[0,255]$. We can get a better sense of the edge enhancement capabilities of h3 and h4 if we look only at the positive values. Looking only at the positive values rather than the absolute value will be more consistent with the activation function we will use in convolutional neural networks. We first clip all negative values in the images to zero and then visualize the filtered output.
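A sketch of clipping the negative values and visualizing the result:

# clip negative filter responses to zero (similar in spirit to a ReLU activation)
I_camera_h3_pos = np.clip(I_camera_h3, 0, None)
I_camera_h4_pos = np.clip(I_camera_h4, 0, None)
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1); plt.imshow(I_camera_h3_pos, cmap='gray'); plt.title('h3, positive part')
plt.subplot(1, 2, 2); plt.imshow(I_camera_h4_pos, cmap='gray'); plt.title('h4, positive part')
plt.show()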

When we focus only on the positive values of the filtered output, we see that the majority of the filtered image is now close to a value of 0 (i.e., black), and it is only at the edges of the image objects that we see a response (i.e., lighter values). We see that h3 has enhanced edges oriented in a horizontal direction and h4 has enhanced edges oriented in a vertical direction.