# Tutorial 5: Advanced DL Networks
## Laura E. Boucheron, Electrical & Computer Engineering, NMSU
### May 2021
Copyright (C) 2021  Laura E. Boucheron

This information is free; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This work is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this work; if not, see <https://www.gnu.org/licenses/>.

## Overview
In this tutorial, we study some more deep learning architectures for both image analysis and time series analysis.

This tutorial contains 5 sections:
  - **Section 0: Preliminaries**: some notes on using this notebook, how to download the image dataset that we will use for this tutorial, and import commands for the libraries necessary for this tutorial
  - **Section 1: Activation Maximization**: A method to generate images that maximally activate a selected neuron
  - **Section 2: YOLO-v3 for Object Detection**: A network to detect objects in images 
  - **Section 3: Style Transfer**: A network to change the "style" of an image
  - **Section 4: Generative Adversarial Networks (GANs)**: A network that can generate synthetic images similar to a given dataset
  
There are a few subsections with the heading "**<span style='color:Green'> Your turn: </span>**" throughout this tutorial in which you will be asked to apply what you have learned.  

# Section 0: Preliminaries 
## A Note on Jupyter Notebooks

There are two main types of cells in this notebook: code and markdown (text).  You can add a new cell with the plus sign in the menu bar above and you can change the type of cell with the dropdown menu in the menu bar above.  As you complete this tutorial, you may wish to add additional code cells to try out your own code and markdown cells to add your own comments or notes. 

Markdown cells can be augmented with a number of text formatting features, including
  - bulleted
  - lists

embedded $\LaTeX$, monotype specification of `code syntax`, **bold font**, and *italic font*.  There are many other features of markdown cells--see the jupyter documentation for more information.

You can edit a cell by double clicking on it.  If you double click on this cell, you can see how to implement the various formatting referenced above.  Code cells can be run and markdown cells can be formatted using Shift+Enter or by selecting the Run button in the toolbar above.

Once you have completed all or part of this notebook, you can share your results with colleagues by sending them the `.ipynb` file.  Your colleagues can then open the file and will see your markdown and code cells as well as any results that were printed or displayed at the time you saved the notebook.  If you prefer to send a notebook without results displayed (as this notebook appeared when you downloaded it), you can select "Restart & Clear Output" from the Kernel menu above.  You can also export this notebook in a non-executable form, e.g., `.pdf`, through the File, Save As menu.

## Section 0.1 Downloading Necessary Data

In Section 1, we will need:
 - The `tf-keras-vis` package.  Follow the installation instructions at https://pypi.org/project/tf-keras-vis/.  

In Section 2, we will need:
 - The YOLO-v3 weights available for download from: https://pjreddie.com/media/files/yolov3.weights (273 MB).  Save this data file to your working directory.
 - The `zebra.jpg` image available from https://machinelearningmastery.com/wp-content/uploads/2019/03/zebra.jpg.  Save this file to your working directory.
 
In Section 3, we will need:
 - The image of Van Gogh's Starry Night painting, available at https://i.imgur.com/9ooB60I.jpg.  Save this file to filename `starry_night.jpg` in your working directory.
 
In Section 4, we will need:
 - Nothing beyond what we have already encountered in previous tutorials.
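
Before starting, it can help to confirm that the downloaded files are where the later sections expect them. The following is a minimal sketch, assuming the files were saved to the current working directory under the names listed above; the helper `missing_files` is a hypothetical name introduced here for illustration and is not part of any library.

```python
from pathlib import Path

# File names taken from the download list above.
REQUIRED_FILES = ["yolov3.weights", "zebra.jpg", "starry_night.jpg"]

def missing_files(directory=".", required=REQUIRED_FILES):
    """Return the required files that are not present in `directory`."""
    directory = Path(directory)
    return [name for name in required if not (directory / name).is_file()]

missing = missing_files()
print("All data files found." if not missing else f"Missing: {missing}")
```

An empty list means every file was found; otherwise the printed names indicate which downloads still need to be completed.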

## Section 0.1a Import Necessary Libraries (For users using a local machine)
Here, at the top of the code, we import all the libraries necessary for this tutorial.  We will introduce the functionality of any new libraries throughout the tutorial, but include all import statements here as standard coding practice.  We include a brief comment after each library here to indicate its main purpose within this tutorial.

It would be best to run this next cell before the workshop starts to make sure you have all the necessary packages installed on your machine.
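
If the import cell fails, one way to see which packages are missing is to look them up without importing them. This is a sketch only: the list `PACKAGES` is assumed from the top-level names used in the import cell, and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Top-level package names assumed from the import cell of this tutorial.
PACKAGES = ["numpy", "matplotlib", "tensorflow", "tf_keras_vis", "keras", "imageio"]

def find_missing(packages):
    """Return the packages that cannot be located (nothing is imported)."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

missing = find_missing(PACKAGES)
print("Missing packages:", missing if missing else "none")
```

A package that is found may still fail to import (for example, because of a version mismatch), so running the import cell itself remains the definitive check.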

In [None]:
# Basic imports
import numpy as np # mathematical and scientific functions
import matplotlib.pyplot as plt # visualization
from matplotlib.patches import Rectangle # plot rectangles
# format matplotlib options
%matplotlib inline
plt.rcParams.update({'font.size': 16})

# Imports for Section 1
import tensorflow as tf
import tf_keras_vis.activation_maximization
import tf_keras_vis.utils.callbacks
from matplotlib import cm
from tf_keras_vis.gradcam import Gradcam
from tf_keras_vis.utils.scores import CategoricalScore
from tensorflow.keras.applications.vgg16 import preprocess_input

# Imports for Section 2 (some imports from Section 1 may also be required if just running Section 2)
import struct # unpack binary header fields when reading the YOLO-v3 weights file
# import necessary keras layers; see help files for more information on each layer type
from keras.layers import Conv2D # convolutional layer
from keras.layers import Input # input layer
from keras.layers import BatchNormalization # batchnorm layer to standardize batches and help optimization
from keras.layers import LeakyReLU # activation similar to ReLU
from keras.layers import ZeroPadding2D # zero pads 2D inputs (images)
from keras.layers import UpSampling2D # upsamples 2D inputs by replicating rows and columns of data
from keras.layers import Dense # fully connected layer
# import layer manipulation functions
from keras.layers import add, concatenate # functions to add or concatenate tensors
# import models and model related functions; 
from keras.models import Model # a generic keras model class used to modify architectures
from keras.models import Sequential # the basic deep learning model
from keras.models import load_model # to load a pre-saved model (may require hdf libraries installed)
# import functions to input and preprocess images
from keras.preprocessing.image import load_img # keras method to read in images 
from keras.preprocessing.image import img_to_array # keras method to convert images to numpy array

# Imports for Section 3 (some imports from Sections 1 and 2 may also be required if running just Section 3)
from tensorflow import keras
import imageio
from tensorflow.keras.applications import vgg16

# Imports for Section 4 (some imports from Sections 1-3 may also be required if running just Section 4)
from keras.datasets.mnist import load_data
from keras.optimizers import Adam
from keras.layers import Reshape
from keras.layers import Flatten
from keras.layers import Conv2DTranspose
from keras.layers import Dropout

## Section 0.1b Build the Conda Environment (For users using the ARS HPC Ceres with JupyterLab)
Please follow instructions at https://kerriegeil.github.io/NMSU-USDA-ARS-AI-Workshops/setup/#on-the-ceres-hpc

You will want to do this BEFORE the workshop starts.

# Section 1 Activation Maximization
In this part, we will explore visualizations of images that maximally activate a given neuron.  If we use gradient ascent to generate an image that maximally activates a dense layer neuron, we will generate an image that is most class-like according to the network, e.g., the most flamingo-like image.  If we generate an image that maximally activates a convolutional layer neuron, we will generate an image that is most feature-like according to the network, allowing us to explore what features each of the convolutional layer filters are cueing on.  
 - We will use the `tf-keras-vis` package.  Follow the installation instructions at https://pypi.org/project/tf-keras-vis/.  

The examples in this section were adapted from the `tf-keras-vis` documentation at https://pypi.org/project/tf-keras-vis/.  The `tf-keras-vis` package is released under the MIT license:

Copyright 2019 k.keisen@gmail.com

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

## Section 1.1 Maximal Activation for a Dense Layer Neuron
In this part, we visualize the maximal activation for a dense layer neuron.  Since the neurons of the last dense layer correspond to the 1000 ImageNet classes, this will visualize a class as “understood” by the VGG16 network.
 - We need to load the VGG16 network as a `tensorflow` `Model`.  This requires slightly different syntax than that used in our exploration of the VGG16 network in Tutorial 4:
   ```
   model1=tf.keras.applications.vgg16.VGG16(include_top=True,weights='imagenet')
   ```
 - Define a model modifier that changes the activation of the layer to be visualized to be linear:
   ```
   def model_modifier(m):
       m.layers[-1].activation=tf.keras.activations.linear
   ```
 - Define an activation maximization instance. 
   ```
   model1_am=tf_keras_vis.activation_maximization.ActivationMaximization(model1,model_modifier,clone=True)
   ```
   Note: with `clone=False` you use fewer machine resources, but `model1` itself is modified and must be reloaded before the next step.  With `clone=True`, the visualization works on a copy of the model, which leaves `model1` unchanged in memory at the cost of more machine resources.  
 - Define a loss function associated with an arbitrary category number.  Here, we use category number 130 which corresponds to the class `'flamingo'`.
   ```
   def loss(output):
       return output[:,130]
   ```
 - Now we can compute the maximum activation image corresponding to category 130:
   ```
   max_act=model1_am(loss,callbacks=[tf_keras_vis.utils.callbacks.Print(interval=50)])
   ```
   The `callbacks` option prints out various metrics every 50 iterations of the gradient ascent. This option is not necessary but can provide some feedback regarding whether the code is running successfully.  The variable `max_act` will be a classes$\times$rows$\times$columns$\times$channels `ndarray`.  Here, classes=1.
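
To make the idea of gradient ascent on the input concrete, here is a toy sketch in plain NumPy. It is not how `tf-keras-vis` is implemented: the "neuron" is a hypothetical linear unit with weights `w`, and the L2 penalty on the input (strength `lam`) is an assumption added so that the score has a finite maximum.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)          # weights of a hypothetical linear "neuron"
lam = 1.0                        # strength of an assumed L2 penalty on the input
step = 0.1                       # gradient-ascent step size

def score(x):
    """Activation of the toy neuron minus the L2 penalty."""
    return w @ x - 0.5 * lam * (x @ x)

def score_grad(x):
    """Analytic gradient of `score` with respect to the input x."""
    return w - lam * x

x = rng.normal(size=16) * 0.01   # start from a small random "image"
initial = score(x)
for _ in range(200):
    x = x + step * score_grad(x) # ascend: move the input uphill

print(f"score: {initial:.3f} -> {score(x):.3f}")
```

For this toy score the maximizing input is `w / lam`, so the loop ends up with an input that looks like the neuron's own weights. In a deep network the same update is applied to the pixels of the image, with the gradient obtained by backpropagation instead of a closed-form expression.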

The code below will visualize the maximal activation `max_act` for the flamingo class.  Does this image look like a flamingo?  What different aspects of a flamingo can you see in the image?  

In [None]:
def model_modifier(m):
    m.layers[-1].activation = tf.keras.activations.linear

def loss(output):
    return output[:,130]
    
model1 = tf.keras.applications.vgg16.VGG16(include_top=True,weights='imagenet')
model1_am = tf_keras_vis.activation_maximization.ActivationMaximization(model1,
                                                 model_modifier,
                                                 clone=False)
max_act = model1_am(loss,steps=512)

plt.figure(figsize=(5,5))
plt.imshow(max_act.squeeze().astype(np.uint8))
plt.axis('off')
plt.title('flamingo')
plt.show()

## **<span style='color:Green'> Your turn: </span>**
Choose another class from ImageNet (see https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a for a handy, comprehensive list of ImageNet classes, which is otherwise not easy to find) and visualize the maximal activation.

In [None]:
def model_modifier(m):
    m.layers[-1].activation = tf.keras.activations.linear

def loss(output):
    return output[:,550]

max_act = model1_am(loss,steps=512)

plt.figure(figsize=(5,5))
plt.imshow(max_act.squeeze().astype(np.uint8))
plt.axis('off')
plt.title('espresso maker')
plt.show()

## Section 1.2 Maximal Activation for a Convolutional Neuron
In this part, we visualize the maximal activation for a convolutional layer neuron.  Since the convolutional layers are understood to be feature extractors, this will visualize an image that embodies the feature extracted by that neuron in the VGG16 network.
 - If needed (e.g., you used `clone=False`), load the VGG16 network into variable `model1` using the same syntax as in Section 1.1.
 - Define a model modifier that outputs the activation at the desired layer and changes the activation of that layer to be linear:
   ```
   def model_modifier(current_model):
       # using layer name
       target_layer=current_model.get_layer(name=layer_name)
       # using layer number
       #target_layer=current_model.get_layer(index=layer_idx)
       new_model=tf.keras.Model(inputs=current_model.inputs,\
                                outputs=target_layer.output)
       new_model.layers[-1].activation=tf.keras.activations.linear
       return new_model
   ```
   Note that there are two options here to specify the target layer, by name or by index; be sure to comment out the undesired option.  Note also that this function assumes that you have defined a variable named `layer_name` or `layer_idx` outside of the scope of the function.  For this part, use the last convolutional layer (`layer_name='block5_conv3'`, `layer_idx=17`).
 - Define an activation maximization instance using the same syntax as in Section 1.1.
 - Define a loss function associated with an arbitrary filter number `filter_num`:
   ```
   def loss(output):
       return output[...,filter_num]
   ```
   Note that the variable `filter_num` is assumed to be defined outside of the function.  For this part, use filter number 42.
 - Now we can compute the maximum activation image corresponding to filter number 42 using the same syntax as in Section 1.1.
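
The `...` in `output[...,filter_num]` stands for "all leading axes", so the loss picks out one filter while keeping the batch and spatial axes. A small NumPy sketch, using a made-up activation tensor whose shape (1 image, 14 by 14 grid, 512 filters) is assumed here for illustration:

```python
import numpy as np

# Hypothetical activation tensor: 1 image, 14x14 spatial grid, 512 filters.
output = np.arange(1 * 14 * 14 * 512, dtype=np.float32).reshape(1, 14, 14, 512)
filter_num = 42

selected = output[..., filter_num]   # same as output[:, :, :, filter_num]
print(selected.shape)                # (1, 14, 14)
```

The result holds one activation value per image and spatial position for the chosen filter.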
      
The following code will visualize the maximal activation `max_act` for filter number 42 of the last convolutional layer.  What feature(s) does this filter seem to cue on?  

In [None]:
def model_modifier(current_model):
    # using layer name
    target_layer=current_model.get_layer(name=layer_name)
    # using layer number
    #target_layer=current_model.get_layer(index=layer_idx)
    new_model=tf.keras.Model(inputs=current_model.inputs,outputs=target_layer.output)
    new_model.layers[-1].activation=tf.keras.activations.linear
    return new_model

def loss(output):
    return output[...,filter_num]

model1 = tf.keras.applications.vgg16.VGG16(include_top=True,weights='imagenet')

layer_name = 'block5_conv3'
model1_am = tf_keras_vis.activation_maximization.ActivationMaximization(model1,
                                                 model_modifier,
                                                 clone=False)

filter_num = 42
max_act = model1_am(loss,steps=512)

plt.figure(figsize=(5,5))
plt.imshow(max_act.squeeze().astype(np.uint8))
plt.axis('off')
plt.title('filter'+str(filter_num))
plt.show()

## **<span style='color:Green'> Your turn: </span>**
Choose another filter from the same layer and visualize the maximal activation.

In [None]:
def model_modifier(current_model):
    # using layer name
    target_layer=current_model.get_layer(name=layer_name)
    # using layer number
    #target_layer=current_model.get_layer(index=layer_idx)
    new_model=tf.keras.Model(inputs=current_model.inputs,outputs=target_layer.output)
    new_model.layers[-1].activation=tf.keras.activations.linear
    return new_model

def loss(output):
    return output[...,filter_num]

model1 = tf.keras.applications.vgg16.VGG16(include_top=True,weights='imagenet')

layer_name = 'block5_conv3'
model1_am = tf_keras_vis.activation_maximization.ActivationMaximization(model1,
                                                 model_modifier,
                                                 clone=False)

filter_num = 128
max_act = model1_am(loss,steps=512)

plt.figure(figsize=(5,5))
plt.imshow(max_act.squeeze().astype(np.uint8))
plt.axis('off')
plt.title('filter'+str(filter_num))
plt.show()

## Section 1.3 GradCAM (Gradient Class Activation Map)
We can use the GradCAM method to give further insight into the portions of the image that are most important for the classification.  Since we have the CalTech101 dataset downloaded from a previous tutorial, we can use those images as input to this visualization.  We will have to choose those categories in ImageNet (https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a) that also exist in CalTech101.
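
The core GradCAM computation can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the `tf-keras-vis` implementation: the feature maps and gradients below are synthetic random arrays, `grad_cam` is a hypothetical helper, and a library would typically also resize the heatmap to the input image size.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Combine feature maps (H, W, K) and class-score gradients (H, W, K) into a heatmap."""
    weights = gradients.mean(axis=(0, 1))                        # one importance weight per filter
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)  # weighted sum over filters, then ReLU
    peak = cam.max()
    return cam / peak if peak > 0 else cam                       # scale to [0, 1] for display

rng = np.random.default_rng(0)
A = rng.random((14, 14, 8))          # synthetic feature maps
dYdA = rng.normal(size=(14, 14, 8))  # synthetic gradients of the class score
heatmap = grad_cam(A, dYdA)
print(heatmap.shape, heatmap.min(), heatmap.max())
```

Filters whose activations raise the class score on average receive large positive weights, so regions where those filters respond strongly show up as hot spots in the heatmap.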

In [None]:
def model_modifier(cloned_model):
    cloned_model.layers[-1].activation = tf.keras.activations.linear
    return cloned_model

score = CategoricalScore(130) # 130 is flamingo
category = 'flamingo'

# Load images and Convert them to a Numpy array
img1 = np.asarray(load_img('101_ObjectCategories/'+category+'/image_0001.jpg', target_size=(224, 224)))

# Preparing input data for VGG16
X = preprocess_input(img1)

# Show original image
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.imshow(img1)
plt.axis('off')
plt.title(category)

model1 = tf.keras.applications.vgg16.VGG16(include_top=True,weights='imagenet')

# Create Gradcam object
gradcam = Gradcam(model1,model_modifier=model_modifier,clone=False)

# Generate heatmap with GradCAM
cam = gradcam(score, X, penultimate_layer=-1)
heatmap = np.uint8(cm.jet(cam)[..., :3] * 255)

# Show results
plt.subplot(1,2,2)
plt.imshow(img1)
plt.imshow(heatmap.squeeze(),cmap='jet',alpha=0.5) # overlay heatmap
plt.axis('off')
plt.title(category)
plt.show()

## **<span style='color:Green'> Your turn: </span>**
Choose another class to visualize the GradCAM results.

In [None]:
def model_modifier(cloned_model):
    cloned_model.layers[-1].activation = tf.keras.activations.linear
    return cloned_model

score = CategoricalScore(949) # 949 is strawberry
category = 'strawberry'

# Load images and Convert them to a Numpy array
img1 = np.asarray(load_img('101_ObjectCategories/'+category+'/image_0001.jpg', target_size=(224, 224)))

# Preparing input data for VGG16
X = preprocess_input(img1)

# Show original image
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.imshow(img1)
plt.axis('off')
plt.title(category)

model1 = tf.keras.applications.vgg16.VGG16(include_top=True,weights='imagenet')

# Create Gradcam object
gradcam = Gradcam(model1,model_modifier=model_modifier,clone=False)

# Generate heatmap with GradCAM
cam = gradcam(score, X, penultimate_layer=-1)
heatmap = np.uint8(cm.jet(cam)[..., :3] * 255)

# Show results
plt.subplot(1,2,2)
plt.imshow(img1)
plt.imshow(heatmap.squeeze(),cmap='jet',alpha=0.5) # overlay heatmap
plt.axis('off')
plt.title(category)
plt.show()

# Section 2 YOLO-v3 for Object Detection

In this section, we will study the use of the YOLO-v3 (You Only Look Once version 3) network for object detection in images.  The papers describing the YOLO architectures can be found at:
 - YOLO-v1: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf
 - YOLO-v2: http://openaccess.thecvf.com/content_cvpr_2017/papers/Redmon_YOLO9000_Better_Faster_CVPR_2017_paper.pdf
 - YOLO-v3: https://arxiv.org/pdf/1804.02767.pdf

The code in this section was taken and modified (slightly) from https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/ and https://github.com/experiencor/keras-yolo3

The YOLO-v3 weights can be downloaded from: https://pjreddie.com/media/files/yolov3.weights

License information for the keras-yolo3 code at https://github.com/experiencor/keras-yolo3

MIT License

Copyright (c) 2017 Ngoc Anh Huynh

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

### Define YOLO-v3 architecture and functions to load YOLO-v3 weights
These function definitions contain code to define the YOLO-v3 architecture and also to load weights for the YOLO-v3 architecture from the file linked above (https://pjreddie.com/media/files/yolov3.weights).
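
The `WeightReader` class defined in the next cell reads a small binary header before the weights themselves. The sketch below mimics that header logic on a tiny synthetic buffer instead of the real 273 MB file. The buffer contents are made up, and the explicit little-endian format (`<`) is an assumption; the class itself uses the machine's native byte order.

```python
import io
import struct

# Synthetic "weights file": three int32 version fields, an 8-byte header
# field that is skipped, then three float32 values. All contents are made up.
blob = struct.pack("<3i", 0, 2, 0) + struct.pack("<q", 0) + struct.pack("<3f", 0.5, -1.0, 2.0)

with io.BytesIO(blob) as w_f:
    major, minor, revision = struct.unpack("<3i", w_f.read(12))
    # Same rule as WeightReader: skip 8 bytes for newer versions, 4 for older ones.
    w_f.read(8 if (major * 10 + minor) >= 2 and major < 1000 and minor < 1000 else 4)
    payload = w_f.read()

# Everything after the header is interpreted as 32-bit floats.
values = struct.unpack(f"<{len(payload) // 4}f", payload)
print(major, minor, revision, values)
```

In `WeightReader`, the float payload is kept in one long array and `read_bytes` hands out consecutive slices of it, one per layer parameter.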

In [None]:
# the following code adapted from https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/
# which based the code on https://github.com/experiencor/keras-yolo3 (see MIT License statement above)

def _conv_block(inp, convs, skip=True):
    x = inp
    count = 0
    for conv in convs:
        if count == (len(convs) - 2) and skip:
            skip_connection = x
        count += 1
        if conv['stride'] > 1: x = ZeroPadding2D(((1,0),(1,0)))(x) # peculiar padding, as darknet prefers to pad on the left and top
        x = Conv2D(conv['filter'],\
                   conv['kernel'],\
                   strides=conv['stride'],\
                   padding='valid' if conv['stride'] > 1 else 'same',\
                   name='conv_' + str(conv['layer_idx']),\
                   use_bias=False if conv['bnorm'] else True)(x)
        if conv['bnorm']: x = BatchNormalization(epsilon=0.001, name='bnorm_' + str(conv['layer_idx']))(x)
        if conv['leaky']: x = LeakyReLU(alpha=0.1, name='leaky_' + str(conv['layer_idx']))(x)
    return add([skip_connection, x]) if skip else x
 
def make_yolov3_model():
    input_image = Input(shape=(None, None, 3))
    # Layer  0 => 4
    x = _conv_block(input_image, [{'filter': 32, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 0},\
                                  {'filter': 64, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 1},\
                                  {'filter': 32, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 2},\
                                  {'filter': 64, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 3}])
    # Layer  5 => 8
    x = _conv_block(x, [{'filter': 128, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 5},\
                        {'filter':  64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 6},\
                        {'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 7}])
    # Layer  9 => 11
    x = _conv_block(x, [{'filter':  64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 9},\
                        {'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 10}])
    # Layer 12 => 15
    x = _conv_block(x, [{'filter': 256, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 12},\
                        {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 13},\
                        {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 14}])
    # Layer 16 => 36
    for i in range(7):
        x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 16+i*3},\
                            {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 17+i*3}])
    skip_36 = x
    # Layer 37 => 40
    x = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 37},\
                        {'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 38},\
                        {'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 39}])
    # Layer 41 => 61
    for i in range(7):
        x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 41+i*3},\
                            {'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 42+i*3}])
    skip_61 = x
    # Layer 62 => 65
    x = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 62},\
                        {'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 63},\
                        {'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 64}])
    # Layer 66 => 74
    for i in range(3):
        x = _conv_block(x, [{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 66+i*3},\
                            {'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 67+i*3}])
    # Layer 75 => 79
    x = _conv_block(x, [{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 75},\
                        {'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 76},\
                        {'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 77},\
                        {'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 78},\
                        {'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 79}], skip=False)
    # Layer 80 => 82
    yolo_82 = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 80},\
                              {'filter':  255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 81}], skip=False)
    # Layer 83 => 86
    x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 84}], skip=False)
    x = UpSampling2D(2)(x)
    x = concatenate([x, skip_61])
    # Layer 87 => 91
    x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 87},\
                        {'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 88},\
                        {'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 89},\
                        {'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 90},\
                        {'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 91}], skip=False)
    # Layer 92 => 94
    yolo_94 = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 92},\
                              {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 93}], skip=False)
    # Layer 95 => 98
    x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True,   'layer_idx': 96}], skip=False)
    x = UpSampling2D(2)(x)
    x = concatenate([x, skip_36])
    # Layer 99 => 106
    yolo_106 = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 99},\
                               {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 100},\
                               {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 101},\
                               {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 102},\
                               {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 103},\
                               {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 104},\
                               {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 105}], skip=False)
    model = Model(input_image, [yolo_82, yolo_94, yolo_106])
    return model
 
class WeightReader:
    def __init__(self, weight_file):
        with open(weight_file, 'rb') as w_f:
            major,= struct.unpack('i', w_f.read(4))
            minor,= struct.unpack('i', w_f.read(4))
            revision, = struct.unpack('i', w_f.read(4))
            if (major*10 + minor) >= 2 and major < 1000 and minor < 1000:
                w_f.read(8)
            else:
                w_f.read(4)
            transpose = (major > 1000) or (minor > 1000)
            binary = w_f.read()
        self.offset = 0
        self.all_weights = np.frombuffer(binary, dtype='float32')
 
    def read_bytes(self, size):
        self.offset = self.offset + size
        return self.all_weights[self.offset-size:self.offset]
 
    def load_weights(self, model):
        for i in range(106):
            try:
                conv_layer = model.get_layer('conv_' + str(i))
                print("loading weights of convolution #" + str(i))
                if i not in [81, 93, 105]:
                    norm_layer = model.get_layer('bnorm_' + str(i))
                    size = np.prod(norm_layer.get_weights()[0].shape)
                    beta  = self.read_bytes(size) # bias
                    gamma = self.read_bytes(size) # scale
                    mean  = self.read_bytes(size) # mean
                    var   = self.read_bytes(size) # variance
                    weights = norm_layer.set_weights([gamma, beta, mean, var])
                if len(conv_layer.get_weights()) > 1:
                    bias   = self.read_bytes(np.prod(conv_layer.get_weights()[1].shape))
                    kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
                    kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
                    kernel = kernel.transpose([2,3,1,0])
                    conv_layer.set_weights([kernel, bias])
                else:
                    kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
                    kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
                    kernel = kernel.transpose([2,3,1,0])
                    conv_layer.set_weights([kernel])
            except ValueError:
                print("no convolution #" + str(i))
 
    def reset(self):
        self.offset = 0

### Create an instantiation of the YOLO-v3 architecture with weights
The following code instantiates a YOLO-v3 model and loads the weights pre-trained on the MSCOCO dataset (https://cocodataset.org/)
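
Each of the three output layers defined above ends in a convolution with 255 filters. The arithmetic below is a sketch of where that number comes from, assuming the standard YOLO-v3 setup of 3 anchor boxes per grid cell and the 80 object classes of MSCOCO; the 416-pixel input size is likewise only an assumed example, since the model accepts variable input sizes.

```python
num_anchors = 3      # assumed: anchor boxes predicted per grid cell at each scale
num_classes = 80     # assumed: object categories in MSCOCO detection
box_fields = 4 + 1   # box coordinates (x, y, w, h) plus one objectness score

filters = num_anchors * (box_fields + num_classes)
print(filters)       # 255

# Total downsampling at the three outputs (yolo_82, yolo_94, yolo_106),
# counted from the stride-2 convolutions and upsampling steps in the code.
input_size = 416     # assumed example input size
grids = [input_size // stride for stride in (32, 16, 8)]
print(grids)         # [13, 26, 52]
```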

In [None]:
# the following code adapted from https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/
# which based the code on https://github.com/experiencor/keras-yolo3 (see MIT License statement above)

# create a YOLOv3 Keras model and save it to file
# define the model
model_yolo3 = make_yolov3_model()
# load the model weights
weight_reader = WeightReader('yolov3.weights')
# set the model weights into the model
weight_reader.load_weights(model_yolo3)
# compile the model
model_yolo3.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# save the model to file for ease of use later
model_yolo3.save('yolov3.h5')

### What sort of structure does YOLO-v3 have?
We use the `summary` method to display some information about the structure of the YOLO-v3 architecture.  We notice that there are a lot more layers in YOLO-v3 than we saw even in VGG16 and that there are some new kinds of layers that we haven't encountered yet:
 - **Batch Normalization** - Normalizes the activations of a layer using the mean and variance computed over each batch, which smooths out the statistical variations encountered from batch to batch during training.
 - **LeakyReLU** - Instead of setting all negative values to 0, which can cause gradient issues when optimizing, negative values are multiplied by a small slope.  Thus, negative values "leak" through the activation.
 - **ZeroPadding2D** - Pads around the edge(s) of the image with zeros.
 - **Add** - Adds the output tensors from two layers.
 - **Concatenate** - Concatenates two tensors.
 - **UpSampling2D** - Increases the spatial dimensions (height and width) of the activations.
 
Many of these additional layers are necessary for the "skip connections" in the YOLO architecture.  For one illustration of the skip connection and the concatenation afterwards, see Figure 4 in 
L. Varela, L. E. Boucheron, N. Malone, and N. Spurlock, “Streak detection in wide field of view images using Convolutional Neural Networks (CNNs),” In proceedings: The Advanced Maui Optical and Space Surveillance Technologies Conference (AMOS), 2019. available: https://amostech.com/TechnicalPapers/2019/Machine-Learning-for-SSA-Applications/Varela.pdf
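
To make the LeakyReLU description concrete, here is a minimal NumPy sketch that compares it with a standard ReLU.  The helper names `relu` and `leaky_relu` and the slope `alpha=0.1` are illustrative assumptions; they are not taken from the YOLO-v3 code in this tutorial.

```python
import numpy as np

def relu(x):
    # standard ReLU: negative inputs are set to 0
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.1):
    # leaky ReLU: negative inputs are scaled by a small slope alpha
    # (alpha=0.1 is an assumed value chosen for illustration)
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("input:    ", x)
print("ReLU:     ", relu(x))
print("LeakyReLU:", leaky_relu(x))
```

Positive inputs pass through both activations unchanged; only the treatment of negative inputs differs.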

In [None]:
model_yolo3.summary()

### Code to preprocess an image for input to YOLO-v3
The following function defines a method to preprocess an image into the size expected by the YOLO-v3 network.  It also normalizes the intensities of the image.

In [None]:
# the following code adapted from https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/
# which based the code on https://github.com/experiencor/keras-yolo3 (see MIT License statement above)

# load and prepare an image
def load_image_pixels(filename, shape):
    # load the image to get its shape
    image = load_img(filename)
    width, height = image.size
    # load the image with the required size
    image = load_img(filename, target_size=shape)
    # convert to numpy array
    image = img_to_array(image)
    # scale pixel values to [0, 1]
    image = image.astype('float32')
    image /= 255.0
    # add a dimension so that we have one sample
    image = np.expand_dims(image, 0)
    return image, width, height

### Load an image and detect objects
The code below loads an example image `zebra.jpg`, available from https://machinelearningmastery.com/wp-content/uploads/2019/03/zebra.jpg, preprocesses it using the `load_image_pixels` function, and sends it through the YOLO-v3 network.

In [None]:
# the following code adapted from https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/
# which based the code on https://github.com/experiencor/keras-yolo3 (see MIT License statement above)

# load yolov3 model and perform object detection

# load yolov3 model
model_yolo3 = load_model('yolov3.h5')
# define the expected input shape for the model
input_w, input_h = 416, 416
# define our new photo
photo_filename = 'zebra.jpg'
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))
# make prediction
yhat = model_yolo3.predict(image)
# summarize the shape of the list of arrays
print('The output from YOLO-v3 is a list of arrays of the following shapes')
print([a.shape for a in yhat])

plt.figure()
plt.imshow(image.squeeze()) # we actually do need the squeeze here...
plt.show()

When we plot the processed image, we note that the `load_image_pixels` function appears to simply resize the image to the desired dimensions without consideration for the aspect ratio of the image.

We also notice that the prediction from YOLO-v3 is a list of arrays.  We need to somehow interpret those arrays in order to understand what has been predicted.  The following functions provide the means to interpret those arrays in terms of what objects have been detected and where in the image they have been detected.
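
Before decoding, it can help to see where the output shapes come from.  The sketch below assumes the usual YOLO-v3 layout for a 416x416 input: three output grids with downsampling factors of 32, 16, and 8, three anchor boxes per grid cell, and 5 + 80 values per box (x, y, w, h, objectness, and 80 MSCOCO class scores).  The variable names are illustrative and the shapes should be compared against what the `print` statement above reports.

```python
# assumed YOLO-v3 layout for a 416x416 input (illustrative names)
input_size = 416
strides = [32, 16, 8]            # downsampling factor of each output grid
n_anchors = 3                    # anchor boxes per grid cell
n_classes = 80                   # MSCOCO classes
values_per_box = 5 + n_classes   # x, y, w, h, objectness + class scores

expected_shapes = []
total_boxes = 0
for stride in strides:
    grid = input_size // stride
    # one array per grid: (batch, grid_h, grid_w, anchors * values per box)
    expected_shapes.append((1, grid, grid, n_anchors * values_per_box))
    total_boxes += grid * grid * n_anchors

print(expected_shapes)
print("candidate boxes before thresholding:", total_boxes)
```

Under these assumptions, every image yields several thousand candidate boxes, which is why thresholding and non-maximum suppression are needed afterwards.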

### Functions to decode YOLO-v3 output and plot object detection results

In [None]:
# the following code adapted from https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/
# which based the code on https://github.com/experiencor/keras-yolo3 (see MIT License statement above)
 
class BoundBox:
    def __init__(self, xmin, ymin, xmax, ymax, objness = None, classes = None):
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax
        self.objness = objness
        self.classes = classes
        self.label = -1
        self.score = -1
 
    def get_label(self):
        if self.label == -1:
            self.label = np.argmax(self.classes)
 
        return self.label
 
    def get_score(self):
        if self.score == -1:
            self.score = self.classes[self.get_label()]
 
        return self.score
 
def _sigmoid(x):
    return 1. / (1. + np.exp(-x))
 
def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
    grid_h, grid_w = netout.shape[:2]
    nb_box = 3
    netout = netout.reshape((grid_h, grid_w, nb_box, -1))
    nb_class = netout.shape[-1] - 5
    boxes = []
    netout[..., :2]  = _sigmoid(netout[..., :2])
    netout[..., 4:]  = _sigmoid(netout[..., 4:])
    netout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]
    netout[..., 5:] *= netout[..., 5:] > obj_thresh
 
    for i in range(grid_h*grid_w):
        row = i // grid_w
        col = i % grid_w
        for b in range(nb_box):
            # element at index 4 is the objectness score
            objectness = netout[row][col][b][4]
            # skip boxes whose objectness does not exceed the threshold
            if objectness <= obj_thresh: continue
            # first 4 elements are x, y, w, and h
            x, y, w, h = netout[int(row)][int(col)][b][:4]
            x = (col + x) / grid_w # center position, unit: image width
            y = (row + y) / grid_h # center position, unit: image height
            w = anchors[2 * b + 0] * np.exp(w) / net_w # unit: image width
            h = anchors[2 * b + 1] * np.exp(h) / net_h # unit: image height
            # last elements are class probabilities
            classes = netout[int(row)][col][b][5:]
            box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
            boxes.append(box)
    return boxes
 
def correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w):
    new_w, new_h = net_w, net_h
    for i in range(len(boxes)):
        x_offset, x_scale = (net_w - new_w)/2./net_w, float(new_w)/net_w
        y_offset, y_scale = (net_h - new_h)/2./net_h, float(new_h)/net_h
        boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
        boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
        boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
        boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)
        
def _interval_overlap(interval_a, interval_b):
    x1, x2 = interval_a
    x3, x4 = interval_b
    if x3 < x1:
        if x4 < x1:
            return 0
        else:
            return min(x2,x4) - x1
    else:
        if x2 < x3:
            return 0
        else:
            return min(x2,x4) - x3

def bbox_iou(box1, box2):
    intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
    intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
    intersect = intersect_w * intersect_h
    w1, h1 = box1.xmax-box1.xmin, box1.ymax-box1.ymin
    w2, h2 = box2.xmax-box2.xmin, box2.ymax-box2.ymin
    union = w1*h1 + w2*h2 - intersect
    return float(intersect) / union
 
def do_nms(boxes, nms_thresh):
    if len(boxes) > 0:
        nb_class = len(boxes[0].classes)
    else:
        return
    for c in range(nb_class):
        sorted_indices = np.argsort([-box.classes[c] for box in boxes])
        for i in range(len(sorted_indices)):
            index_i = sorted_indices[i]
            if boxes[index_i].classes[c] == 0: continue
            for j in range(i+1, len(sorted_indices)):
                index_j = sorted_indices[j]
                if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
                    boxes[index_j].classes[c] = 0

 
# get all of the results above a threshold
def get_boxes(boxes, labels, thresh):
    v_boxes, v_labels, v_scores = list(), list(), list()
    # enumerate all boxes
    for box in boxes:
        # enumerate all possible labels
        for i in range(len(labels)):
            # check if the threshold for this label is high enough
            if box.classes[i] > thresh:
                v_boxes.append(box)
                v_labels.append(labels[i])
                v_scores.append(box.classes[i]*100)
                # don't break, many labels may trigger for one box
    return v_boxes, v_labels, v_scores
 
# draw all results
def draw_boxes(filename, v_boxes, v_labels, v_scores):
    # load the image
    data = plt.imread(filename)
    # plot the image
    plt.imshow(data)
    # get the context for drawing boxes
    ax = plt.gca()
    # plot each box
    for i in range(len(v_boxes)):
        box = v_boxes[i]
        # get coordinates
        y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
        # calculate width and height of the box
        width, height = x2 - x1, y2 - y1
        # create the shape
        rect = Rectangle((x1, y1), width, height, fill=False, color='white')
        # draw the box
        ax.add_patch(rect)
        # draw text and score in top left corner
        label = "%s (%.3f)" % (v_labels[i], v_scores[i])
        plt.text(x1, y1, label, color='white')
    # show the plot
    plt.show()

### Display detection results for example image
The anchors defined in the code below are associated with the anchor boxes used in the YOLO networks.  These anchor boxes essentially define expected sizes and aspect ratios for objects in images and are used to "anchor" the detected bounding boxes.  The anchor boxes defined here were selected to work well for the MSCOCO dataset.
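
As a small worked example of how an anchor "anchors" a box, the sketch below mirrors the arithmetic in `decode_netout` for a single grid cell.  The raw network outputs `t_x`, `t_y`, `t_w`, `t_h` and the cell position are made-up values chosen for illustration; the anchor pair (116, 90) is the first one listed for the coarsest grid.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# made-up raw outputs for one box in one grid cell (illustrative only)
t_x, t_y, t_w, t_h = 0.0, 0.0, 0.0, 0.0
row, col = 6, 6                 # position of the cell in a 13x13 grid
grid_h, grid_w = 13, 13
net_h, net_w = 416, 416
anchor_w, anchor_h = 116, 90    # anchor size in network-input pixels

# the center is an offset within the cell, expressed as a fraction of the image
x = (col + sigmoid(t_x)) / grid_w
y = (row + sigmoid(t_y)) / grid_h
# width and height rescale the anchor, expressed as a fraction of the image
w = anchor_w * np.exp(t_w) / net_w
h = anchor_h * np.exp(t_h) / net_h

print("center:", x, y)
print("size in network pixels:", w * net_w, h * net_h)
```

With all raw outputs equal to zero, the box sits at the center of its cell and has exactly the anchor's size; nonzero outputs shift the center within the cell and stretch or shrink the anchor.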

There is also a `class_threshold` specified below.  This threshold changes the tolerance for accepting an object detection.  If we make this smaller, objects with less confidence will be included in the prediction.

There is additionally a threshold (the second parameter in the `do_nms` function call) for the non-maximum suppression of the boxes.  If we increase that value, we will find more overlapping (and potentially conflicting) bounding boxes in the prediction.

Finally, there is the list of the 80 object classes from the MSCOCO dataset, which is used to annotate the object detections on the image.
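
To illustrate the effect of the non-maximum suppression threshold, here is a simplified single-class sketch.  The helpers `iou` and `greedy_nms` and the three example boxes are hypothetical; they follow the same idea as `bbox_iou` and `do_nms` above but operate on plain tuples rather than `BoundBox` objects.

```python
def iou(a, b):
    # boxes are (xmin, ymin, xmax, ymax)
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def greedy_nms(boxes, scores, thresh):
    # visit boxes from highest to lowest score and keep a box only if it
    # does not overlap an already-kept box by thresh or more
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < thresh for k in keep):
            keep.append(i)
    return keep

# made-up example: two heavily overlapping boxes and one separate box
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

print("IoU of first two boxes:", iou(boxes[0], boxes[1]))
keep_05 = greedy_nms(boxes, scores, 0.5)
keep_09 = greedy_nms(boxes, scores, 0.9)
print("kept with threshold 0.5:", keep_05)
print("kept with threshold 0.9:", keep_09)
```

The first two boxes overlap with an IoU of roughly 0.68, so the lower-scoring one is suppressed at a threshold of 0.5 but survives at a threshold of 0.9.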

In [None]:
# the following code adapted from https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/
# which based the code on https://github.com/experiencor/keras-yolo3 (see MIT License statement above)

# load yolov3 model and perform object detection

# load yolov3 model
model_yolo3 = load_model('yolov3.h5')
# define the expected input shape for the model
input_w, input_h = 416, 416
# define our new photo
photo_filename = 'zebra.jpg'
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))
# make prediction
yhat = model_yolo3.predict(image)

# define the anchors
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
# define the probability threshold for detected objects
class_threshold = 0.6
boxes = list()
for i in range(len(yhat)):
    # decode the output of the network
    boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)
# correct the sizes of the bounding boxes for the shape of the image
correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)
# suppress non-maximal boxes
do_nms(boxes, 0.5)
# define the labels
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
    "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
    "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
    "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
    "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
    "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
    "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
    "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]
# get the details of the detected objects
v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
# summarize what we found
for i in range(len(v_boxes)):
    print(v_labels[i], v_scores[i])
# draw what we found
draw_boxes(photo_filename, v_boxes, v_labels, v_scores)

We see that the network correctly predicted the presence and location of the three zebras in the image.

## **<span style='color:Green'> Your turn: </span>**
Modify the code above to see how the network behaves on different images or with different parameter choices, especially for `class_threshold` and the `do_nms` threshold.  For your convenience, the code from the cell above has been copied below for you to modify.  The `photo_filename` has been set to the `peppers.png` image from Tutorial 1.  The network will not correctly identify items in the `peppers.png` image since it has not been trained to recognize any of the objects in that image.  It may also fail to identify objects in other images that you show it.  Remember that, because it was trained on the MSCOCO dataset, the only objects this network knows are the 80 classes defined in the `labels` variable.

In [None]:
# the following code adapted from https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/
# which based the code on https://github.com/experiencor/keras-yolo3 (see MIT License statement above)

# load yolov3 model and perform object detection

# load yolov3 model
model_yolo3 = load_model('yolov3.h5')
# define the expected input shape for the model
input_w, input_h = 416, 416
# define our new photo
photo_filename = 'peppers.png'
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))
# make prediction
yhat = model_yolo3.predict(image)

# define the anchors
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
# define the probability threshold for detected objects
class_threshold = 0.1
boxes = list()
for i in range(len(yhat)):
    # decode the output of the network
    boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)
# correct the sizes of the bounding boxes for the shape of the image
correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)
# suppress non-maximal boxes
do_nms(boxes, 0.9)
# define the labels
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
    "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
    "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
    "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
    "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
    "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
    "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
    "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]
# get the details of the detected objects
v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
# summarize what we found
for i in range(len(v_boxes)):
    print(v_labels[i], v_scores[i])
# draw what we found
draw_boxes(photo_filename, v_boxes, v_labels, v_scores)

# Section 3: Style Transfer
This example adapted from https://keras.io/examples/generative/neural_style_transfer/.  

In style transfer, the "style" of a given image (the style reference image as defined in the code below) is learned and can be applied to another image (the base image as defined in the code below).  The content of the base image is retained, but is rendered according to the style of the style reference image.  This network will deal with three losses: a style loss (penalizing outputs that are too far in appearance from the characteristics of the style reference image), a content loss (penalizing outputs that are too far in appearance from the content of the base image), and a spatial continuity loss (penalizing outputs that do not maintain the local coherence of the base image).  

## Specify Images
Here, we specify the style image (Van Gogh's Starry Night painting) and the base image as the peppers image from Tutorial 1.  

In [None]:
base_image_path = 'peppers.png'
base_image = np.asarray(imageio.imread('peppers.png'))
style_reference_image_path = 'starry_night.jpg'
style_image = np.asarray(imageio.imread('starry_night.jpg'))

plt.figure(figsize=(20,20))
plt.subplot(1,2,1)
plt.imshow(base_image)
plt.axis('off')
plt.title('Base image')
plt.subplot(1,2,2)
plt.imshow(style_image)
plt.axis('off')
plt.title('Style image')
plt.show()

## Define Functions
The following function definitions will allow us to preprocess the input image, deprocess the output image, and define and compute the three loss functions.
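
The style loss below is built on Gram matrices.  As a minimal NumPy sketch of why a Gram matrix describes "style" rather than layout, the example computes the Gram matrix of a small random feature map and shows that it does not change when the spatial positions are shuffled.  The helper `gram` and the random feature map are illustrative assumptions and are separate from the TensorFlow `gram_matrix` function defined in the next cell.

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 5, 3))   # (height, width, channels)

def gram(f):
    # one row per channel, one column per spatial position
    F = f.transpose(2, 0, 1).reshape(f.shape[2], -1)
    # channel-by-channel correlations, summed over all positions
    return F @ F.T

G = gram(feat)

# shuffle the spatial positions while keeping each pixel's channel vector intact
flat = feat.reshape(-1, feat.shape[2])
perm = rng.permutation(flat.shape[0])
shuffled = flat[perm].reshape(feat.shape)
G_shuffled = gram(shuffled)

print("Gram matrix shape:", G.shape)
print("unchanged by spatial shuffling:", np.allclose(G, G_shuffled))
```

Because the Gram matrix only records which channels tend to be active together, it discards where in the image those features occur.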

In [None]:
def preprocess_image(image_path):
    # Util function to open, resize and format pictures into appropriate tensors
    img = keras.preprocessing.image.load_img(
        image_path, target_size=(img_nrows, img_ncols)
    )
    img = keras.preprocessing.image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg16.preprocess_input(img)
    return tf.convert_to_tensor(img)

def deprocess_image(x):
    # Util function to convert a tensor into a valid image
    x = x.reshape((img_nrows, img_ncols, 3))
    # Remove zero-center by mean pixel
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # 'BGR'->'RGB'
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype("uint8")
    return x

# The gram matrix of an image tensor (feature-wise outer product)

def gram_matrix(x):
    x = tf.transpose(x, (2, 0, 1))
    features = tf.reshape(x, (tf.shape(x)[0], -1))
    gram = tf.matmul(features, tf.transpose(features))
    return gram

# The "style loss" is designed to maintain
# the style of the reference image in the generated image.
# It is based on the gram matrices (which capture style) of
# feature maps from the style reference image
# and from the generated image

def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_nrows * img_ncols
    return tf.reduce_sum(tf.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))

# An auxiliary loss function
# designed to maintain the "content" of the
# base image in the generated image

def content_loss(base, combination):
    return tf.reduce_sum(tf.square(combination - base))

# The 3rd loss function, total variation loss,
# designed to keep the generated image locally coherent

def total_variation_loss(x):
    a = tf.square(
        x[:, : img_nrows - 1, : img_ncols - 1, :] - x[:, 1:, : img_ncols - 1, :]
    )
    b = tf.square(
        x[:, : img_nrows - 1, : img_ncols - 1, :] - x[:, : img_nrows - 1, 1:, :]
    )
    return tf.reduce_sum(tf.pow(a + b, 1.25))

def compute_loss(combination_image, base_image, style_reference_image):
    input_tensor = tf.concat(
        [base_image, style_reference_image, combination_image], axis=0
    )
    features = feature_extractor(input_tensor)

    # Initialize the loss
    loss = tf.zeros(shape=())

    # Add content loss
    layer_features = features[content_layer_name]
    base_image_features = layer_features[0, :, :, :]
    combination_features = layer_features[2, :, :, :]
    loss = loss + content_weight * content_loss(
        base_image_features, combination_features
    )
    # Add style loss
    for layer_name in style_layer_names:
        layer_features = features[layer_name]
        style_reference_features = layer_features[1, :, :, :]
        combination_features = layer_features[2, :, :, :]
        sl = style_loss(style_reference_features, combination_features)
        loss += (style_weight / len(style_layer_names)) * sl

    # Add total variation loss
    loss += total_variation_weight * total_variation_loss(combination_image)
    return loss

@tf.function
def compute_loss_and_grads(combination_image, base_image, style_reference_image):
    with tf.GradientTape() as tape:
        loss = compute_loss(combination_image, base_image, style_reference_image)
    grads = tape.gradient(loss, combination_image)
    return loss, grads

## Use VGG Model as Base
We will use the VGG16 model pre-trained on ImageNet as the base network here.  There are some choices here about which portions of the network to use to compute the style and content losses, the size of the generated output image, the weights for each of the losses, and how many iterations to run the optimization for.
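
One of those choices is the learning-rate schedule.  The code in this section passes an `ExponentialDecay` schedule with `initial_learning_rate=100.0`, `decay_steps=100`, and `decay_rate=0.96` to the optimizer.  The plain-Python sketch below reproduces the formula of that schedule, assuming the default (non-staircase) behavior; the function name `exponential_decay` is hypothetical.

```python
def exponential_decay(step, initial_lr=100.0, decay_steps=100, decay_rate=0.96):
    # learning rate after `step` optimizer updates (non-staircase form)
    return initial_lr * decay_rate ** (step / decay_steps)

for step in [0, 100, 1000, 4000]:
    print("step %4d: learning rate %.2f" % (step, exponential_decay(step)))
```

With these settings the learning rate shrinks by 4% every 100 iterations, so it falls to roughly a fifth of its initial value over 4000 iterations.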

In [None]:
# Build a VGG16 model loaded with pre-trained ImageNet weights
model1 = vgg16.VGG16(weights="imagenet", include_top=False)

# Get the symbolic outputs of each "key" layer (we gave them unique names).
outputs_dict = dict([(layer.name, layer.output) for layer in model1.layers])

# Set up a model that returns the activation values for every layer in
# VGG16 (as a dict).
feature_extractor = keras.Model(inputs=model1.inputs, outputs=outputs_dict)

# List of layers to use for the style loss.
style_layer_names = [
    "block1_conv1",
    "block2_conv1",
    "block3_conv1",
    "block4_conv1",
    "block5_conv1",
]
# The layer to use for the content loss.
content_layer_name = "block5_conv2"

optimizer = keras.optimizers.SGD(
    keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=100.0, decay_steps=100, decay_rate=0.96
    )
)

# Dimensions of the generated picture.
width, height = keras.preprocessing.image.load_img(base_image_path).size
img_nrows = 400
img_ncols = int(width * img_nrows / height)

base_image = preprocess_image(base_image_path)
style_reference_image = preprocess_image(style_reference_image_path)
combination_image = tf.Variable(preprocess_image(base_image_path))

# Weights of the different loss components
total_variation_weight = 1e-6 # maintains spatial coherence of the base image
style_weight = 1e-6 # maintains the style of the style reference image
content_weight = 2.5e-8 # maintains the content of the base image

iterations = 4000
for i in range(1, iterations + 1):
    loss, grads = compute_loss_and_grads(
        combination_image, base_image, style_reference_image
    )
    optimizer.apply_gradients([(grads, combination_image)])
    if i % 100 == 0:
        print("Iteration %d: loss=%.2f" % (i, loss))
        img = deprocess_image(combination_image.numpy())
        fname = "Iteration_%d.png" % i
        keras.preprocessing.image.save_img(fname, img)

## Look at Output
Note that the code above saves the style transferred image after every 100 iterations.  Let's look at the output image as the network converges.

In [None]:
iteration = 4000
img_out = imageio.imread('Iteration_'+str(iteration)+'.png')
plt.figure(figsize=(10,10))
plt.imshow(img_out)
plt.axis('off')
plt.show()

## So... These Pictures are Cool and All, but Why Would I Care About Style Transfer in Research?
Style transfer is just one example of networks that can operate on an image and output another image.  Another variety of such networks is called Fully Convolutional Neural Networks (FCNNs).  

FCNNs can be used for **image segmentation**, i.e., outputting a mask of pixels corresponding to different objects within an image. See https://arxiv.org/abs/1411.4038 for the seminal paper on FCNNs for image segmentation and https://arxiv.org/abs/1505.04597 for a related architecture called the UNet.

FCNNs have also recently been used for **modality transfer**, e.g., to infer what a fluorescence microscopy image would look like from a transmitted light microscopy image.  See https://www.biorxiv.org/content/10.1101/289504v3.full for an example of modality transfer using a UNet type architecture.

# Section 4: Generative Adversarial Networks (GANs)
Generative adversarial networks (GANs) can be used to generate synthetic data that (at least according to the machine) are indistinguishable from real images.  GANs have two components: a discriminator network that classifies (discriminates) between synthetic and real images and a generator network that generates synthetic images.  These two networks compete with each other.  The generator wants to fool the discriminator and the discriminator wants to avoid being fooled.  The better one network becomes, the better the other must become.

For this example, we will train a GAN that generates synthetic images of digits similar to the MNIST data.  This example was slightly modified from the example at https://machinelearningmastery.com/how-to-develop-a-generative-adversarial-network-for-an-mnist-handwritten-digits-from-scratch-in-keras/.

## The Discriminator Model
Let's first define the discriminator.  Note that the discriminator model looks very similar to the MNIST network we worked with in Tutorials 3 and 4.
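
As a quick check on the architecture defined in the next cell, the sketch below traces the spatial size of a 28x28 input through two stride-2 convolutions with `padding='same'`, for which the output size is the input size divided by the stride, rounded up.  The helper name `same_conv_out` is hypothetical.

```python
import math

def same_conv_out(size, stride):
    # with padding='same', output size = ceil(input size / stride)
    return math.ceil(size / stride)

sizes = [28]
for _ in range(2):               # two Conv2D layers with strides=(2, 2)
    sizes.append(same_conv_out(sizes[-1], 2))

n_filters = 64
flat = sizes[-1] * sizes[-1] * n_filters   # inputs to the final Dense layer

print("spatial sizes:", sizes)
print("flattened features:", flat)
```

The image is therefore reduced to a 7x7 map with 64 channels before being flattened and passed to the single sigmoid output.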

In [None]:
# define the standalone discriminator model
def define_discriminator(in_shape=(28,28,1)):
    model = Sequential()
    model.add(Conv2D(64, (3,3), strides=(2, 2), padding='same', input_shape=in_shape))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.4))
    model.add(Conv2D(64, (3,3), strides=(2, 2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.4))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model
 


## The Generator Model
Let's now define the generator model.  This model is unlike what we have seen before and uses some layers that we have not yet encountered, but it is still a form of CNN.  In this case, the network generates a 28x28 image.  We also define a network that combines the generator with the discriminator, which we will use to update the generator.  In this combined model, the output of the generator is fed into a frozen discriminator.  The generator is then updated on the basis of the discriminator's performance.
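
The training code later in this section labels generated images as real (1) when updating the generator through the combined model.  The minimal NumPy sketch below suggests why: with that labeling, the binary cross-entropy loss is small when the discriminator is fooled and large when it is not.  The helper `binary_crossentropy` and the discriminator outputs are made-up values for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # mean binary cross-entropy, with clipping to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred)
                           + (1 - y_true) * np.log(1 - y_pred))))

# made-up discriminator outputs for three generated images
d_out_fooled = np.array([0.9, 0.8, 0.95])       # discriminator says "real"
d_out_not_fooled = np.array([0.1, 0.2, 0.05])   # discriminator says "fake"

# generated images are labeled as real (1) when updating the generator
y_gan = np.ones(3)

loss_fooled = binary_crossentropy(y_gan, d_out_fooled)
loss_not_fooled = binary_crossentropy(y_gan, d_out_not_fooled)
print("generator loss when the discriminator is fooled:    ", loss_fooled)
print("generator loss when the discriminator is not fooled:", loss_not_fooled)
```

Minimizing this loss pushes the generator toward images that the frozen discriminator scores as real.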

In [None]:
# define the standalone generator model
def define_generator(latent_dim):
    model = Sequential()
    # foundation for 7x7 image
    n_nodes = 128 * 7 * 7
    model.add(Dense(n_nodes, input_dim=latent_dim))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Reshape((7, 7, 128)))
    # upsample to 14x14
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
    # upsample to 28x28
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Conv2D(1, (7,7), activation='sigmoid', padding='same'))
    return model
 
# define the combined generator and discriminator model, for updating the generator
def define_gan(g_model, d_model):
    # make weights in the discriminator not trainable
    d_model.trainable = False
    # connect them
    model = Sequential()
    # add generator
    model.add(g_model)
    # add the discriminator
    model.add(d_model)
    # compile model
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt)
    return model


## Other Functions
Now we define some further functions helpful for manipulating real and synthetic data and to train the GAN.

In [None]:
# load and prepare mnist training images
def load_real_samples():
    # load mnist dataset
    (trainX, _), (_, _) = load_data()
    # expand to 3d, i.e., add a channels dimension
    X = np.expand_dims(trainX, axis=-1)
    # convert from unsigned ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [0,1]
    X = X / 255.0
    return X
 
# select real samples
def generate_real_samples(dataset, n_samples):
    # choose random instances
    ix = np.random.randint(0, dataset.shape[0], n_samples)
    # retrieve selected images
    X = dataset[ix]
    # generate 'real' class labels (1)
    y = np.ones((n_samples, 1))
    return X, y
 
# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    x_input = np.random.randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    x_input = x_input.reshape(n_samples, latent_dim)
    return x_input
 
# use the generator to generate n fake examples, with class labels
def generate_fake_samples(g_model, latent_dim, n_samples):
    # generate points in latent space
    x_input = generate_latent_points(latent_dim, n_samples)
    # predict outputs
    X = g_model.predict(x_input)
    # create 'fake' class labels (0)
    y = np.zeros((n_samples, 1))
    return X, y
 
# create and save a plot of generated images (grayscale)
def save_plot(examples, epoch, n=10):
    # plot images
    for i in range(n * n):
        # define subplot
        plt.subplot(n, n, 1 + i)
        # turn off axis
        plt.axis('off')
        # plot raw pixel data
        plt.imshow(examples[i, :, :, 0], cmap='gray')
    # save plot to file
    filename = 'generated_plot_e%03d.png' % (epoch+1)
    plt.savefig(filename)
    plt.close()

# evaluate the discriminator, plot generated images, save generator model
def summarize_performance(epoch, g_model, d_model, dataset, latent_dim, n_samples=100):
    # prepare real samples
    X_real, y_real = generate_real_samples(dataset, n_samples)
    # evaluate discriminator on real examples
    _, acc_real = d_model.evaluate(X_real, y_real, verbose=0)
    # prepare fake examples
    x_fake, y_fake = generate_fake_samples(g_model, latent_dim, n_samples)
    # evaluate discriminator on fake examples
    _, acc_fake = d_model.evaluate(x_fake, y_fake, verbose=0)
    # summarize discriminator performance
    print('>Accuracy real: %.0f%%, fake: %.0f%%' % (acc_real*100, acc_fake*100))
    # save plot
    save_plot(x_fake, epoch)
    # save the generator model to file
    filename = 'generator_model_%03d.h5' % (epoch + 1)
    g_model.save(filename)

# train the generator and discriminator
def train(g_model, d_model, gan_model, dataset, latent_dim, n_epochs=100, n_batch=256):
    bat_per_epo = int(dataset.shape[0] / n_batch)
    half_batch = int(n_batch / 2)
    # manually enumerate epochs
    for i in range(n_epochs):
        # enumerate batches over the training set
        for j in range(bat_per_epo):
            # get randomly selected 'real' samples
            X_real, y_real = generate_real_samples(dataset, half_batch)
            # generate 'fake' examples
            X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
            # create training set for the discriminator
            X, y = np.vstack((X_real, X_fake)), np.vstack((y_real, y_fake))
            # update discriminator model weights
            d_loss, _ = d_model.train_on_batch(X, y)
            # prepare points in latent space as input for the generator
            X_gan = generate_latent_points(latent_dim, n_batch)
            # create inverted labels for the fake samples
            y_gan = np.ones((n_batch, 1))
            # update the generator via the discriminator's error
            g_loss = gan_model.train_on_batch(X_gan, y_gan)
            # summarize loss on this batch
            print('>%d, %d/%d, d=%.3f, g=%.3f' % (i+1, j+1, bat_per_epo, d_loss, g_loss))
        # evaluate the model performance after the first epoch and every 10 epochs
        if ((i+1) % 10 == 0) or i==0:
            summarize_performance(i, g_model, d_model, dataset, latent_dim)

## Train the Network
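Now we train the GAN using a 100-dimensional latent space and the default settings of `train` (100 epochs with a batch size of 256).  The discriminator and generator losses are printed after every batch.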
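
Before starting a long training run, it can help to check the array bookkeeping of a single iteration of `train`.  The sketch below uses only NumPy and random stand-in arrays in place of the real dataset and the generator output, so it exercises no network at all.  The 28x28x1 image shape and the all-ones labels for real samples are assumptions made for illustration; they should be adjusted to match what `load_real_samples` and `generate_real_samples` actually return.

```python
import numpy as np

def generate_latent_points(latent_dim, n_samples):
    # same logic as the tutorial function: Gaussian points, one row per sample
    x_input = np.random.randn(latent_dim * n_samples)
    return x_input.reshape(n_samples, latent_dim)

latent_dim, n_batch = 100, 256
half_batch = n_batch // 2

# stand-ins for a half batch of real images and a half batch of generated images
# (assumed shape: 28x28 pixels, 1 channel)
X_real = np.random.rand(half_batch, 28, 28, 1)
y_real = np.ones((half_batch, 1))    # assumed label for real samples
X_fake = np.random.rand(half_batch, 28, 28, 1)
y_fake = np.zeros((half_batch, 1))   # label 0 for fake samples

# discriminator batch: real samples stacked on top of fake samples
X, y = np.vstack((X_real, X_fake)), np.vstack((y_real, y_fake))

# generator batch: latent points with inverted labels (1), so the generator
# is updated toward outputs that the discriminator labels as real
X_gan = generate_latent_points(latent_dim, n_batch)
y_gan = np.ones((n_batch, 1))

print(X.shape, y.shape, X_gan.shape, y_gan.shape)
```

The discriminator sees one full batch made of two half batches, while the generator update uses a full batch of latent points.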

In [None]:
# size of the latent space
latent_dim = 100
# create the discriminator
d_model = define_discriminator()
# create the generator
g_model = define_generator(latent_dim)
# create the gan
gan_model = define_gan(g_model, d_model)
# load image data
dataset = load_real_samples()
# train model
train(g_model, d_model, gan_model, dataset, latent_dim)

## Visualize Output
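Note that the `summarize_performance` function above is called from `train` after the first epoch and every 10 epochs thereafter, and each call saves a plot of generated images.  We can use these plots to visualize how the quality of the synthetic data changed as the network trained.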
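
To decide which value of `epoch` to use, it may help to list the epochs for which a plot should exist.  The sketch below mirrors the condition `((i+1) % 10 == 0) or i==0` from `train` and the filename pattern from `save_plot`.  The helper name `saved_plot_epochs` is hypothetical and is not part of the tutorial code.

```python
import os

def saved_plot_epochs(n_epochs=100, every=10):
    # zero-padded epoch strings for which train() calls summarize_performance
    return ['%03d' % (i + 1) for i in range(n_epochs)
            if (i + 1) % every == 0 or i == 0]

epochs = saved_plot_epochs()
print(epochs)

# keep only the epochs whose plot file is present in the working directory
available = [e for e in epochs
             if os.path.exists('generated_plot_e' + e + '.png')]
print(available)
```

Any entry of `available` can be assigned to `epoch` to display that plot.  If training was interrupted, `available` will be shorter than `epochs`.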

In [None]:
epoch = '010'
img = np.asarray(imageio.imread('generated_plot_e'+epoch+'.png'))
plt.figure(figsize=(10,10))
plt.imshow(img,cmap='gray')
plt.axis('off')
plt.title('Epoch '+epoch)
plt.show()