import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
In this very simple example, we will train a deep neural network (DNN) to classify flower species on the well-known Iris dataset. We will then demonstrate how `snapshot_ensemble` may be used to automatically save the neural network at several points during training, in order to generate an ensemble of models at the cost of training a single one.
# Read Iris Dataset from UCI Repository
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
df.columns = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species']
df.head()
There are 150 samples in this dataset, with 4 features corresponding to the length and width of the flower characteristics. The outcome variable is categorical, corresponding to 3 species of Iris flowers, which we will one-hot-encode.
# Prepare features/targets for supervised learning
X, targets = df.iloc[:,:4], df.iloc[:,-1]
N, numFeatures = X.shape
# One-hot-encode targets
targetEnc = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
numOutcomes = len(targetEnc)
Y = np.zeros((N, numOutcomes))
for n, outcome in enumerate(targets):
    Y[n, targetEnc.index(outcome)] = 1
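As a quick sanity check (not part of the original workflow), the same one-hot encoding can be produced in one vectorized line with pandas; the toy `species` Series below stands in for the `targets` column, and `reindex` pins the column order to `targetEnc`:

```python
import numpy as np
import pandas as pd

# Toy labels standing in for the `targets` Series from the dataset
species = pd.Series(['Iris-setosa', 'Iris-virginica', 'Iris-versicolor'])
targetEnc = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']

# Vectorized one-hot encoding; reindex fixes the column order to targetEnc
Y_alt = pd.get_dummies(species).reindex(columns=targetEnc).to_numpy(dtype=float)
print(Y_alt)
```

This should match the loop above row for row when applied to the full `targets` Series.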
We will first train the DNN as usual, using a very small architecture given the data size with standard cross-entropy loss.
import tensorflow as tf
import tensorflow.keras as tfk
def CompileDNNModel(numFeatures=4, numOutcomes=3, architecture=[6, 6]):
    # Build a small feed-forward network: Input -> Dense(relu) layers -> softmax
    x = tfk.Input(shape=(numFeatures,))
    f = x
    for nodes in architecture:
        f = tfk.layers.Dense(nodes, activation='relu')(f)
    f = tfk.layers.Dense(numOutcomes, activation='softmax')(f)
    model = tfk.Model(inputs=x, outputs=f)
    model.compile(
        loss=tfk.losses.CategoricalCrossentropy(from_logits=False),
        optimizer=tfk.optimizers.Adam(),
    )
    return model
# Train the model
architecture = [6,6]
model = CompileDNNModel(numFeatures=numFeatures, numOutcomes=numOutcomes, architecture=architecture)
lossHistory = model.fit(X, Y,
                        batch_size=50,
                        epochs=250,
                        shuffle=True,
                        verbose=0)
# Loss history
plt.plot(lossHistory.history['loss'])
# Make predictions
Y_hat = model.predict(X)
# Evaluate model
from sklearn.metrics import classification_report
print(classification_report(np.argmax(Y, axis=-1), np.argmax(Y_hat, axis=-1)))
We can see that the model performance is very good after 250 epochs, with accuracy at 93% and per-class F1-scores between 0.89 and 1.0.
Now we will demonstrate how to use `snapshot_ensemble` to generate an ensemble of trained DNNs at the cost of a single training run. The DNN will be trained with cosine annealing; here we just use the default hyperparameters, which we can visualize below:
from snapshot_ensemble import *
VisualizeLR(cycle_length=10, cycle_length_multiplier=1.5, lr_multiplier=0.9, lr_init=0.01, lr_min=1e-6)
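To make the schedule concrete, here is a minimal sketch of a cosine-annealed learning rate with warm restarts, using the same hyperparameter names as `VisualizeLR`. This is a hypothetical reimplementation for illustration; the library's actual internals may differ:

```python
import numpy as np

def cosine_annealing_lr(epoch, cycle_length=10, cycle_length_multiplier=1.5,
                        lr_multiplier=0.9, lr_init=0.01, lr_min=1e-6):
    """Sketch of cosine annealing with warm restarts (assumed behavior):
    within each cycle, the LR follows a half cosine from its peak down to
    lr_min; at each restart, the cycle length grows by cycle_length_multiplier
    and the peak LR shrinks by lr_multiplier."""
    lr_peak, length, start = lr_init, cycle_length, 0
    # Walk forward through completed cycles to find the current one
    while epoch >= start + length:
        start += length
        length *= cycle_length_multiplier
        lr_peak *= lr_multiplier
    t = (epoch - start) / length  # position within the current cycle, in [0, 1)
    return lr_min + 0.5 * (lr_peak - lr_min) * (1 + np.cos(np.pi * t))
```

Each restart returns the LR to a (slightly reduced) peak, which is what pushes the network out of its current minimum so that the next snapshot lands in a different one.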
To do this, we simply pass a `SnapshotEnsembleCallback` into the `callbacks` argument when training. If validation data is supplied and we also wish to save the "best" model that minimizes validation loss, we may use the helper function `GenerateSnapshotCallbacks()`, which includes both `SnapshotEnsembleCallback` and `ModelCheckpoint`. By default, the snapshotted models are saved into `Ensemble/` in the current working directory.
# Compile the DNN model
architecture = [6,6]
model = CompileDNNModel(numFeatures=numFeatures, numOutcomes=numOutcomes, architecture=architecture)
# Snapshot Ensemble callbacks
callbacks = [
    # Note: See `help(SnapshotEnsembleCallback)` for documentation on the hyperparameters
    SnapshotEnsembleCallback(cycle_length=10, cycle_length_multiplier=1.5,
                             lr_multiplier=0.9, lr_init=0.01, lr_min=1e-6),
]
# Train the model with cosine annealing + snapshot ensemble
lossHistory = model.fit(X, Y,
                        batch_size=50,
                        epochs=250,
                        callbacks=callbacks,
                        shuffle=True,
                        verbose=0)
# Loss history
plt.plot(lossHistory.history['loss'])
After training, we will load the saved models from `Ensemble/` to be used as part of an ensemble. For simplicity, we will use uniform weights and average each model's predicted probabilities.
# Load in snapshotted models as an ensemble
import glob
models = []
for file in glob.glob('Ensemble/*.h5'):
    mod = CompileDNNModel(numFeatures=numFeatures, numOutcomes=numOutcomes, architecture=architecture)
    mod.load_weights(file)
    models.append(mod)
# Make ensembled predictions
Y_hat_ens = []
for mod in models:
    Y_hat_k = mod.predict(X)
    Y_hat_ens.append(Y_hat_k)
# Ensemble with simple uniform weights
Y_hat_ens = np.mean(Y_hat_ens, axis=0)
# Evaluate model
print(classification_report(np.argmax(Y, axis=-1), np.argmax(Y_hat_ens, axis=-1)))
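Uniform weights are the simplest combination rule. As a variation (not from the original), the snapshots could instead be weighted unequally, e.g. by a hypothetical per-model validation score; a minimal sketch of weighted probability averaging:

```python
import numpy as np

def weighted_ensemble(prob_list, weights):
    """Average per-model class probabilities with normalized weights.

    prob_list: list of (N, K) probability arrays, one per snapshot.
    weights: one non-negative score per model (hypothetical, e.g. 1/val_loss).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the result remains a valid probability
    # Contract the model axis: sum_m w[m] * prob_list[m]
    return np.tensordot(w, np.stack(prob_list), axes=1)
```

With equal weights this reduces exactly to the `np.mean` used above.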
We can see that accuracy has improved to 98%, with F1-scores rising to 0.97-1.0. Note that the hyperparameters of the learning rate schedule can matter a great deal; the default values are likely suboptimal for your task and will require some tuning and an understanding of the loss surface.