When training a neural network, you must make several design choices about the network architecture and the training procedure; these choices are called hyperparameters. They include the number of layers in the network, the number of nodes per layer, how many kernels to use in a convolutional layer, the kernel size, which type of nonlinear activation function to use, the dropout rate, the learning rate, and several others.
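The standard method for selecting good hyperparameters is called hyperparameter tuning. In its simplest form (often called grid search), hyperparameter tuning is straightforward: you loop through a set of candidate hyperparameter combinations, train a model with each combination, evaluate each model on held-out data, and select the hyperparameters that resulted in the highest performance. The held-out data used for this selection should be a validation set kept separate from the final test set; otherwise the reported test performance is optimistically biased, because the hyperparameters were chosen to maximize it.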
Below is an example of pseudocode for hyperparameter tuning:
    for activation_function in ['relu', 'tanh']:
        for dropout_rate in [0.0, 0.25, 0.5, 0.75]:
            train a model with activation_function and dropout_rate
            store the performance of the model on the validation set
    choose the hyperparameters that resulted in the best performance
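The same loop can be written as a small runnable Python sketch. Note that `train_and_evaluate` is a hypothetical stand-in that returns made-up scores so the example runs on its own; in real use it would train a model and return its performance on held-out validation data. The name `search_space` is also just an illustrative choice.

```python
from itertools import product


def train_and_evaluate(activation_function, dropout_rate):
    """Hypothetical placeholder: returns a made-up validation score.

    Replace the body with real training and validation-set evaluation.
    """
    base_score = {"relu": 0.80, "tanh": 0.78}[activation_function]
    return base_score - abs(dropout_rate - 0.25) * 0.1


# Candidate values for each hyperparameter (same grid as the pseudocode).
search_space = {
    "activation_function": ["relu", "tanh"],
    "dropout_rate": [0.0, 0.25, 0.5, 0.75],
}

# Try every combination and record its score.
results = {}
for activation_function, dropout_rate in product(*search_space.values()):
    score = train_and_evaluate(activation_function, dropout_rate)
    results[(activation_function, dropout_rate)] = score

# Pick the combination with the highest recorded score.
best_hyperparameters = max(results, key=results.get)
best_score = results[best_hyperparameters]
print(best_hyperparameters, best_score)
```

Using `itertools.product` instead of nested loops keeps the code the same length no matter how many hyperparameters are added to `search_space`, although the number of models to train still grows multiplicatively.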
Just as a model's learned weights can overfit the training data, the hyperparameters selected by hyperparameter tuning can overfit the particular data split used to evaluate them. To account for this, it is common to use a technique called cross validation.
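Cross validation is a technique for evaluating a machine learning model on several different partitions of the data into training and testing sets. You specify the number of folds when performing cross validation; with k folds, the model is trained and evaluated k times, each time on a different split.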
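Cross validation is one of those concepts that is easier to understand by seeing an example. In 5-fold cross validation, the dataset is divided into 5 equal parts and 5 different versions of a model are trained. Each version uses a different part (one fifth of the data) as the testing set and the remaining four parts as the training set, so every example appears in a testing set exactly once.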
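As a rough illustration, the splits used in 5-fold cross validation can be listed with a few lines of plain Python. The helper `k_fold_splits` is a hypothetical name written for this sketch, not a library function, and the example assumes a tiny dataset of 10 examples identified by their indices 0 through 9.

```python
def k_fold_splits(num_samples, num_folds):
    """Yield (train_indices, test_indices) once per fold.

    Illustrative helper: folds are contiguous blocks of indices, and any
    remainder is spread over the first folds.
    """
    indices = list(range(num_samples))
    fold_sizes = [num_samples // num_folds] * num_folds
    for i in range(num_samples % num_folds):
        fold_sizes[i] += 1

    start = 0
    for size in fold_sizes:
        test_indices = indices[start:start + size]
        train_indices = indices[:start] + indices[start + size:]
        yield train_indices, test_indices
        start += size


splits = list(k_fold_splits(num_samples=10, num_folds=5))
for fold, (train_indices, test_indices) in enumerate(splits, start=1):
    print(f"fold {fold}: test={test_indices} train={train_indices}")
```

In practice the data is usually shuffled before being divided into folds, so that each fold is a representative sample rather than a contiguous block.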
When accuracy (or any other metric) is reported, it is the average accuracy across all 5 folds, which is a better indicator of the model performance than only one split of the train and test set. By performing cross validation, you can give other people confidence that you didn’t just choose the most convenient (highest performing) split of the data into training and testing sets.
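The averaging step can be sketched as follows. Everything here is a toy assumption chosen to keep the example self-contained: the `labels` list is made up, and the "model" simply predicts the most common label seen in its training split. The point is only to show one accuracy per fold being combined into a single average.

```python
from collections import Counter
from statistics import mean

# Toy labels, made up for illustration only.
labels = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
num_folds = 5
fold_size = len(labels) // num_folds  # assumes the data divides evenly

fold_accuracies = []
for fold in range(num_folds):
    start, stop = fold * fold_size, (fold + 1) * fold_size
    test_labels = labels[start:stop]
    train_labels = labels[:start] + labels[stop:]

    # "Training": remember the most common label in the training split.
    majority_label = Counter(train_labels).most_common(1)[0][0]

    # "Testing": fraction of held-out labels that match the prediction.
    correct = sum(1 for label in test_labels if label == majority_label)
    fold_accuracies.append(correct / len(test_labels))

# The reported metric is the average over all folds.
average_accuracy = mean(fold_accuracies)
print(fold_accuracies, average_accuracy)
```

Here the per-fold accuracies range from 0.5 to 1.0, which shows how much a single train/test split can flatter or understate a model compared with the average.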
Cross validation also helps prevent overfitting of hyperparameters. You can, for example, run hyperparameter tuning within each fold and then select the value that wins in the most folds (for a categorical hyperparameter like the activation function), or take the average of the best-performing values across folds (for a numerical hyperparameter like the learning rate).
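Both ways of combining per-fold results can be sketched with Python's standard `statistics` module. The per-fold winners below are hypothetical placeholder values, not real results; in real use they would come from running hyperparameter tuning separately on each fold.

```python
from statistics import mean, mode

# Hypothetical best hyperparameters found in each of 5 folds
# (placeholder values for illustration only).
best_activation_per_fold = ["relu", "relu", "tanh", "relu", "tanh"]
best_learning_rate_per_fold = [0.01, 0.001, 0.01, 0.003, 0.001]

# Categorical hyperparameter: pick the value that wins most often.
chosen_activation = mode(best_activation_per_fold)

# Numerical hyperparameter: average the per-fold winners.
chosen_learning_rate = mean(best_learning_rate_per_fold)

print(chosen_activation, chosen_learning_rate)
```

Because the final choice is based on all folds rather than on one split, a value that only happened to do well on a single partition of the data is less likely to be selected.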