Step 2: Model validation for hyperparameter tuning

Estimating the accuracy of a model is also important because it helps us tune its hyperparameters. For example, if we are using a k-nearest neighbor model, we may want to pick the value of k that optimizes the generalization performance of the algorithm. We list here a few techniques that can be used to pick the best-performing hyperparameters.

Three-way holdout method:

As we have seen in the previous section, one way to estimate the performance of our algorithm is to test it on unseen data. If we want to compare models with various hyperparameters, we can use a similar principle, but split our data into three non-overlapping sets: a training set for model fitting, a validation set for model selection and a test set for model evaluation.

We illustrate here how the three-way holdout technique can be used for model selection (a code sketch follows the steps below):

  • Step 1. We split our data into a training, a validation and a test set. Common train/validation/test splits are 60/20/20, 70/15/15, and 80/10/10.

  • Step 2. We now tune the hyperparameters. We take a few models with different hyperparameters, and fit them to the training data.

  • Step 3. We evaluate the performance of all these models on the validation set. This allows us to select the best performing model.

  • Step 4 (optional). If the training set is too small, after selecting the model we can re-fit it on both the training and validation sets.

  • Step 5. Now, we can run our model on the test set, and estimate the generalization performance.

  • Step 6 (optional). If we no longer need an independent performance estimate from the test set, we can merge it with the rest of the data and retrain the model on the whole dataset.
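As a concrete illustration, here is a minimal sketch of the steps above for tuning k in a k-nearest neighbor classifier. The use of scikit-learn, the Iris dataset, the 70/15/15 split, and the candidate values of k are all illustrative assumptions, not prescribed by this text.

```python
# A minimal sketch of the three-way holdout procedure for tuning k in
# a k-nearest neighbor classifier (illustrative dataset and split sizes).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Step 1: split into 70% training, 15% validation, 15% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=0)

# Steps 2-3: fit one model per hyperparameter value on the training set
# and compare them on the validation set.
best_k, best_acc = None, -np.inf
for k in [1, 3, 5, 7, 9]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

# Step 4 (optional): re-fit the selected model on training + validation data.
X_trval = np.concatenate([X_train, X_val])
y_trval = np.concatenate([y_train, y_val])
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_trval, y_trval)

# Step 5: estimate the generalization performance on the untouched test set.
print(f"best k = {best_k}, test accuracy = {final_model.score(X_test, y_test):.3f}")
```

Note that the validation accuracy of the selected model is optimistically biased, since the validation set was used for selection; this is exactly why the final evaluation in Step 5 happens on the untouched test set.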

Some possible issues with this method:

  • Resampling without replacement violates the assumption of independent, identically distributed samples: once a sample has been assigned to one subset, it cannot appear in another, so the subsets are not independent draws from the distribution.

  • Resampling may not give us an accurate representation of the original distribution, since it changes the statistics of each subset. For example, in a classification task we may end up with too few samples from one class in the training data (see the stratified-split sketch after this list).

  • Having one fixed split may not give us an accurate estimate of the performance of the algorithm.

  • If the training set is not large enough, the model may not have reached its capacity, i.e. its performance would still improve with more data. In this case, withholding part of the data for validation and testing introduces a pessimistic bias into the performance estimate.
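One common way to mitigate the class-representation issue above is stratified sampling, which preserves the class proportions in every subset. A minimal sketch, assuming scikit-learn's train_test_split and a small, imbalanced toy dataset:

```python
# A minimal sketch of a stratified split, which keeps the class proportions
# roughly equal across the subsets (toy data for illustration only).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(-1, 1)      # 20 toy samples, one feature
y = np.array([0] * 16 + [1] * 4)      # imbalanced labels: 80% class 0, 20% class 1

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Both subsets keep the 80/20 class ratio instead of leaving class 1
# underrepresented (or absent) in one of them.
print(np.bincount(y_train), np.bincount(y_val))
```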

Note that this method is very common in deep learning applications, partly because of its low computational cost and partly because deep learning datasets tend to be quite large. When the dataset is large, resampling introduces less variance, since each subset is a more reliable representation of the whole dataset. In most other cases, however, we would recommend using k-fold cross-validation.

K-fold cross-validation

Cross-validation is probably the most widely used technique. The rationale behind it is that every sample in our dataset should be used for validation at some point. With k-fold cross-validation, we split the data into k equal parts, use one part for validation and the remaining k-1 parts for training, and then repeat this process so that each of the k parts serves as the validation set exactly once.
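To make this rotation concrete, here is a minimal sketch of plain 5-fold cross-validation for a single, fixed model; scikit-learn's KFold, the Iris dataset and the k-nearest neighbor classifier are illustrative assumptions.

```python
# A minimal sketch of how 5-fold cross-validation rotates the validation
# part across the dataset (illustrative model and dataset).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, val_idx in kf.split(X):
    # k-1 parts for training, 1 part for validation; each sample is
    # used for validation exactly once across the 5 folds.
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

print(f"cross-validated accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

The steps below embed this procedure in a complete hyperparameter-tuning workflow; a sketch of that workflow follows the list.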

  • Step 1. We split the dataset into a training set and a test set. We put away the test set for now.

  • Step 2. We now tune the hyperparameters. We take various models under different hyperparameter settings and fit each of them to the training data using the k-fold cross-validation procedure, which yields an estimate of the generalization performance for each model.

  • Step 3. We select the hyperparameter settings that lead to the best performance. We then fit the chosen model to the whole training set.

  • Step 4. We test the model on the test set that we set aside in Step 1.

  • Step 5 (optional). If we no longer need an independent performance estimate from the test set, we can now fit the model to all of the data.
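As a concrete illustration of Steps 1-4, the sketch below tunes k for a k-nearest neighbor classifier with 5-fold cross-validation on the training set and then evaluates once on the held-out test set. Using scikit-learn's GridSearchCV here is an assumption of convenience; a hand-written loop over hyperparameter values and folds implements the same procedure.

```python
# A minimal sketch of Steps 1-4: hold out a test set, tune k with 5-fold
# cross-validation on the training set, refit the best model on the whole
# training set, then evaluate once on the test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Step 1: set aside a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y)

# Steps 2-3: 5-fold cross-validation over the candidate hyperparameters;
# refit=True re-trains the best model on the full training set.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
    refit=True)
search.fit(X_train, y_train)

# Step 4: a single, final evaluation on the held-out test set.
print("best hyperparameters:", search.best_params_)
print(f"test accuracy: {search.score(X_test, y_test):.3f}")
```

Because refit=True, GridSearchCV re-trains the selected model on the entire training set after the search, which corresponds to the second half of Step 3.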

For an example of hyperparameter tuning through k-fold cross-validation, you can visit our GitHub page or download the following file:
