Each machine learning model you train has a set of model parameters or coefficients. The goal of the machine learning algorithm, formulated as an optimization problem, is to learn the optimal values of these parameters.
In addition, machine learning models also have a set of hyperparameters, such as the value of K, the number of neighbors, in the K-Nearest Neighbors algorithm, or the batch size when training a deep neural network.
The model does not learn these hyperparameters; they are specified by the developer. They influence the performance of the model and are adjustable. So how do you find the best values for these hyperparameters? This process is called hyperparameter optimization or hyperparameter tuning.
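For instance, here is a minimal sketch (assuming scikit-learn’s KNeighborsClassifier) of how a hyperparameter such as K is set by the developer when creating the model:
# The number of neighbors (K) is a hyperparameter: we choose it ourselves,
# it is not learned during training
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)  # K = 5, chosen by the developer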
The two most common hyperparameter tuning techniques include:
- Grid Search
- Random Search
In this guide, we will learn how these techniques work and their scikit-learn implementation.
Let’s start by training a simple support vector machine (SVM) classifier on the Wine dataset.
First, import the required modules and classes:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
The wine dataset is part of the built-in datasets in scikit-learn. So, let’s read the features and target labels as shown:
# Load the Wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target
The Wine dataset is a simple dataset with 13 numerical features and three output class labels. It is a good candidate dataset for learning how to solve multi-class classification problems. You can run wine.DESCR for a description of the dataset:
wine.DESCR
Next, split the dataset into train and test sets. Here we have used a test_size of 0.2, so 80% of the data goes to the training set and 20% to the test set.
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=24)
Now create an instance of a support vector classifier and fit the model to the training data set. Then evaluate its performance on the test set.
# Create a baseline SVM classifier
baseline_svm = SVC()
baseline_svm.fit(X_train, y_train)
y_pred = baseline_svm.predict(X_test)
Since this is a simple multi-class classification problem, we can use accuracy as the evaluation metric.
# Evaluate the baseline model
accuracy = accuracy_score(y_test, y_pred)
print(f"Baseline SVM Accuracy: {accuracy:.2f}")
We see that the accuracy score of this model with the default values for the hyperparameters is approximately 0.78.
Output >>>
Baseline SVM Accuracy: 0.78
Here we use a random_state of 24. For a different random state, you will get a different train-test split and, subsequently, a different accuracy score.
Therefore, we need a better way than a single train-test split to evaluate model performance. Perhaps we could train the model on many such splits and consider the average accuracy, while also testing different combinations of hyperparameters? Yes, that’s why we use cross-validation in model evaluation and hyperparameter search. We will learn more in the following sections.
Next, let’s identify the hyperparameters we can tune for this support vector machine classifier.
In hyperparameter tuning, our goal is to find the best combination of hyperparameter values for our SVM classifier. Commonly tuned hyperparameters for the support vector classifier include:
- C: Regularization parameter, which controls the trade-off between maximizing the margin and minimizing the classification error.
- kernel: Specifies the type of kernel function to use (for example, ‘linear’, ‘rbf’, ‘poly’).
- gamma: Kernel coefficient for the ‘rbf’ and ‘poly’ kernels.
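As a quick sketch (with arbitrary, untuned values), these hyperparameters are simply arguments passed when creating the classifier:
# SVM hyperparameters are passed as constructor arguments (untuned example values)
from sklearn.svm import SVC

svm_manual = SVC(C=1.0, kernel='rbf', gamma='scale')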
Cross-validation helps evaluate how well the model generalizes to unseen data and reduces the risk of overfitting to a single train-test split. Commonly used k-fold cross-validation involves dividing the dataset into k folds of equal size. The model is trained k times, with each fold serving as the validation set once and the remaining folds as the training set. So for each fold, we get a cross-validation accuracy.
When we run the grid and random searches to find the best hyperparameters, we will choose the hyperparameters based on the best average cross-validation score.
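As a sketch of what that average looks like for our baseline model, we can use scikit-learn’s cross_val_score, which returns one accuracy per fold:
# 5-fold cross-validation of the baseline SVM: one accuracy score per fold
from sklearn.model_selection import cross_val_score

cv_scores = cross_val_score(baseline_svm, X_train, y_train, cv=5)
print(f"Fold accuracies: {cv_scores}")
print(f"Mean CV accuracy: {cv_scores.mean():.2f}")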
Grid search is a hyperparameter tuning technique that performs an exhaustive search over a specified hyperparameter space to find the combination of hyperparameters that yields the best model performance.
How grid search works
We define the hyperparameter search space as a parameter grid. The parameter grid is a dictionary where you specify each hyperparameter you want to tune with a list of values to explore.
Grid search then systematically explores all possible combinations of hyperparameters from the parameter grid. It fits and evaluates the model for each combination using cross-validation and selects the combination that produces the best performance.
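Conceptually, grid search boils down to a loop like the following simplified sketch (shown here over just C and kernel for brevity; GridSearchCV, used below, automates this):
# Conceptual sketch of grid search: try every combination, keep the best mean CV score
from itertools import product
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

best_score, best_combo = -1.0, None
for C, kernel in product([0.1, 1, 10], ['linear', 'rbf', 'poly']):
    scores = cross_val_score(SVC(C=C, kernel=kernel), X_train, y_train, cv=5)
    if scores.mean() > best_score:
        best_score, best_combo = scores.mean(), {'C': C, 'kernel': kernel}
print(best_combo, best_score)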
Next, let’s implement grid search in scikit-learn.
First, import the GridSearchCV class from scikit-learn’s model_selection module:
from sklearn.model_selection import GridSearchCV
Let’s define the parameter grid for the SVM classifier:
# Define the hyperparameter grid
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': [0.1, 1, 'scale', 'auto']
}
For this example, grid search evaluates the model’s performance with:
- C set to 0.1, 1, and 10,
- kernel set to ‘linear’, ‘rbf’, and ‘poly’, and
- gamma set to 0.1, 1, ‘scale’, and ‘auto’.
This results in a total of 3 * 3 * 4 = 36 different combinations to evaluate. Grid search fits and evaluates the model for each combination using cross-validation and selects the combination that produces the best performance.
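If you want to double-check that count, scikit-learn’s ParameterGrid can enumerate the combinations in the grid:
# Confirm the number of hyperparameter combinations (3 * 3 * 4 = 36)
from sklearn.model_selection import ParameterGrid

print(len(ParameterGrid(param_grid)))  # 36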
We then instantiate GridSearchCV to tune the hyperparameters of baseline_svm:
# Create the GridSearchCV object
grid_search = GridSearchCV(estimator=baseline_svm, param_grid=param_grid, cv=5)
# Fit the model with the grid of hyperparameters
grid_search.fit(X_train, y_train)
Note that we have used five-fold cross-validation.
Finally, we evaluate the performance of the best model (with the optimal hyperparameters found through grid search) on the test data:
# Get the best hyperparameters and model
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_
# Evaluate the best model
y_pred_best = best_model.predict(X_test)
accuracy_best = accuracy_score(y_test, y_pred_best)
print(f"Best SVM Accuracy: {accuracy_best:.2f}")
print(f"Best Hyperparameters: {best_params}")
As seen, the model achieves an accuracy score of 0.94 for the following hyperparameters:
Output >>>
Best SVM Accuracy: 0.94
Best Hyperparameters: {'C': 0.1, 'gamma': 0.1, 'kernel': 'poly'}
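Note that the accuracy above is measured on the held-out test set. If you also want the mean cross-validation accuracy that grid search used to pick these hyperparameters, it is available as best_score_, with per-combination details in cv_results_:
# Mean cross-validation accuracy of the winning combination
print(f"Best mean CV accuracy: {grid_search.best_score_:.2f}")

# cv_results_ stores per-combination details, e.g. mean_test_score and params
# print(grid_search.cv_results_['mean_test_score'])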
Using grid search for hyperparameter tuning has the following advantages:
- Grid search explores all specified combinations, ensuring that you don’t miss the best hyperparameters within the defined search space.
- It is a good option for exploring smaller hyperparameter spaces.
However, on the other hand:
- Grid search can be computationally expensive, especially when dealing with a large number of hyperparameters and their values. It may not be feasible for very complex models or extensive hyperparameter searches.
Now let’s learn about random search.
Random search is another hyperparameter tuning technique that explores random combinations of hyperparameters within specific distributions or ranges. It is particularly useful when dealing with a large hyperparameter search space.
How random search works
In random search, instead of specifying a grid of values, you define probability distributions or ranges for each hyperparameter. This results in a much larger hyperparameter search space.
Random search then randomly samples a fixed number of hyperparameter combinations from these distributions. This allows random search to efficiently explore a diverse set of hyperparameter combinations.
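To make the sampling step concrete, here is a small sketch using scikit-learn’s ParameterSampler with a hypothetical distribution (RandomizedSearchCV, used below, does this sampling internally):
# Randomly sample a few combinations from distributions/lists of values
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

example_dist = {'C': uniform(0.1, 10), 'kernel': ['linear', 'rbf', 'poly']}
for combo in ParameterSampler(example_dist, n_iter=3, random_state=0):
    print(combo)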
Now let’s tune the hyperparameters of the baseline SVM classifier using random search.
We import the RandomizedSearchCV class and define param_dist, a much larger hyperparameter search space:
import numpy as np
from scipy.stats import uniform
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    'C': uniform(0.1, 10),  # Uniform distribution over [0.1, 10.1)
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': ['scale', 'auto'] + list(np.logspace(-3, 3, 50))
}
Similar to grid search, we instantiate the randomized search object to search for the best hyperparameters. Here we set n_iter to 20, so 20 random combinations of hyperparameters will be sampled.
# Create the RandomizedSearchCV object
randomized_search = RandomizedSearchCV(estimator=baseline_svm, param_distributions=param_dist, n_iter=20, cv=5)
randomized_search.fit(X_train, y_train)
We then evaluate the performance of the model with the best hyperparameters found through random search:
# Get the best hyperparameters and model
best_params_rand = randomized_search.best_params_
best_model_rand = randomized_search.best_estimator_
# Evaluate the best model
y_pred_best_rand = best_model_rand.predict(X_test)
accuracy_best_rand = accuracy_score(y_test, y_pred_best_rand)
print(f"Best SVM Accuracy: {accuracy_best_rand:.2f}")
print(f"Best Hyperparameters: {best_params_rand}")
The best accuracy and optimal hyperparameters are:
Output >>>
Best SVM Accuracy: 0.94
Best Hyperparameters: {'C': 9.66495227534876, 'gamma': 6.25055192527397, 'kernel': 'poly'}
The parameters found by random search are different from those found by grid search. The model with these hyperparameters also achieves an accuracy score of 0.94.
Let’s summarize the advantages of random search:
- Random search is effective when dealing with a large number of hyperparameters or a wide range of values because it does not require an exhaustive search.
- It can handle various types of parameters, including continuous and discrete values.
These are some limitations of random search:
- Due to its random nature, random search may not always find the best hyperparameters, but it often finds good ones quickly.
- Unlike grid search, it does not guarantee that all possible combinations are explored.
We learned how to perform hyperparameter tuning with RandomizedSearchCV and GridSearchCV in scikit-learn. We then evaluated the performance of our model with the best hyperparameters.
In short, grid search exhaustively evaluates all possible combinations in the parameter grid, while random search randomly samples combinations of hyperparameters.
Both techniques help you identify the optimal hyperparameters for your machine learning model while reducing the risk of overfitting to a specific train-test split.
Bala Priya C. is a developer and technical writer from India. She enjoys working at the intersection of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She likes to read, write, code, and drink coffee! Currently, she is working on learning and sharing her knowledge with the developer community by creating tutorials, how-to guides, opinion pieces, and more.