Hyperparameters determine how well your neural network learns and processes information. Unlike model parameters, which are learned during training, hyperparameters must be set before training begins. In this article, we describe techniques for tuning the hyperparameters of neural network models.
Hyperparameters in neural networks
Learning rate
The learning rate controls how much the model adjusts its weights in response to its errors. A high learning rate lets the model learn quickly but risks overshooting good solutions; a low learning rate makes learning slower and more careful, which often gives more stable convergence at the cost of longer training.
Source: https://www.jeremyjordan.me/nn-learning-rate/
There are ways to adjust the learning rate during training to get the best of both behaviors. Learning rate schedules change the learning rate at predefined intervals or milestones as training progresses. In addition, adaptive optimizers such as Adam effectively tune the step size for each parameter on the fly.
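As a minimal sketch (using PyTorch here, which is an assumption rather than anything specified in the article), a step decay schedule shrinks the learning rate by a fixed factor every few epochs, while Adam would adapt its step sizes automatically:

import torch

model = torch.nn.Linear(10, 1)                            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # initial learning rate
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam adapts per-parameter step sizes
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    x, y = torch.randn(32, 10), torch.randn(32, 1)        # dummy batch
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()   # lr: 0.1 for epochs 0-9, 0.01 for epochs 10-19, 0.001 for epochs 20-29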
Batch size
Batch size is the number of training samples the model processes before updating its parameters. A large batch size means the model sees more samples per update, which can lead to more stable learning but requires more memory. A smaller batch size, on the other hand, updates the model more frequently; learning can progress faster, but each update is noisier.
The batch size value affects the memory and processing time for learning.
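For example, in PyTorch (again an assumption, since the article is framework-agnostic) the batch size is typically set on the data loader, which then yields that many samples per parameter update:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1000, 10), torch.randn(1000, 1))   # dummy data

# batch_size controls how many samples are processed per parameter update;
# larger batches are more stable but need more memory, smaller batches update more often
loader = DataLoader(dataset, batch_size=64, shuffle=True)

for features, targets in loader:   # one full pass over the loader is one epoch
    ...                            # forward pass, loss, backward pass, optimizer step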
Number of epochs
Epochs refer to the number of times the model passes through the entire training dataset. In each epoch, every batch of data is presented to the model, which learns from it and updates its parameters. More epochs give the model more opportunity to learn, but too many can lead to overfitting, where the model memorizes the training data instead of generalizing. Choosing the right number of epochs is necessary for good accuracy, and techniques such as early stopping are commonly used to find this balance.
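Here is a small self-contained sketch of early stopping (PyTorch with dummy data, an assumed setup rather than the article's): training stops once the validation loss has not improved for a few epochs.

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
X_train, y_train = torch.randn(800, 10), torch.randn(800, 1)   # dummy training data
X_val, y_val = torch.randn(200, 10), torch.randn(200, 1)       # dummy validation data

best_val_loss, patience, bad_epochs = float("inf"), 3, 0
for epoch in range(100):                                       # upper bound on epochs
    loss = torch.nn.functional.mse_loss(model(X_train), y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    with torch.no_grad():                                      # evaluate on held-out data
        val_loss = torch.nn.functional.mse_loss(model(X_val), y_val).item()

    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                             # no improvement for 3 epochs
            print(f"Early stopping at epoch {epoch}")
            break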
Activation function
Activation functions decide whether a neuron should be activated or not, creating non-linearity in the model, which is especially beneficial when trying to model complex interactions in the data.
Source: https://www.researchgate.net/publication/354971308/figure/fig1/AS:1080246367457377@1634562212739/Curves-of-the-Sigmoid-Tanh-and-ReLu-activation-functions.jpg
The most common activation functions are ReLU, Sigmoid, and Tanh. ReLU speeds up training because it is cheap to compute: it passes positive activations through unchanged and zeroes out negatives. Sigmoid is used to assign probabilities, since it squashes its input to a value between 0 and 1. Tanh outputs values between -1 and 1 and is zero-centered, which is useful when activations should be able to take negative values. Selecting a suitable activation function requires careful consideration, as it strongly influences whether the network can learn good predictions.
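For illustration, here is how the three functions transform the same inputs (a small sketch using PyTorch, which the article does not require):

import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

print(torch.relu(x))      # negatives clipped to 0:          [0.00, 0.00, 0.00, 0.50, 2.00]
print(torch.sigmoid(x))   # squashed into (0, 1):           ~[0.12, 0.38, 0.50, 0.62, 0.88]
print(torch.tanh(x))      # squashed into (-1, 1), centered: ~[-0.96, -0.46, 0.00, 0.46, 0.96]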
Dropout
The Dropout technique is used to prevent overfitting. It randomly disables or "drops" some neurons by setting their outputs to zero during each training iteration. This prevents neurons from becoming overly dependent on specific inputs, features, or other neurons, and helps the network focus on essential features during training. Dropout is applied only during training and is disabled at inference time.
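A minimal sketch (PyTorch assumed) showing that Dropout zeroes a random subset of activations in training mode and is a no-op in inference mode:

import torch

dropout = torch.nn.Dropout(p=0.5)   # drop each activation with probability 0.5
x = torch.ones(8)

dropout.train()                     # training mode: random activations are zeroed,
print(dropout(x))                   # and the survivors are scaled by 1 / (1 - p) = 2.0

dropout.eval()                      # inference mode: dropout is disabled
print(dropout(x))                   # output equals the input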
Hyperparameter tuning techniques
Manual search
Manual search means adjusting hyperparameters by trial and error: you change one setting at a time, retrain, and observe how the model's performance changes. Let's set a few hyperparameters by hand:
learning_rate = 0.01   # step size for weight updates
batch_size = 64        # samples processed per parameter update
num_layers = 4         # depth of the network

# Model is a placeholder for whatever estimator class you are tuning
model = Model(learning_rate=learning_rate, batch_size=batch_size, num_layers=num_layers)
model.fit(X_train, y_train)
Manual search is straightforward because it requires no special tooling or algorithms. However, it is time-consuming and may not find good configurations as efficiently as automated methods.
Grid Search
Grid search exhaustively tries every combination of the hyperparameter values you specify. For each combination, the model is trained on one part of the data and evaluated on another via cross-validation. Let's implement grid search using scikit-learn's GridSearchCV:
from sklearn.model_selection import GridSearchCV

# model must be a scikit-learn compatible estimator that exposes these hyperparameters
param_grid = {
    'learning_rate': [0.001, 0.01, 0.1],
    'batch_size': [32, 64, 128],
    'num_layers': [2, 4, 8]
}

# evaluate every combination with 5-fold cross-validation (27 combinations x 5 folds = 135 fits)
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)
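Once the search has finished, the best combination and its cross-validated score can be read from the fitted object (standard scikit-learn attributes):

print(grid_search.best_params_)   # e.g. {'batch_size': 64, 'learning_rate': 0.01, 'num_layers': 4}
print(grid_search.best_score_)    # mean cross-validated score of the best combination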
Grid search requires far less manual effort than hand-tuning. However, it is computationally expensive, because it has to train and evaluate every possible combination.
Random search
This technique randomly samples combinations of hyperparameters. For each sampled combination, it trains the model and checks its performance, so it can often arrive at good settings much more quickly than an exhaustive search. We can implement random search using scikit-learn's RandomizedSearchCV:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform, randint

# distributions to sample from instead of a fixed grid
param_dist = {
    'learning_rate': uniform(0.001, 0.1),   # continuous values in [0.001, 0.101)
    'batch_size': randint(32, 129),         # integers from 32 to 128
    'num_layers': randint(2, 9)             # integers from 2 to 8
}

# sample and evaluate 10 random combinations with 5-fold cross-validation
random_search = RandomizedSearchCV(model, param_distributions=param_dist, n_iter=10, cv=5)
random_search.fit(X_train, y_train)
Random search is often more efficient than grid search because it evaluates only a fixed number of sampled combinations rather than every possible one. However, it may still miss the best combination, especially when the hyperparameter search space is large.
Conclusion
We have covered some of the basic hyperparameter tuning techniques. More advanced techniques include Bayesian optimization, genetic algorithms, and Hyperband.
Jayita Gulati is a machine learning enthusiast and technical writer driven by her passion for building machine learning models. She holds a Master's degree in Computer Science from the University of Liverpool.