Hyperparameter Optimization with Optuna
In place of grid or random search approaches to HPO, we recommend the Optuna framework for Bayesian hyperparameter sampling and trial pruning (for models where intermediate results are available). Optuna can also integrate with MLflow for convenient logging of optimal parameters.
In this tutorial, we build our HPO example on the model and training approach detailed in the Single-GPU Training (Custom Mlflow) tutorial.
First, we install the Optuna package:
pip install optuna
We will now make adjustments to our training script to test a series of hyperparameters. This entails three main parts (a minimal sketch follows the list):
- Wrap the whole of our model definition, training, and testing logic in an objective function that returns our chosen evaluation metric.
- Suggest hyperparameters to test using Optuna's trial.suggest_<type>() methods.
- Initiate a study with the number of trials we would like to run.
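Schematically, these three parts fit together as follows. This is a toy sketch (minimizing a simple quadratic, not the Fashion-MNIST model used below) just to show the shape of an Optuna script:

import optuna

def objective(trial):
    # 1. Suggest a hyperparameter value for this trial
    x = trial.suggest_float('x', -10.0, 10.0)
    # 2. Return the evaluation metric to optimize
    return (x - 2) ** 2

# 3. Create a study and run a fixed number of trials
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=20)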
To use Optuna in our training scripts, we first import the Optuna package (in addition to those required by the model).
import optuna
For the model detailed in Single-GPU Training (Custom Mlflow), ignoring MLflow-related code, our objective function looks like this:
def objective(trial):
    input_size = 784
    # Let Optuna suggest the hidden-layer sizes (previously fixed at 200)
    hidden_size1 = trial.suggest_int('hidden_size1', 100, 300)
    hidden_size2 = trial.suggest_int('hidden_size2', 100, 300)
    output_size = 10
    num_epochs = 4
    batch_size = 100
    lr = 0.01

    my_net = SeqNet(input_size, hidden_size1, hidden_size2, output_size)
    my_net = my_net.to(device)

    optimizer = torch.optim.Adam(my_net.parameters(), lr=lr)
    loss_function = nn.CrossEntropyLoss()

    fmnist_train = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
    fmnist_test = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())

    fmnist_train_loader = DataLoader(fmnist_train, batch_size=batch_size, shuffle=True)
    fmnist_test_loader = DataLoader(fmnist_test, batch_size=batch_size, shuffle=True)

    train(my_net, fmnist_train_loader, loss_function, optimizer, num_epochs)

    # Evaluate accuracy on the test set
    correct = 0
    total = 0
    for images, labels in fmnist_test_loader:
        images = torch.div(images, 255.)
        images = images.to(device)
        labels = labels.to(device)
        output = my_net(images)
        _, predicted = torch.max(output, 1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)

    acc = (100 * correct) / total
    return acc
where, instead of explicitly setting the size of our hidden layers, we let Optuna suggest values by using trial.suggest_int() and passing the variable name and the lower/upper limits of the range we'd like to test. It is important that this function returns the desired evaluation metric; in this case, we use the accuracy acc.
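For reference, trial.suggest_int() is one of several search-space primitives Optuna provides. A couple of other common ones (with illustrative values, not part of this tutorial's script):

# A float sampled log-uniformly, often used for learning rates
lr = trial.suggest_float('lr', 1e-4, 1e-1, log=True)
# A choice from a fixed set of options
activation = trial.suggest_categorical('activation', ['relu', 'tanh'])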
In our main code, we can now instantiate a study and begin optimizing:
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=10)
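By default, create_study uses Optuna's TPE (Tree-structured Parzen Estimator) sampler for the Bayesian sampling mentioned above. If you want reproducible searches, you can pass the sampler explicitly with a seed (an optional tweak, not part of the original script):

# Optional: seed the default TPE sampler for reproducible suggestions
study = optuna.create_study(
    direction='maximize',
    sampler=optuna.samplers.TPESampler(seed=42),
)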
One can access and print the parameters of each trial, as well as the optimal parameters, as follows:
print("Best trial:")
trial = study.best_trial
print(" Value: ", trial.value)
print(" Params: ")
for key, value in trial.params.items():
print(" {}: {}".format(key, value))
Another useful feature of Optuna is trial pruning. To implement this, we must report our evaluation score at each step using trial.report(intermediate_value, step) and select a pruning method when creating our study:
study = optuna.create_study(pruner=optuna.pruners.SuccessiveHalvingPruner(), direction="maximize")
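On the reporting side, the tutorial's single train() call would need to be split up so that the score can be reported after each epoch. A minimal sketch, assuming hypothetical per-epoch helpers train_one_epoch() and evaluate() (neither is part of the original script):

def objective(trial):
    # ... model, data loaders, and optimizer set up as before ...
    for epoch in range(num_epochs):
        train_one_epoch(my_net, fmnist_train_loader, loss_function, optimizer)  # hypothetical helper
        acc = evaluate(my_net, fmnist_test_loader)                              # hypothetical helper
        # Report the intermediate accuracy so the pruner can act on it
        trial.report(acc, epoch)
        # Let the pruner stop unpromising trials early
        if trial.should_prune():
            raise optuna.TrialPruned()
    return acc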
More information on pruning can be found in the Optuna documentation.
We can also track and log our best parameters and best evaluation metric in MLflow by wrapping our main code in an MLflow run:
with mlflow.start_run():
    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=10)
    # Log the best parameters and metric found by the study
    mlflow.log_params(study.best_params)
    mlflow.log_metric("best_acc", study.best_value)
Note that one could also create an MLflow run for each individual trial by wrapping the objective function in an MLflow run as follows:
def objective(trial):
    # Start an MLflow run for this trial
    with mlflow.start_run():
        input_size = 784
        hidden_size1 = trial.suggest_int('hidden_size1', 100, 300)
        hidden_size2 = trial.suggest_int('hidden_size2', 100, 300)
        output_size = 10
        num_epochs = 4
        batch_size = 100
        lr = 0.01

        my_net = SeqNet(input_size, hidden_size1, hidden_size2, output_size)
        my_net = my_net.to(device)

        optimizer = torch.optim.Adam(my_net.parameters(), lr=lr)
        loss_function = nn.CrossEntropyLoss()

        fmnist_train = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
        fmnist_test = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())

        fmnist_train_loader = DataLoader(fmnist_train, batch_size=batch_size, shuffle=True)
        fmnist_test_loader = DataLoader(fmnist_test, batch_size=batch_size, shuffle=True)

        train(my_net, fmnist_train_loader, loss_function, optimizer, num_epochs)

        # Log params and model in the current MLflow run
        mlflow.log_params({"epochs": num_epochs, "lr": lr})
        mlflow.pytorch.log_model(my_net, "model")

        # Evaluate accuracy on the test set
        correct = 0
        total = 0
        for images, labels in fmnist_test_loader:
            images = torch.div(images, 255.)
            images = images.to(device)
            labels = labels.to(device)
            output = my_net(images)
            _, predicted = torch.max(output, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)

        acc = (100 * correct) / total
        return acc
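If this per-trial pattern is combined with the study-level run shown earlier, MLflow will raise an error because a run is already active when the objective executes. One way to handle this (an optional adjustment, not part of the original script) is to mark each trial's run as nested:

import mlflow

def objective(trial):
    # Nest this trial's run under the already-active study-level run
    with mlflow.start_run(nested=True):
        ...  # model definition, training, evaluation, and logging as above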
Download the full script used in this example here.