MLOps Building Blocks: Chapter 5 - Experimenting with MLflow and AWS

MLOps Sep 17, 2021

In our previous post we learned how to set up MLflow on our local system and run experiments as needed.

As we want to gradually move towards getting our Pipeline ready for Production, we need to consider integration with Cloud.

Let's take the first step towards integrating with the Cloud. In today's article we are going to set up MLflow on our local system but store all the artifacts in the Cloud.

That way, if any member of the team wishes to access the artifacts and test them on their own system, they can do so freely.

Step 0: Configure AWS on your system

If you already have AWS credentials configured on your system, feel free to skip this step. For all the readers who haven't, kindly follow the link below to set up your account credentials on your local system.

AWS CLI and SDK - Setup for Devs
Before one embarks on a Journey to explore the plethora of services, heading towards innovation, the biggest obstacle for any developer is setting up their system. Sometimes going through vast expanse of documentation can prove to be extremely exhaustive. This article aims at bringing together all t…
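Once configured, the CLI stores your keys in `~/.aws/credentials`, and that is the file MLflow (via boto3) reads when pushing artifacts to S3. As a quick sanity check, here is a stdlib-only sketch of what that file looks like when parsed; the key values below are placeholders, not real credentials:

```python
import configparser

# A minimal sketch of what ~/.aws/credentials contains after `aws configure`.
# The access keys below are placeholders, not real credentials.
sample_credentials = """
[default]
aws_access_key_id = AKIAEXAMPLEKEY
aws_secret_access_key = examplesecretkey
"""

parser = configparser.ConfigParser()
parser.read_string(sample_credentials)

# boto3 reads the "default" profile unless AWS_PROFILE says otherwise.
print(parser.sections())                       # ['default']
print(parser["default"]["aws_access_key_id"])  # AKIAEXAMPLEKEY
```

If your file has a `[default]` section with both keys, MLflow will pick the credentials up automatically in Step 5.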

Step 1: Install the necessary packages

If you have already followed our previous article then you would have MLflow installed on your system. In case you haven't, run the following command in your terminal.

pip3 install mlflow boto3

MLflow relies on boto3 to move artifacts to and from S3, which is why we install it here as well.

Step 2: Create a Training Script

As mentioned in the previous articles, we will be using Image Classification on the Intel dataset as the base model for all the posts in this series.

If you haven't followed our previous posts, you can follow the link below to create a Basic Image Classification Model using Tensorflow.

Custom Vision: Chapter 3 - Image Classification - Tensorflow
In our previous posts we explored the Roadmap of Custom Vision and covered some of the Pitstops on our way to this Chapter. So What is this Chapter about? Well we know the importance of Custom Vision to the Industry. We are also aware of the various hurdles one has

Now that our training script is ready, let's go ahead and make some changes to the same.

Step 3: Open your Editor and Get a Cup of Coffee

While the coffee brews, open your training script so we can start editing.

Step 4: Import the necessary Packages

In addition to your existing imports, add the following line to bring in MLflow for our use case.

import mlflow

Step 5: Create an Experiment

Whenever we run MLflow, the default experiment name allotted to every execution is "Default".

Often, when a group of developers is involved, they may be working on different engagements catering to different models.
In such cases it is essential to segregate the experiments for ease of tracking.

Add the following lines of code to your training script from above -

experiment_name = "image_classification"
s3_artifacts_path = "s3://mlops-manifest-files/mlflow"
experiment = mlflow.get_experiment_by_name(name=experiment_name)
if experiment is None:
    mlflow.create_experiment(
        name=experiment_name, artifact_location=s3_artifacts_path
    )
mlflow.set_experiment(experiment_name=experiment_name)
mlflow.tensorflow.autolog()

Let's dive a little deeper to understand what exactly is happening.

  1. We have declared 2 variables - one for the name of the experiment and the other for the S3 path where we want to store the model artifacts.
  2. In the next stage, we GET an experiment by the name mentioned above. This experiment acts as a separate channel to track all the training and model development that happens under that banner.
  3. In case there is no experiment under that name, we create a new experiment with the same name.
  4. If you notice carefully, we also pass it the S3 URI where we want to store the artifacts.
  5. Based on the configuration you did in Step 0, MLflow picks up those credentials to store and move the artifacts it creates into S3.
  6. One of the most important steps is to set the experiment, or else all executions get automatically tagged to the "Default" experiment.
  7. Finally, we also enable the AutoLog feature of MLflow for TensorFlow.
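To make the artifact location from points 4 and 5 concrete, here is a small stdlib-only sketch of how that S3 URI breaks down into the bucket MLflow writes to and the key prefix under it (using the same bucket as above):

```python
from urllib.parse import urlparse

# The artifact location used in the snippet above.
s3_artifacts_path = "s3://mlops-manifest-files/mlflow"

parsed = urlparse(s3_artifacts_path)
bucket = parsed.netloc            # the S3 bucket MLflow will write to
prefix = parsed.path.lstrip("/")  # the key prefix under that bucket

print(bucket)  # mlops-manifest-files
print(prefix)  # mlflow
```

The credentials configured in Step 0 must grant write access to this bucket, or artifact uploads will fail at the end of a run.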

The AutoLog property of MLflow tracks all the standard parameters and metrics consumed by the model during training.

For further details on the same, you can follow the link mentioned below -

MLflow Tracking — MLflow 1.19.0 documentation

Step 6: Let's make things interesting

As we have already added AutoLog, it's not necessary to log the parameters separately.

But that's not the case with metrics.
By default, MLflow only tracks the metrics of the first epoch.

Obviously that is of little use, so to avoid such a predicament we will make some minor tweaks to our training script.

Altering the following piece of code as mentioned below should solve all your issues -

def train(dataset_path, batch_size, epochs, input_shape):
    with mlflow.start_run():
        all_images, all_labels, rev_labels = data_reading(dataset_path=dataset_path)

        print("target_encodings: ", all_labels)
        print("Number of training images: ", len(all_images))
        # Log Parameters in MLflow
        params = {
            "batch_size": batch_size,
            "epochs": epochs,
            "input_shape": input_shape,
            "encodings": all_labels,
            "training_images": len(all_images),
        }
        mlflow.log_params(params)

        train_generator = data_loader(
            all_images=all_images,
            all_labels=all_labels,
            batch_size=batch_size,
        )

        model = model_arc(y_labels=all_labels, image_inp_shape=input_shape)

        history = model.fit_generator(
            train_generator,
            steps_per_epoch=(len(all_images) // batch_size),
            epochs=epochs,
        )
        accuracy = history.history.get("accuracy")
        loss = history.history.get("loss")
        # Log Metrics in MLflow separately for Accuracy and Loss
        for index, value in enumerate(accuracy):
            mlflow.log_metric(key="accuracy", value=value, step=index)
        for index, value in enumerate(loss):
            mlflow.log_metric(key="loss", value=value, step=index)

As you can clearly see, we have made 2 significant changes to our training function -

  1. We have added the complete training lifecycle under a single MLflow Run, thereby making it optimised for tracking done by MLflow. (Used as a best practice)
  2. The history of model training is returned as a dictionary of per-epoch metric lists. So instead of storing it as is, we break it down into individual steps, so that the graph visualisation offered by MLflow renders the results more intuitively.
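The per-epoch logging above boils down to turning each metric list into (key, value, step) triples. Here is a stdlib-only sketch of that transformation, with a made-up history dict standing in for the Keras output:

```python
# history.history from Keras is a dict mapping metric names to per-epoch
# lists. The values below are made up purely for illustration.
history = {
    "accuracy": [0.61, 0.74, 0.82],
    "loss": [1.20, 0.80, 0.55],
}

# Mirror of the logging loops above: one (key, value, step) record per
# epoch, which is exactly what each mlflow.log_metric call receives.
records = [
    (metric, value, step)
    for metric, values in history.items()
    for step, value in enumerate(values)
]

print(records[0])    # ('accuracy', 0.61, 0)
print(len(records))  # 6
```

Because each call carries a `step`, the MLflow UI can plot the metric as a curve over epochs rather than a single point.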

Step 7: Launch MLflow

In your terminal run the following lines -

mlflow server --host 0.0.0.0 --port 5000

Open the browser of your choice and navigate to http://localhost:5000

Now all that's left is for you to run your training script on your Local system for as many epochs as you like.

Once completed, refresh your page and you should see your runs listed under the image_classification experiment.

Step 8: Open AWS Console

Now that you have finished your training and your MLflow tracking is working seamlessly, all that's left is to cross-verify the model artifacts stored in S3.

You should be able to see the artifacts and their corresponding paths in the Tracking UI.

So let's Open the Console and take the final Step.

Well, it's evident that for every iteration of your experiment there is a unique identifier and a corresponding folder containing all of its artifacts.
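If you want to predict where a given run's files land in the bucket, MLflow composes the path from the experiment's artifact location and the run's unique identifier, roughly `<artifact_location>/<run_id>/artifacts`. A small sketch; the run ID below is a made-up example, real ones are 32-character hex strings assigned per run:

```python
# Sketch of how MLflow lays out artifacts under the experiment's
# artifact_location. The run_id below is a made-up example; real run IDs
# are 32-character hex strings generated by MLflow for each run.
s3_artifacts_path = "s3://mlops-manifest-files/mlflow"
run_id = "0123456789abcdef0123456789abcdef"

artifact_uri = f"{s3_artifacts_path}/{run_id}/artifacts"
print(artifact_uri)
# s3://mlops-manifest-files/mlflow/0123456789abcdef0123456789abcdef/artifacts
```

This is the same URI the Tracking UI shows on a run's Artifacts tab, so you can match folders in the S3 console to runs in MLflow.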


Congratulations! If you have followed the steps above, you have successfully run your first custom image classification model with experiment tracking via MLflow and AWS.

You can find the complete Github Repo on the link mentioned below -

AI-kosh/mlops/chp_5 at main · Chronicles-of-AI/AI-kosh
Archives of blogs on Chronicles of AI. Contribute to Chronicles-of-AI/AI-kosh development by creating an account on GitHub.

I hope this article finds you well. For more content on MLOps and their Step by Step implementation - STAY TUNED 😁


Vaibhav Satpathy

AI Enthusiast and Explorer
