MLOps Chapter 8 : Model Server with Nvidia Triton - Local - Part 1.a

MLOps Oct 29, 2021

In our previous post we explored what Nvidia Triton is and how it is changing the world of MLOps.

In case you haven't read our previous article, I would highly recommend going through it to understand the context for what we are trying to achieve in this experiment.

Nvidia Triton - A Game Changer

What we didn't see is how to run it on our own systems and leverage it for inference.

In today's article we will explore the practical implementation of Triton on local systems.

Step 0: Install the pre-requisites

As we will be working with many moving pieces, as a best practice we will use Docker to containerise these services and make them easy to offer to a larger crowd.

Install Docker Engine on your local System by following the link below -

Docker Desktop for Mac and Windows | Docker

In today's example we will run inference for a standard image classification problem statement.

Nothing to worry about, as we have already covered these in detail in our previous chapters -

  1. Training and Model Architecture
  2. Inference Scripts and Serving as API
For our current exercise, though, all we need is to train our model and save the model artifacts on our system.

For all those who haven't read our previous posts, kindly follow the link below to train a custom image classification model; the rest of you can feel free to skip to the next step -

Custom Vision: Chapter 3 - Image Classification - Tensorflow

Step 1: Import Nvidia Triton

Now that we have our training script ready, all we need to do is prepare our Model Server using Triton.

Without much ado, let's start our Docker engine and open a terminal of our choice.

Once we are done with that, we shall pull the Docker image of Triton with the version of our choice from Nvidia's NGC container registry using the following command -

docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3
Here <xx.yy> stands for the version that you want. For today's experiment let's take the latest stable version, 21.09.
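With that version plugged in, the pull command for today's experiment becomes -

docker pull nvcr.io/nvidia/tritonserver:21.09-py3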
This might be a long-running process depending on your internet speed, as the Docker image for the Triton base version is just over 8 GB.

Step 2: Take Some coffee and Be Patient ...

While the image is getting pulled, let's go ahead and make some alterations to our training script as needed for the inference server.

At the end of the training script, after the model is triggered for training, just add the following lines -

# Some Training Code
model.fit_generator(
    generator=train_generator,
    steps_per_epoch=(len(all_images) // batch_size),
    epochs=epochs,
)

# Additional lines to be added
# (make sure `import os` is present at the top of the training script)
model.save(os.path.join(path_to_save, "intel_image_class"))

What the added lines do is save the model and its weights in TensorFlow's SavedModel format, which is what Triton needs for deployment.
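If you want to double check the export before moving on, a quick sketch like the one below will list the contents of the SavedModel folder. The path is a placeholder for wherever path_to_save points on your machine -

import os

# A SavedModel folder typically contains 'saved_model.pb', 'variables/' and 'assets/'
print(os.listdir("path_to_save/intel_image_class"))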

Now, with two operations running, let's get ourselves some coffee.

Step 3: Set up your Model Registry

Now that you have trained your model and have the model and its weights stored in a folder named "intel_image_class", all we need to do now is create a folder structure that Triton can read from.

model_repository_path/
|- <tensorflow_model_name>/
   |- config.pbtxt
   |- 1/
      |- model.savedmodel/
         |- <tensorflow_saved_model_files>

A couple of things to highlight in this step -

  1. If you noticed, the folder that holds the TensorFlow model has a specific name, model.savedmodel. That name is mandatory, or else Triton won't read your TF models.
  2. The versions assigned to your models are simply folder names, which have to be numeric in nature.
  3. In addition, we need to create a config.pbtxt file, which defines the input and output parameters and dimensions the model expects while running inference.
  4. The config file has a specific template, which we will discuss in the next steps.
  5. Finally, the parent folder name is the model name that is referred to inside the config file.
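To make this concrete, here is a minimal sketch that arranges the SavedModel from Step 2 into the layout above. The model_repository and saved_model_dir paths are placeholders you would adjust to your own system -

import os
import shutil

model_repository = "model_repository"    # parent folder that will later be mounted into Triton
model_name = "intel_image_class"         # must match "name" in config.pbtxt
version = "1"                            # version folders must be numeric

saved_model_dir = "path_to_save/intel_image_class"   # wherever model.save(...) wrote it in Step 2

target = os.path.join(model_repository, model_name, version, "model.savedmodel")
os.makedirs(os.path.dirname(target), exist_ok=True)

# Copy the SavedModel into the version folder Triton expects
shutil.copytree(saved_model_dir, target)

# config.pbtxt (Step 4) then sits next to the version folder:
# model_repository/intel_image_class/config.pbtxt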

Step 4: Create the Config File

Here is a sample template with the bare minimum configuration needed for a config file to help Triton load the model.

name: "intel_image_class"
platform: "tensorflow_savedmodel"
max_batch_size: 8
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    format: FORMAT_NHWC
    dims: [ 100, 100, 3 ]
  }
]
output [
  {
    name: "dense"
    data_type: TYPE_FP32
    dims: [ 6 ]
  }
]

Once you create the config file with the .pbtxt extension, all that's left is to place it in the folder structure mentioned above.

Before we move onto the next step, let's focus on a couple of parameters in the config file -

  1. name: - This has to be the same as the model name / folder name. (Case sensitive)
  2. platform: - Triton supports multiple platforms; this is an example for the TensorFlow SavedModel type.
  3. max_batch_size: - A numeric value defined by the user; the maximum batch size Triton will accept for this model.
  4. input: - A list of dictionaries, where each dictionary corresponds to one input the model can accept. In our case, as we pass an image, it has the following keys -
     - name: - The name of the input layer of the model that you have trained.
     - data_type: - Whether the data passed in is a Float 32, 16 or 64, or an Integer type.
     - format: - FORMAT_NHWC (stands for Number of samples, Height, Width, Channel).
     - dims: - The dimensions your model accepts. Since max_batch_size is set, the batch dimension is left out of dims.
  5. output: - The same as input, except without the need for format.
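If you are not sure what the input and output layer names of your trained model are, a small sketch like the one below (assuming TensorFlow 2.x and the SavedModel path from Step 3) will print them for you -

import tensorflow as tf

# Load the SavedModel exported in Step 2 (path is a placeholder)
model = tf.keras.models.load_model("model_repository/intel_image_class/1/model.savedmodel")

model.summary()        # lists every layer name and its output shape
print(model.inputs)    # e.g. the "input_1" tensor with shape (None, 100, 100, 3)
print(model.outputs)   # e.g. the "dense" tensor with shape (None, 6)

Alternatively, TensorFlow's saved_model_cli show --dir <path_to_savedmodel> --all command prints the same signature information straight from the terminal.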

Now that we are all set, let's take the final step.

Step 5: Run the Model Server

To spin up the server, just run the following command in the terminal of your choice.

Before you run the command, just make sure to replace the following parameters in it -

  1. The full path to your local model registry
  2. The version of Triton that you have downloaded

The command below is for a GPU-based system, where you want the server to run inference on the GPU.

docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models

If you want to run inference on a CPU-based system, then run the following command instead -

docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models
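Once the container is up, you should see intel_image_class listed with a READY status in the logs. A quick way to confirm the server is healthy is to hit Triton's readiness endpoint on the HTTP port (8000). Here is a small sketch using Python's requests library (assuming it is installed) -

import requests

# Triton exposes its HTTP endpoint on port 8000 (see the -p8000:8000 mapping above)
response = requests.get("http://localhost:8000/v2/health/ready")
print(response.status_code)  # 200 means the server is up and ready to serve inference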

Step 6: Enjoy your Inference Server

Congratulations! If you were able to follow the steps mentioned above, you should now have a working inference server on your local system, one that you can deploy your models on and run inference against with simple API calls.

I hope this article was helpful in teaching you the basics of inference servers and Triton.

In the next part of this article, we will cover how to get inference from the models on our server, along with some bonus features that you might find very handy.

STAY TUNED 😁

Vaibhav Satpathy

AI Enthusiast and Explorer
