In our previous article, we introduced you to the latest tool in the ML domain from Google – Vertex AI. We looked at the different use cases that can be solved using Vertex AI, and walked through creating a dataset, linking it to Google Cloud Storage, and importing our files into the Vertex AI platform.
Vertex AI is a unified AI platform from Google that bridges the gap between developers and domain experts. It also provides the end-to-end pipelines needed to make your AI solution production-ready.
Now that we are done with all the tasks related to data ingestion and pre-processing, let's turn to the fun part and train a model on the dataset. Today we will see how we can train an AutoML model on the image classification dataset we uploaded in the last chapter.
We will be expanding on the Single Label Image Classification example from the previous post. We recommend going through that before proceeding further.
In the last article, we stopped after importing the dataset into the Vertex AI platform from our Google Cloud Storage bucket. Let's take it from there. We will divide this post into three parts.
Part 1: Training
We will perform all the actions using the GCP console. Just like we did in the last article, you need to navigate to the Vertex AI platform on the GCP console and locate the dataset we created earlier.
Once you have created the dataset and imported images, you can validate the imported data from the console. If there are any corrections or modifications to be made on the dataset, this is the time.
After validating the dataset, it's time to start the training process. From the console, training can be initiated by a simple button click – "TRAIN NEW MODEL" on the Dataset screen.
Step 1: Select Training Method
Before starting the training process, there is a choice to be made: which training method should we select?
- AutoML – Train high-quality models with minimal effort and ML expertise.
- AutoML Edge – Train models optimized for use on edge devices.
- Custom – Build your own neural network architecture using TensorFlow, scikit-learn, etc., and train it in a custom container.
To keep things simple, let's proceed with the AutoML training method. We will cover custom model training in upcoming articles.
Step 2: Split Dataset
Once the choice of training method is made and you click on "Continue", it's time to make another choice. This time we need to choose the Data Split.
- Randomly assigned – The data split is done randomly by the system. You can control it by modifying the training, validation, and test percentages of the split. By default the split is 80:10:10, i.e. 80% of the data is used for training, 10% for validation, and the remaining 10% for testing.
- Manually assigned – In this mode, the system will make the data split using the assignments specified by the user at the time of data import.
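To make the random split concrete, here is a plain-Python sketch of the 80:10:10 logic. This is only an illustration of the idea, not the SDK's actual implementation; the bucket path and file names are hypothetical.

```python
import random

def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Randomly split a list of items into train/validation/test sets (80:10:10 by default)."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded shuffle keeps the split reproducible
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return (
        items[:n_train],                 # training set
        items[n_train:n_train + n_val],  # validation set
        items[n_train + n_val:],         # test set (the remainder)
    )

# Hypothetical GCS URIs standing in for the imported images
images = [f"gs://my-bucket/img_{i}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(images)
print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```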
Step 3: Control the Cost of Training
The final step is to decide how much you would like to spend on the training process. The cost of training is computed in "node hour" units. For each unit of time, GCP uses 8 nodes in parallel, where each node is equivalent to a standard 8-core machine with an attached GPU.
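The budget arithmetic itself is simple. The sketch below uses a hypothetical dollar rate (real AutoML rates vary by model type and region; always check Google's pricing page), and also shows the node-hours-to-milli-node-hours conversion, since the Vertex AI API expects budgets in milli node hours.

```python
def training_cost(budget_node_hours, price_per_node_hour):
    """Estimate training cost from a node-hour budget.
    The price is an assumed placeholder, not an official rate."""
    return budget_node_hours * price_per_node_hour

def to_milli_node_hours(node_hours):
    """Convert node hours to the milli-node-hour units the Vertex AI API expects."""
    return int(node_hours * 1000)

# Example: an 8 node-hour budget at a hypothetical $3.00 per node hour
print(training_cost(8, 3.00))    # 24.0
print(to_milli_node_hours(8))    # 8000
```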
For more details on pricing, refer to the official documentation by Google.
You can also select an option to "Enable Early Stopping" before starting the training process.
Early stopping is a mechanism in model training that halts training when no significant improvement in the model's validation metrics has been observed for a certain period of time, so you don't spend node hours that no longer improve the model.
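The idea behind early stopping can be sketched in a few lines of plain Python. This is a simplified illustration of the general technique; Vertex AI's internal stopping criterion is not documented at this level of detail, and the patience and threshold values here are assumptions.

```python
def should_stop_early(val_scores, patience=3, min_delta=1e-4):
    """Return True if the validation score hasn't improved by at least
    `min_delta` over the last `patience` evaluations.
    A simplified sketch of early stopping, not Vertex AI's actual criterion."""
    if len(val_scores) <= patience:
        return False  # not enough history yet to judge
    best_before = max(val_scores[:-patience])   # best score before the recent window
    recent_best = max(val_scores[-patience:])   # best score inside the recent window
    return recent_best < best_before + min_delta

print(should_stop_early([0.70, 0.75, 0.78, 0.78, 0.78, 0.78]))  # True: plateaued
print(should_stop_early([0.70, 0.75, 0.78, 0.80, 0.82, 0.85]))  # False: still improving
```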
Step 4: Wait for Training to Complete
Once you enter your training budget, the "Start Training" button will be enabled. Simply click it and sit back while training completes. You will receive a notification at your registered email ID once the training process is complete.
Part 2: Evaluation
After training, it's time to evaluate the model and check whether the training was any good.
Vertex AI provides a separate screen for model evaluation in the console. All trained models can be seen in the "Models" tab on the Vertex AI platform. Identify the model you just trained from the list and drill down into its detail screen.
There is a lot of information on this single screen. From the evaluation screen, you can:
- Validate the dataset and data split (Train, Validation, and Test) used for training.
- Gauge the usefulness of the model using its recall and precision scores.
- Analyze the model predictions.
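As a refresher on the two headline metrics, here is how precision and recall are computed from the confusion counts for a single label. The counts below are made-up numbers for illustration.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true positives (tp),
    false positives (fp), and false negatives (fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual positives, how many were found
    return precision, recall

# Hypothetical counts for one label: 90 correct hits, 10 false alarms, 30 misses
p, r = precision_recall(tp=90, fp=10, fn=30)
print(p, r)  # 0.9 0.75
```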
For a more in-depth understanding of the various evaluation concepts, please refer to the below article.
Part 3: Deployment
After evaluating the training process and the model generated, if you are satisfied with the predictions, it's time to deploy the model and open it up for general use by other applications.
Navigate to the "Deploy & Test" tab and simply click the "Deploy to Endpoint" button to deploy the model to an endpoint.
Again, there are multiple choices you have to make here:
- The endpoint name where you want your model to be accessible.
- The location of the endpoint (example: us-central1).
- The number of compute nodes for the inference task.
- Whether logging is enabled or disabled for the endpoint.
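Once the endpoint is live, other applications call it with a JSON predict request. The fragment below sketches the request body shape for AutoML image classification on Vertex AI; treat the threshold and prediction-count values as illustrative defaults rather than recommendations.

```json
{
  "instances": [
    { "content": "<base64-encoded image bytes>" }
  ],
  "parameters": {
    "confidenceThreshold": 0.5,
    "maxPredictions": 5
  }
}
```

The endpoint responds with the predicted labels and their confidence scores for each instance.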
With this, we conclude the training and deployment process using AutoML on Vertex AI. It's time to treat ourselves to a cup of coffee while Vertex AI deploys the model to the endpoint.
All the steps mentioned above can also be automated using the Vertex AI Python SDK. We will cover these automated steps in future articles.
STAY TUNED for more content following in our journey of Custom Computer Vision. 😁