Video Intelligence Chapter 2: GCP AutoML

Video Intelligence Sep 29, 2021

Welcome to a new chapter in our Video Intelligence mini-series. In this series, we are exploring the different services available on Google Cloud Platform to perform various tasks on video clips.

In our previous post, we discussed in detail all the features provided by Video Intelligence API and how to use them programmatically.

The Cloud Video Intelligence API comes with models pre-trained on large datasets. Hence, it provides very high accuracy in common use cases such as video classification, video object detection, etc.

But what if we want to train a model on our own custom labels?
AutoML Video Intelligence is an answer to that question.

AutoML Video Intelligence

This service provides a graphical interface that makes it easy to train custom models to classify and track objects within videos, even if you have minimal machine learning experience.

It’s ideal for projects that require custom labels that aren’t covered by the pre-trained Video Intelligence API.


The AutoML Video Intelligence service provides two major features:

AutoML Video Classification

Using this, we can train a custom model to classify shots and segments in our videos. No machine learning experience is required to configure the service.

AutoML Video Object Tracking

Using this, we can train a custom model via the API to detect and track objects in our videos with bounding boxes and labels. The trained model can be deployed to the cloud, and prediction results can be viewed in the Cloud Console.

In this article, we will dive deeper into the Video Classification feature.

Setup and Usage

To start using the service, there are some prerequisites. To enable the API for your account, visit the AutoML Video UI and click Enable API.

Prepare Video Annotation Data

To start with the training process, we need to prepare our dataset in a format acceptable to AutoML. Refer to the checklist below to ensure there are no errors once you start the training process.

  1. Ensure your videos are in one of the supported formats (MOV, MPEG4, MP4, AVI) and that no file is larger than 50GB.
  2. Ensure that the training data is as close as possible (in resolution) to the data on which predictions are to be made.
  3. Google's recommended training set size is 1,000 videos per label; the minimum is 10 per label.
  4. Upload all your videos to a GCS bucket along with per-split CSV files referencing the video files, one row per labeled segment. SEGMENT_START_TIME cannot be less than 0; SEGMENT_END_TIME can be passed as "inf" to refer to the end of the video.
  5. Upload a master annotation CSV to Cloud Storage that refers to the files we created in the previous step.
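Steps 4 and 5 above can be sketched in code. The bucket paths and file names below are hypothetical, and the column layout (VIDEO_URI, LABEL, SEGMENT_START_TIME, SEGMENT_END_TIME per annotation row; a split keyword plus a CSV URI per master row) follows the AutoML Video classification data-preparation format:

```python
import csv

# Per-split annotation rows: VIDEO_URI,LABEL,SEGMENT_START_TIME,SEGMENT_END_TIME
# "inf" as the end time means "until the end of the video".
train_rows = [
    ["gs://my-bucket/videos/walking_001.mp4", "walking", "0", "inf"],
    ["gs://my-bucket/videos/running_001.mp4", "running", "0", "12.5"],
]
with open("train.csv", "w", newline="") as f:
    csv.writer(f).writerows(train_rows)

# Master annotation CSV: one row per split, pointing at the per-split
# CSVs after they have been uploaded to the GCS bucket.
master_rows = [
    ["TRAIN", "gs://my-bucket/csv/train.csv"],
    ["TEST", "gs://my-bucket/csv/test.csv"],
]
with open("master.csv", "w", newline="") as f:
    csv.writer(f).writerows(master_rows)
```

Both files are then uploaded to the bucket, and the master CSV's GCS URI is what we hand to the import step later.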




Once we are ready with the annotation data, let's get started with creating a dataset in our GCP project. Again, we can use the GCP console to perform all the steps or do them programmatically.

Option 1: Using GCP Console

Follow the steps mentioned below to train your model from the GCP console.

Quickstart: Using the console for video classification

Option 2: Using API

Navigating through the GCP console for every task can get a little annoying for some developers.

If you are one of them, then don't worry, we will discuss how to perform all the required steps via GCP AutoML APIs.
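Before running the scripts below, the Python client library needs to be installed and credentials configured. A typical setup, assuming the gcloud CLI is already installed:

```shell
# Install the AutoML client library for Python
pip install google-cloud-automl

# Authenticate with application-default credentials
gcloud auth application-default login
```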

Step 1: Create Dataset

Create an AutoML dataset using the script shared below.

from google.cloud import automl_v1beta1 as automl

# TODO: Update the variable values
project_id = "PROJECT_ID"
dataset_name = "DATASET_NAME"

client = automl.AutoMlClient()

# Create an AutoML video classification dataset.
project_location = f"projects/{project_id}/locations/us-central1"
metadata = automl.VideoClassificationDatasetMetadata()
dataset = automl.Dataset(
    display_name=dataset_name,
    video_classification_dataset_metadata=metadata,
)
created_dataset = client.create_dataset(parent=project_location, dataset=dataset)

# Display the dataset information
print("Dataset name: {}".format(created_dataset.name))

# To get the dataset id, you have to parse it out of the `name` field.
print("Dataset id: {}".format(created_dataset.name.split("/")[-1]))

Step 2: Import Dataset

Using the dataset_id created in Step 1, import the data we want into the dataset record. Refer to the master annotation CSV that we created in the earlier steps wherever necessary.

from google.cloud import automl_v1beta1 as automl

# TODO: Update the variable values
project_id = "PROJECT_ID"
dataset_id = "DATASET_ID"  # printed in Step 1
annotation_csv_uri = "MASTER_ANNOTATION_CSV_URI"

client = automl.AutoMlClient()

# Import data into the dataset.
dataset_full_id = client.dataset_path(project_id, "us-central1", dataset_id)
# Get the multiple Google Cloud Storage URIs
input_uris = annotation_csv_uri.split(",")
gcs_source = automl.GcsSource(input_uris=input_uris)
input_config = automl.InputConfig(gcs_source=gcs_source)
# Import data from the input URI
response = client.import_data(name=dataset_full_id, input_config=input_config)
print(f"operation_id: {response.operation.name}")
# Uncomment to block until the import completes:
# print("Processing import...")
# print("Data imported. {}".format(response.result()))

It's time to take a small break. Importing a dataset is a long-running operation that will take a few minutes (depending on the data size) to complete.

Perfect time for a coffee break.
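Rather than checking manually, the operation can also be polled in code. Below is a minimal generic helper; the `wait_for` name and its parameters are our own, not part of the AutoML library. With the client, you would pass the returned operation's `done` method:

```python
import time

def wait_for(done_fn, poll_seconds=60, timeout_seconds=3600):
    """Poll done_fn until it returns True; raise if the timeout elapses."""
    waited = 0
    while not done_fn():
        if waited >= timeout_seconds:
            raise TimeoutError("operation did not finish in time")
        time.sleep(poll_seconds)
        waited += poll_seconds
    return True

# With the import operation from the previous step:
# wait_for(response.done, poll_seconds=60, timeout_seconds=2 * 3600)
```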

Step 3: Validate the imported Data

Once Step 2 is complete, validate the dataset imported into our AutoML project. This can be done by navigating to the dataset in the GCP console.

GCP Video Intelligence Dataset screen

Step 4: Start Training

Finally, it's time to start training. Training can be started from the GCP console by navigating to the "TRAIN" tab or programmatically using the script below.

from google.cloud import automl_v1beta1 as automl

# TODO: Update the variable values
project_id = "chronicles-of-ai"
dataset_id = "VCN404924293986648064"
display_name = "HumanActionsModel_v1"

client = automl.AutoMlClient()

# Create an AutoML video classification model.
project_location = f"projects/{project_id}/locations/us-central1"
metadata = automl.VideoClassificationModelMetadata()
model = automl.Model(
    display_name=display_name,
    dataset_id=dataset_id,
    video_classification_model_metadata=metadata,
)

response = client.create_model(parent=project_location, model=model)

print("Training started...")
print(f"Training operation name: {response.operation.name}")

Training is a long-running process; it can take a few hours to complete. Let's take a break and check the status of the process periodically.

Step 5: Evaluate and Test

It's time to evaluate the results and perform inference on the trained model. For this, navigate to the Cloud Console and drill down into the trained model.


Congratulations! You have trained a Video Intelligence model on your custom dataset. We covered the Video Classification use case here, but this can be extended to various other use cases as well.

Check out our GitHub repository for all the annotations, scripts and dataset used in this article.
AI-kosh/video_intelligence/video_automl at main · Chronicles-of-AI/AI-kosh

In future posts, we will see how to train an AutoML model for the object detection use case. This will be useful in various practical scenarios.



Arpit Jain

Machine Learning Engineer
