Video Intelligence Chapter 1: GCP
Video is an electronic medium for the recording, copying, playback, broadcasting, and display of moving visual media.
The first-ever video was shot in the late 1800s, and the technology has been evolving rapidly ever since.
But we are not interested in history, are we?
So, let's talk about the present and the future. Since the introduction of the internet, the accumulation of data has increased drastically.
With high-speed internet connections and social media platforms like YouTube, Instagram, and Facebook, an enormous amount of video content is introduced to the world every second.
With this huge pile of video content comes a rising need to organize it.
But maintaining and organizing it manually wouldn't be a smart option, right?
Video Intelligence
The Cloud Video Intelligence API provides state-of-the-art results on video analysis. GCP offers two main ways to extract useful information from media.
Video Intelligence API
It provides pre-trained machine learning models that automatically recognize a vast number of objects, places, and actions in stored and streaming video.
Offering exceptional accuracy out-of-the-box, it’s highly efficient for common use cases and improves over time as new concepts are introduced.

AutoML Video Intelligence
But pre-trained models are not always sufficient, which brings us to AutoML Video Intelligence. It provides a graphical interface that makes it easy to train custom models to classify and track objects within videos, even if you have minimal machine learning experience.
It’s ideal for projects that require custom labels that aren’t covered by the pre-trained Video Intelligence API.
Setup and Usage
To start with the Video Intelligence API, we first need to set up our GCP account with the necessary permissions and installations. Let's start.
Step 1: Create a GCP account
In our previous articles, we have discussed how to set up a GCP account in detail. If you need help setting up the account, we recommend going through those articles before proceeding further.

The Video Intelligence API is available in the free trial as well.
Step 2: Install necessary libraries
pip install --upgrade google-cloud-videointelligence
Step 3: Upload necessary files to Cloud Storage
To get an analysis of your video file, upload it to Google Cloud Storage and note the file's URI; we will need it in the following steps.
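The upload can be scripted as well. Below is a minimal sketch using the google-cloud-storage client library; the bucket name, file names, and credential path are placeholders for illustration, not values from this article.

```python
def make_gcs_uri(bucket_name, blob_name):
    """Build the gs:// URI that the Video Intelligence API expects."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_video(bucket_name, local_path, blob_name, credentials_json):
    """Upload a local video file to Cloud Storage and return its gs:// URI."""
    # Imported lazily so the URI helper above works without the library installed.
    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client.from_service_account_json(credentials_json)
    bucket = client.bucket(bucket_name)
    bucket.blob(blob_name).upload_from_filename(local_path)
    return make_gcs_uri(bucket_name, blob_name)


# Placeholder usage -- substitute your own bucket, file, and credentials:
# uri = upload_video("my-bucket", "friends.mp4", "videos/friends.mp4",
#                    "PATH_TO_CREDENTIAL_FILE.json")
```

For a one-off upload, the `gsutil cp` command or the Cloud Console UI works just as well.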

Step 4: Get Analysis
Run the Python script below to get the video analysis.
"""All Video Intelligence API features run on a video stored on GCS."""
from google.cloud import videointelligence

gcs_uri = "gs://PATH_TO_VIDEO_FILE"
output_uri = "gs://PATH_TO_OUTPUT_JSON_FILE.json"

video_client = videointelligence.VideoIntelligenceServiceClient.from_service_account_file(
    "PATH_TO_CREDENTIAL_FILE.json"
)

# Request all the features available in the Video Intelligence API
features = [
    videointelligence.Feature.OBJECT_TRACKING,
    videointelligence.Feature.LABEL_DETECTION,
    videointelligence.Feature.SHOT_CHANGE_DETECTION,
    videointelligence.Feature.SPEECH_TRANSCRIPTION,
    videointelligence.Feature.LOGO_RECOGNITION,
    videointelligence.Feature.EXPLICIT_CONTENT_DETECTION,
    videointelligence.Feature.TEXT_DETECTION,
    videointelligence.Feature.FACE_DETECTION,
    videointelligence.Feature.PERSON_DETECTION,
]

# Speech transcription configuration
transcript_config = videointelligence.SpeechTranscriptionConfig(
    language_code="en-US", enable_automatic_punctuation=True
)

# Person detection configuration
person_config = videointelligence.PersonDetectionConfig(
    include_bounding_boxes=True,
    include_attributes=False,
    include_pose_landmarks=True,
)

# Face detection configuration
face_config = videointelligence.FaceDetectionConfig(
    include_bounding_boxes=True, include_attributes=True
)

# Bundle the feature-specific configurations into a video context
video_context = videointelligence.VideoContext(
    speech_transcription_config=transcript_config,
    person_detection_config=person_config,
    face_detection_config=face_config,
)

operation = video_client.annotate_video(
    request={
        "features": features,
        "input_uri": gcs_uri,
        "output_uri": output_uri,
        "video_context": video_context,
    }
)

# Print the long-running operation id
print("\nProcessing video.", operation)
print(f"\nOperation Id: {operation.operation.name}")

result = operation.result(timeout=300)
print("\nFinished processing.")
Let's understand what the above script is doing.
- Provide the GCS URI of the video file and the path where we want the output to be saved.
- Select the features we want to extract from the video.
- Provide the additional parameters required for feature extraction.
- Send the request to extract the features, then sit back and relax.
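Once `operation.result()` returns, the annotations arrive as nested protobuf messages. The hypothetical helper below (not part of the script above) sketches the shape of the label-detection results; it relies only on attribute access, so it works on the real response objects as well as simple stubs.

```python
def summarize_labels(annotation_result, min_confidence=0.5):
    """Collect (label, confidence) pairs from segment-level label annotations.

    `annotation_result` is one entry of `result.annotation_results`; each
    label annotation carries an entity description and one or more video
    segments, each with its own confidence score.
    """
    labels = []
    for annotation in annotation_result.segment_label_annotations:
        for segment in annotation.segments:
            if segment.confidence >= min_confidence:
                labels.append((annotation.entity.description, segment.confidence))
    return labels
```

With the script above, you would call it as `summarize_labels(result.annotation_results[0])`.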
That was fairly simple, right?
Step 5: Time to visualize the results we got
Download the output JSON from the Cloud Storage path mentioned above. Upload your video and the output JSON on the link below to visualize the results.
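Before visualizing, it can be handy to skim the transcript straight from the downloaded output JSON. The sketch below assumes the documented layout of that file (`annotation_results` → `speech_transcriptions` → `alternatives`); the sample data is made up for illustration.

```python
import json


def extract_transcript(output_json):
    """Join the top transcription alternative from each speech segment."""
    pieces = []
    for annotation in output_json.get("annotation_results", []):
        for transcription in annotation.get("speech_transcriptions", []):
            alternatives = transcription.get("alternatives", [])
            if alternatives and "transcript" in alternatives[0]:
                pieces.append(alternatives[0]["transcript"].strip())
    return " ".join(pieces)


# In practice, load the real file:
# with open("PATH_TO_OUTPUT_JSON_FILE.json") as f:
#     output_json = json.load(f)

# Made-up sample mirroring the output JSON layout.
sample = {
    "annotation_results": [{
        "speech_transcriptions": [{
            "alternatives": [{"transcript": "How you doin'?", "confidence": 0.92}]
        }]
    }]
}
print(extract_transcript(sample))  # How you doin'?
```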

See the sample processed video from F.R.I.E.N.D.S., where Joey tries to buy a birthday gift for his girlfriend. Enjoy!
Isn't that cool?
Conclusion
As we have seen, the GCP Video Intelligence API is loaded with features and can be used as-is to solve various use cases. The scripts mentioned in this article, and many more, have been added to the GitHub repository shared below.
But that is not enough, is it?
Even though it solves major use cases, there is still scope to improve the results and make them more personalized for our problem.
Don't worry, GCP has got us covered on that front too.
AutoML Video Intelligence can be used to train and customize the video intelligence results as per our needs. We will cover that in our future posts.
Till then keep learning and stay tuned. :)