Vision: Chapter 3 - Google Vision

Computer Vision Jul 9, 2021

In the previous chapters we explored What is Vision? and how we can leverage open-source tools such as OpenCV to perform various operations on images.

In today's article we are going to look at a variety of computer vision use cases that we can tackle on images right out of the box, using the native services of one of the leading cloud platforms in AI - Google Cloud Platform (GCP).

Before we jump into it, let's try to understand the various use cases that can come up for an AI designer dealing with computer vision -

Image Classification - When building organisational products, it is often essential to be able to distinguish explicit (NSFW) content from safe-for-work (SFW) content. This kind of categorisation comes under image classification.
Object Detection - Most firms and organisations deal with multiple vendors and clients at the same time, so it becomes critical that they are able to identify and organise their documents based on which vendor they belong to. Under such circumstances, leveraging object detection to identify the logo or trademark of a company proves extremely beneficial.
Image Segmentation - Say you are setting up a security system and need to identify weapons in X-ray scans, mapping their contours to highlight and cross-verify them. We use masking neural networks to map those contours and to identify and tag the objects. This is called image segmentation.

Now that you have a rough understanding of the broad categories of problem statements possible under computer vision, how about we go ahead and explore a couple of them?

Let's follow the steps below to set up the environment needed to run our tests.

Step 0: Setup Vision API

The basic requirement for setting up the API is procuring a cloud account. As this is a fairly detailed procedure and out of scope for this article, you can follow this link to set up the API for your system.
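Once you have a service-account key from that setup, the client library discovers it through the GOOGLE_APPLICATION_CREDENTIALS environment variable. A minimal sketch - the key file path below is a placeholder for wherever you saved yours:

```shell
# Point the Vision client library at your service-account key (placeholder path)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
```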

Step 1: Install the necessary Packages

pip3 install google-cloud-vision

Step 2: Open your Code Editor and get yourself a cup of Coffee

Step 3: Import the necessary Packages

from google.cloud import vision
import io

Step 4: Define a Client for Vision API

client = vision.ImageAnnotatorClient()

Step 5: Define a function to read Images

def read_image_file(path: str):
    with io.open(path, "rb") as image_file:
        content = image_file.read()
    image = vision.Image(content=content)
    return image

Test 1 - Face Detection

def detect_faces(path: str):
    image = read_image_file(path=path)
    response = client.face_detection(image=image)
    faces = response.face_annotations

    # Names of likelihood from google.cloud.vision.enums
    likelihood_name = (
        "UNKNOWN",
        "VERY_UNLIKELY",
        "UNLIKELY",
        "POSSIBLE",
        "LIKELY",
        "VERY_LIKELY",
    )
    print("Faces:")

    for face in faces:
        print("anger: {}".format(likelihood_name[face.anger_likelihood]))
        print("joy: {}".format(likelihood_name[face.joy_likelihood]))
        print("surprise: {}".format(likelihood_name[face.surprise_likelihood]))

        vertices = [
            "({},{})".format(vertex.x, vertex.y)
            for vertex in face.bounding_poly.vertices
        ]

        print("face bounds: {}".format(",".join(vertices)))

Test 2 - Label Detection

def detect_labels(path: str):
    image = read_image_file(path=path)

    response = client.label_detection(image=image)
    labels = response.label_annotations
    print("Labels:")

    for label in labels:
        print(label.description)

Test 3 - Landmark Detection

def detect_landmarks(path: str):
    image = read_image_file(path=path)
    response = client.landmark_detection(image=image)
    landmarks = response.landmark_annotations
    print("Landmarks:")

    for landmark in landmarks:
        print(landmark.description)
        for location in landmark.locations:
            lat_lng = location.lat_lng
            print("Latitude {}".format(lat_lng.latitude))
            print("Longitude {}".format(lat_lng.longitude))

Let's take another sip of our deliciously brewed coffee and then resume.

Test 4 - Logo Detection

def detect_logos(path: str):
    image = read_image_file(path=path)
    response = client.logo_detection(image=image)
    logos = response.logo_annotations
    print("Logos:")

    for logo in logos:
        print(logo.description)

Test 5 - Multiple Object Detection

def localize_objects(path: str):
    # Localize objects in the local image.
    image = read_image_file(path=path)
    objects = client.object_localization(image=image).localized_object_annotations

    print("Number of objects found: {}".format(len(objects)))
    for object_ in objects:
        print("\n{} (confidence: {})".format(object_.name, object_.score))
        print("Normalized bounding polygon vertices: ")
        for vertex in object_.bounding_poly.normalized_vertices:
            print(" - ({}, {})".format(vertex.x, vertex.y))

Test 6 - Explicit Content Detection

def detect_safe_search(path: str):
    image = read_image_file(path=path)
    response = client.safe_search_detection(image=image)
    safe = response.safe_search_annotation

    # Names of likelihood from google.cloud.vision.enums
    likelihood_name = (
        "UNKNOWN",
        "VERY_UNLIKELY",
        "UNLIKELY",
        "POSSIBLE",
        "LIKELY",
        "VERY_LIKELY",
    )
    print("Safe search:")

    print("adult: {}".format(likelihood_name[safe.adult]))
    print("medical: {}".format(likelihood_name[safe.medical]))
    print("spoofed: {}".format(likelihood_name[safe.spoof]))
    print("violence: {}".format(likelihood_name[safe.violence]))
    print("racy: {}".format(likelihood_name[safe.racy]))

If you have followed all the steps above, declaring the functions in a single Python file, then all that is left is to call any one of the functions, pass it an image path, and enjoy your results.

You can find the detailed taxonomy and response structures, as well as additional features that may fit your use case, at the link below.

Package google.cloud.vision.v1 | Cloud Vision API | Google Cloud

As a bonus for all the readers, if you want to explore Google Cloud Vision click HERE!

Conclusion

As per our discussion so far, we have seen that leveraging cloud-native services can provide state-of-the-art performance even for niche problem statements, with far less hassle.

I hope this article finds you well. The variety of use cases and implementation possibilities is limited only by your imagination.

Keep Exploring and Keep Tinkering. STAY TUNED for more content. 😁

Vaibhav Satpathy

AI Enthusiast and Explorer
