NLP Chapter 2: Google Cloud Platform

NLP May 10, 2021

In the previous chapter we answered questions such as: What is NLP? Where can we implement it? What does NLP encompass?

Today's article focuses on getting our hands dirty with one of the leading cloud providers, GCP (Google Cloud Platform), and its state-of-the-art Natural Language API.

Thanks to years of research and extensive hours of training performed by researchers across the globe, it is no longer necessary for every developer to start from ground zero. There are several pre-trained solutions available out of the box for AI designers to explore and validate against their custom requirements.

Before we dig deeper into the HOW of it, let's first try to understand the fundamental applications of NLP -

  1. Named Entity Recognition (NER)
  2. Syntactic Information Extraction
  3. Text Classification

Now that we know what lies ahead, let's get to it.

Step 0: Setup the Natural Language API

The basic requirement for setting up the API is a cloud account. As that is a fairly detailed procedure and out of scope for this article, you can follow this link to set up the API for your system.
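Once your account and project are ready, the client library also needs credentials to authenticate. One common approach, assuming you have downloaded a service-account key as a JSON file, is to point the GOOGLE_APPLICATION_CREDENTIALS environment variable at it (the path below is a placeholder):

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your-service-account-key.json"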

Step 1: Install the necessary Packages

pip install google-cloud-language

Step 2: Open your code editor and get a cup of coffee

test_sentence = "Barack Obama was the president of USA and his contact number was 9003779537 in case of emergency"

Step 3: Test - Named Entity Recognition (NER)

NER, as the name suggests, is aimed at identifying the various entities present within a chunk of text. These can range from PERSON, LOCATION and ADDRESS to even EVENT. Pre-trained models usually ship with a set of pre-configured entities, but you are not limited to them - there are provisions to train the models for custom entity recognition, which we will take a look at in future posts.

From here on, we'll define a function to perform the necessary task for each application.

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

def sample_analyze_entities(text_content):
    # Build a plain-text, English document for the API
    type_ = language_v1.Document.Type.PLAIN_TEXT
    language = "en"
    document = {"content": text_content, "type_": type_, "language": language}
    encoding_type = language_v1.EncodingType.UTF8

    response = client.analyze_entities(
        request={"document": document, "encoding_type": encoding_type}
    )

    # Group the extracted entity names by their entity type
    all_entities = {}
    for entity in response.entities:
        text = entity.name
        entity_type = language_v1.Entity.Type(entity.type_).name
        # salience_score = entity.salience
        if entity_type not in all_entities:
            all_entities[entity_type] = [text]
        else:
            all_entities[entity_type].append(text)
    return all_entities

entities = sample_analyze_entities(text_content=test_sentence)

# OUTPUT
{
    "PERSON": ["Barack Obama"],
    "OTHER": ["contact number","case"],
    "LOCATION": ["USA"],
    "EVENT": ["emergency"],
    "PHONE_NUMBER": ["9003779537"],
    "NUMBER": ["9003779537"]
}

As you can see, based on the sentence that was fed to the function, it was able to extract a wide variety of entities from the statement - PERSON, OTHER, LOCATION, EVENT and even PHONE_NUMBER.

You can find the list of pre-defined entities offered by GCP at this link.
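As a side note, the API also returns a salience score for each entity (the field commented out in the function above), indicating how central that entity is to the overall text. A minimal variation of our function that keeps the scores alongside the names might look like this (a sketch, reusing the same client):

def sample_analyze_entities_with_salience(text_content):
    # Same request as before: a plain-text, English document
    document = {
        "content": text_content,
        "type_": language_v1.Document.Type.PLAIN_TEXT,
        "language": "en",
    }
    response = client.analyze_entities(
        request={"document": document, "encoding_type": language_v1.EncodingType.UTF8}
    )
    # Group (name, salience) pairs by entity type
    all_entities = {}
    for entity in response.entities:
        entity_type = language_v1.Entity.Type(entity.type_).name
        all_entities.setdefault(entity_type, []).append((entity.name, entity.salience))
    return all_entities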

Step 4: Test - Syntactic Information Extraction

Every sentence, when spoken, is carefully built on the rules of English. It has grammar, adjectives, prepositions, dependencies, subject-verb agreement and a lot more detail put into it.

Many a time while building complex NLP solutions, it becomes extremely necessary to get a grasp of the Part of Speech (POS) tags linked to the words in a statement. For Natural Language Querying, POS plays a pivotal role in understanding the dependencies and context of the words within the statement.

def sample_analyze_syntax(text_content):
    # Build a plain-text, English document for the API
    type_ = language_v1.Document.Type.PLAIN_TEXT
    language = "en"
    document = {"content": text_content, "type_": type_, "language": language}
    encoding_type = language_v1.EncodingType.UTF8

    response = client.analyze_syntax(
        request={"document": document, "encoding_type": encoding_type}
    )

    all_tokens = {}
    # Loop through tokens returned from the API
    for token_num, token in enumerate(response.tokens):
        text = token.text
        part_of_speech = token.part_of_speech
        dependency_edge = token.dependency_edge
        lemmatized_word = token.lemma

        all_tokens[token_num] = {
            "token_text": text.content,
            "pos_tag": language_v1.PartOfSpeech.Tag(part_of_speech.tag).name,
            "person": language_v1.PartOfSpeech.Person(part_of_speech.person).name,
            "lemma": lemmatized_word,
            "head_token": dependency_edge.head_token_index,
            "token_number": token_num,
        }
    return all_tokens
     
sample_analyze_syntax(text_content=test_sentence)

On most occasions the output from these functions is so rich in the information it possesses that it becomes difficult to gauge the complete scope of possibilities it opens up. To get an easier understanding of what Syntactic Information looks like, let's take a look at the sketch below.
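Here is an illustrative entry for the first token of our test sentence (the structure matches what the function returns, but the values are a sketch rather than verbatim API output):

# ILLUSTRATIVE OUTPUT (first token only)
{
    0: {
        "token_text": "Barack",
        "pos_tag": "NOUN",
        "person": "PERSON_UNKNOWN",
        "lemma": "Barack",
        "head_token": 1,
        "token_number": 0,
    },
    # ... one entry per token in the sentence
}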

As you can clearly see, the output is rich in information, and the number of use cases that can be designed out of such data is innumerable.
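For instance, one quick use of these tags is filtering tokens by part of speech. A minimal sketch, reusing the sample_analyze_syntax function and test_sentence defined above:

tokens = sample_analyze_syntax(text_content=test_sentence)

# Keep only the tokens tagged as nouns
nouns = [info["token_text"] for info in tokens.values() if info["pos_tag"] == "NOUN"]
# e.g. ['Barack', 'Obama', 'president', ...] depending on the tagger's output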

Step 5: Test - Text Classification

One of the most common implementations of NLP is tagging a statement with a particular category. It could be to analyse user sentiment over a Twitter feed, to understand customer feedback, or even to flag SPAM mails.

Text classification works on the basic principle of understanding the context of a statement. It tries to learn the frequency of the words used, the sentence structure, the user sentiment behind it and a lot more. Based on all these learnings, the system assigns the chunk of information to a particular tag.

Just as with all the above examples, this too can be trained for custom tags; if using a pre-trained model, it comes with certain pre-configured categories to play around with. One caveat: the classification endpoint generally expects a reasonably long input, so very short snippets may return an error or an empty category list.

def sample_classify_text(text_content):
    type_ = language_v1.Document.Type.PLAIN_TEXT
    language = "en"
    document = {"content": text_content, "type_": type_, "language": language}

    response = client.classify_text(request={"document": document})
    
    final_response = {}
    for category in response.categories:
        # Get the name of the category representing the document.
        # See the predefined taxonomy of categories:
        # https://cloud.google.com/natural-language/docs/categories
        sent_response = {
            "Category name": category.name,
            "Confidence": category.confidence,
        }

        final_response[len(final_response) + 1] = sent_response
    return final_response
    
test_sentence = "I want to apply for Arts and Crafts in my school."
categories = sample_classify_text(text_content=test_sentence)

# OUTPUT
{ "Category name": "Arts & Entertainment",
  "Confidence": 0.98 }

The GCP Natural Language API classifies text against a certain pre-defined taxonomy of categories (linked in the code comment above).

Congratulations, I hope you were able to run your code successfully and get the desired results. If you want to explore more and understand the full scope of services provided by the GCP Natural Language API, you can test for yourself with an interactive UI right here on the Natural Language API Demo.

Conclusion

Now we are aware of the various state-of-the-art NLP services offered on Google Cloud Platform that can be leveraged for building ground-breaking solutions. After understanding the fundamentals of NLP and the various components involved, it is now up to us to decide how far we can push our IMAGINATION.

I hope this article helps you out in exploring the world of NLP on cloud platforms. For more content on implementing NLP using open-source tools, STAY TUNED. 😁

Vaibhav Satpathy

AI Enthusiast and Explorer
