Custom NLP: Chapter 4 - Entity Recognition using AutoML

NLP May 28, 2021

In our previous article, we talked about the AutoML service provided by Google. We saw how to set up Google Cloud Storage and use it as a data source in the AutoML platform. We also saw how to choose among the available model types based on our problem statement and how to import data into the platform.

The goal of AutoML is to enable domain experts who are unfamiliar with machine learning technologies to use ML techniques with ease. To accomplish that, AutoML is built on an automated pipeline that consists of the macro-steps mentioned below:

Macro-steps for AutoML Pipeline

In today's article, we will continue from where we left off. We will take the same example of Named Entity Recognition. Let's take a look at the following steps:

Annotate the uploaded text data on GCP AutoML platform

In the previous article, we saw how to select the type of model to be trained and how to import data for the NER problem. Importing data is a long-running process, so it will take some time. Once the data is loaded, we can continue with the next step: data annotation.

You can either upload pre-annotated data in JSONL format or annotate the data on GCP from scratch. In our example, we uploaded data without any annotation information, so let's annotate the text documents manually on GCP.
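For readers who prefer the pre-annotated route, here is a minimal sketch of how one JSONL line could be generated. The field names (`annotations`, `text_extraction`, `text_segment`, `display_name`, `text_snippet`) follow the entity-extraction import format as we understand it, so please verify them against the current Google documentation before uploading. The helper `make_jsonl_line`, the sample sentence, and the labels are hypothetical and used only for illustration.

```python
import json


def make_jsonl_line(text, entities):
    """Build one JSONL line from raw text and (phrase, label) pairs.

    Offsets are found by searching for the first occurrence of each phrase,
    which is a simplification for illustration.
    """
    annotations = []
    for phrase, label in entities:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found in text: {phrase!r}")
        annotations.append({
            "display_name": label,
            "text_extraction": {
                "text_segment": {
                    "start_offset": start,
                    # Assumption: end_offset points one past the last character.
                    "end_offset": start + len(phrase),
                }
            },
        })
    record = {"annotations": annotations, "text_snippet": {"content": text}}
    return json.dumps(record)


# Hypothetical sample sentence and labels.
text = "Sundar Pichai leads Google from California."
line = make_jsonl_line(
    text,
    [("Sundar Pichai", "PERSON"), ("Google", "ORG"), ("California", "LOCATION")],
)
print(line)
```

Each document becomes one such line in the `.jsonl` file. Computing offsets from the text itself, rather than typing them by hand, avoids off-by-one mistakes in the annotations.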

Data Annotation process
Annotated text

Once you have annotated all the imported text items, your screen should look something like this:

Validate data before we start training

Once the annotation process is complete, we can view the label distribution and check whether enough samples are available for each label. To do this, click "View Label Stats".

Looking at the label statistics, we can see that we need to upload more test examples.
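If the annotations are also available locally, a similar check can be approximated before uploading. The sketch below counts annotations per label and flags labels that fall below a threshold. The record layout mirrors the JSONL import format as we understand it, and the function names, sample records, and `minimum` value are hypothetical; the actual minimums enforced by AutoML should be taken from the platform itself.

```python
from collections import Counter


def label_counts(records):
    """Count how many annotations each label has across all records."""
    counts = Counter()
    for record in records:
        for annotation in record.get("annotations", []):
            counts[annotation["display_name"]] += 1
    return counts


def underrepresented(counts, minimum):
    """Return the labels with fewer than `minimum` annotations, sorted by name."""
    return sorted(label for label, n in counts.items() if n < minimum)


# Hypothetical annotated records (text content omitted for brevity).
records = [
    {"annotations": [{"display_name": "PERSON"}, {"display_name": "ORG"}]},
    {"annotations": [{"display_name": "PERSON"}]},
    {"annotations": []},
]

counts = label_counts(records)
print(counts)
print(underrepresented(counts, minimum=2))
```

With these sample records, `ORG` is the only label below the threshold, which is the kind of gap "View Label Stats" helps us spot.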

Start training

Once we have validated the data and are satisfied with it, it's time to start training. This is usually the most complex and exciting part of the machine learning process.

Google AutoML takes away that complexity: we can start the training process with a single button click.
Start Training
We added more test data to meet the minimum amount required for model training.

AutoML searches over and trains various machine learning models, then compares their performance to choose the one with the best results. The training process may take a few hours to complete.
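To make the idea of "train several candidates and keep the best one" concrete, here is a toy sketch. It is not how AutoML works internally: the two candidate "models" are hypothetical hand-written rules, the validation data is made up, and plain accuracy stands in for whatever metric the platform actually optimizes.

```python
def accuracy(model, examples):
    """Fraction of (token, label) examples the model labels correctly."""
    correct = sum(1 for token, label in examples if model(token) == label)
    return correct / len(examples)


def select_best(candidates, validation):
    """Score every candidate on the validation set and return the best name."""
    scores = {name: accuracy(model, validation) for name, model in candidates.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Hypothetical candidates: simple rules standing in for trained models.
candidates = {
    "always_other": lambda token: "O",
    "capitalized_is_entity": lambda token: "ENTITY" if token[:1].isupper() else "O",
}

# Hypothetical held-out validation examples.
validation = [("Google", "ENTITY"), ("is", "O"), ("in", "O"), ("California", "ENTITY")]

best_name, scores = select_best(candidates, validation)
print(best_name, scores)
```

The important design point is that candidates are compared on data they were not trained on, which is why the dataset needs separate validation and test examples.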

Evaluate trained model

After training is complete, we can navigate to the model evaluation page and check the performance metrics of the trained model. Based on evaluation metrics such as accuracy, precision, recall, and F-score, we can decide whether the trained model is good enough for our problem or whether we need to gather more data.
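As a quick refresher on how these numbers relate, the sketch below computes precision, recall, and the F1 score from counts of true positives, false positives, and false negatives. The counts are made up for illustration and do not come from the model trained in this article.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 from raw counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 entities found correctly, 2 spurious, 4 missed.
p, r, f1 = precision_recall_f1(8, 2, 4)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here precision is higher than recall, meaning the hypothetical model is mostly right about what it tags but misses some entities. Low recall of this kind is often a hint that more annotated examples are needed.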

Conclusion

AutoML takes away most of the complexity of machine learning. You need not worry about choosing a model architecture, tuning hyperparameters, or performing any other mathematical or statistical analysis.

This bridges the gap between functional and technical expertise and helps non-technical personnel build, test, and deploy production-grade machine learning models with just a few button clicks.

Congratulations!!! You have trained your first NER model using AutoML with just a few button clicks. STAY TUNED for more exciting content.😁

Arpit Jain

Machine Learning Engineer
