NLP Chapter 4: AWS Comprehend
In our previous posts, we explored some of the platforms that can be used to perform Natural Language Processing (NLP).
Today's article is about one of the leading cloud platforms: Amazon Web Services (AWS).
Instead of walking you through what NLP is all over again, take a brief look at the introductory article mentioned below.

Without further ado, here are the components of NLP that we will cover in today's chapter:
- Named Entity Recognition (NER)
- Key Phrase Detection
- Personally Identifiable Information (PII) Detection
- Sentiment Analysis
- Syntactic Information Detection
If you are not familiar with these terms, don't worry: we will dive deeper into each of them over the course of this article.
Step 0: Set up your Account
If you already have an account and have set up your workspace, feel free to skip this step. First-timers, please follow the link below to set up your account.

Step 1: Install the Client Library
For this article we will use Boto3, the AWS SDK for Python.
Detailed documentation for it can be found right here.
pip3 install boto3
Step 2: Import the necessary Packages
import boto3
client = boto3.client("comprehend")
Step 3: Open your Code editor and get a cup of Coffee
text = "It is raining today in Seattle"

Test 1: Named Entity Recognition
As the name suggests, it identifies the entities within a piece of text. So what are entities?
An entity is a word or phrase that refers to a real-world object, such as a person, place, or organization, or to a precise measure such as a date or quantity. Each detected entity is tagged with a type.
Which entity types are covered?
AWS covers a range of entity types. The entity types detected by the Comprehend service are:
Type | Description |
---|---|
COMMERCIAL_ITEM | A branded product |
DATE | A full date (for example, 11/25/2017), day (Tuesday), month (May), or time (8:30 a.m.) |
EVENT | An event, such as a festival, concert, election, etc. |
LOCATION | A specific location, such as a country, city, lake, building, etc. |
ORGANIZATION | Large organizations, such as a government, company, religion, sports team, etc. |
OTHER | Entities that don't fit into any of the other entity categories |
PERSON | Individuals, groups of people, nicknames, fictional characters |
QUANTITY | A quantified amount, such as currency, percentages, numbers, bytes, etc. |
TITLE | An official name given to any creation or creative work, such as movies, books, songs, etc. |
def detect_entities(text: str):
    # Ask Comprehend for the named entities in an English-language text
    response = client.detect_entities(Text=text, LanguageCode='en')
    for entity in response.get("Entities"):
        # Single quotes inside the f-strings keep this valid before Python 3.12
        print(f"text: {entity.get('Text')}")
        print(f"confidence_score: {entity.get('Score')}")
        print(f"entity_type: {entity.get('Type')}")
    return response
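The dictionary returned by `detect_entities` can be post-processed like any other Python data. Below is a minimal sketch, assuming the response carries the `Entities` list with the `Text`, `Type`, and `Score` keys used above. The sample response, its scores, and the helper name `group_entities` are made up for illustration and are not actual Comprehend output.

```python
from collections import defaultdict

# Hypothetical response shaped like the output of client.detect_entities();
# the scores are illustrative, not real Comprehend results.
sample_response = {
    "Entities": [
        {"Text": "today", "Type": "DATE", "Score": 0.97},
        {"Text": "Seattle", "Type": "LOCATION", "Score": 0.99},
    ]
}


def group_entities(response: dict, min_score: float = 0.5) -> dict:
    """Group entity texts by type, dropping detections below min_score."""
    grouped = defaultdict(list)
    for entity in response.get("Entities", []):
        if entity.get("Score", 0.0) >= min_score:
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)


print(group_entities(sample_response))
```

Raising `min_score` is a simple way to trade recall for precision when the entities feed a downstream step.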
Test 2: Detect Key Phrases
A key phrase is a string containing a noun phrase that describes a particular thing. It generally consists of a noun and the modifiers that distinguish it.
Key phrases are usually the words a statement emphasizes to highlight features of its subject, object, or verb.
def detect_key_phrases(text: str):
    # Ask Comprehend for the key phrases in an English-language text
    response = client.detect_key_phrases(Text=text, LanguageCode='en')
    for phrase in response.get("KeyPhrases"):
        print(f"text: {phrase.get('Text')}")
        print(f"confidence_score: {phrase.get('Score')}")
    return response
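When a text yields many key phrases, it can help to keep only the most confident ones. A minimal sketch, assuming the `KeyPhrases` list with the `Text` and `Score` keys used above; the sample response and the helper name `top_key_phrases` are hypothetical.

```python
# Hypothetical response shaped like the output of client.detect_key_phrases();
# the scores are illustrative, not real Comprehend results.
sample_response = {
    "KeyPhrases": [
        {"Text": "today", "Score": 0.91},
        {"Text": "Seattle", "Score": 0.98},
    ]
}


def top_key_phrases(response: dict, limit: int = 3) -> list:
    """Return up to `limit` key phrase texts, highest confidence first."""
    phrases = sorted(
        response.get("KeyPhrases", []),
        key=lambda phrase: phrase.get("Score", 0.0),
        reverse=True,
    )
    return [phrase["Text"] for phrase in phrases[:limit]]


print(top_key_phrases(sample_response))
```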

Test 3: Detect Personally Identifiable Information (PII)
PII plays a significant role when you are dealing with enterprise-grade solutions.
Protecting an individual's personal information is critical: it is vital to that individual's privacy, and if exposed it can be exploited by an attacker.
AWS covers a large set of entity types under the PII banner. Some of the entities detected by Comprehend are:
PII entity type | Description |
---|---|
ADDRESS | A physical address, such as "100 Main Street, Anytown, USA" or "Suite #12, Building 123". An address can include a street, building, location, city, state, country, county, zip, precinct, neighborhood, and more. |
AGE | An individual's age, including the quantity and unit of time. For example, in the phrase "I am 40 years old," Amazon Comprehend recognizes "40 years" as an age. |
AWS_ACCESS_KEY | A unique identifier that's associated with a secret access key; the access key ID and secret access key are used together to sign programmatic AWS requests cryptographically. |
AWS_SECRET_KEY | A unique identifier that's associated with an access key; the access key ID and secret access key are used together to sign programmatic AWS requests cryptographically. |
BANK_ACCOUNT_NUMBER | A US bank account number. These are typically between 10 - 12 digits long, but Amazon Comprehend also recognizes bank account numbers when only the last 4 digits are present. |
BANK_ROUTING | A US bank account routing number. These are typically 9 digits long, but Amazon Comprehend also recognizes routing numbers when only the last 4 digits are present. |
CREDIT_DEBIT_CVV | A 3-digit card verification code (CVV) that is present on VISA, MasterCard, and Discover credit and debit cards. In American Express credit or debit cards, it is a 4-digit numeric code. |
CREDIT_DEBIT_EXPIRY | The expiration date for a credit or debit card. This number is usually 4 digits long and formatted as month/year or MM/YY. For example, Amazon Comprehend can recognize expiration dates such as 01/21, 01/2021, and Jan 2021. |
CREDIT_DEBIT_NUMBER | The number for a credit or debit card. These numbers can vary from 13 to 16 digits in length, but Amazon Comprehend also recognizes credit or debit card numbers when only the last 4 digits are present. |
DATE_TIME | A date can include a year, month, day, day of week, or time of day. For example, Amazon Comprehend recognizes "January 19, 2020" or "11 am" as dates. Amazon Comprehend will recognize partial dates, date ranges, and date intervals. It will also recognize decades, such as "the 1990s". |
DRIVER_ID | The number assigned to a driver's license, which is an official document permitting an individual to operate one or more motorized vehicles on a public road. A driver's license number consists of alphanumeric characters. |
EMAIL | An email address, such as marymajor@email.com. |
IP_ADDRESS | An IPv4 address, such as 198.51.100.0. |
MAC_ADDRESS | A media access control (MAC) address is a unique identifier assigned to a network interface controller (NIC). |
NAME | An individual's name. This entity type does not include titles, such as Mr., Mrs., Miss, or Dr. Amazon Comprehend does not apply this entity type to names that are part of organizations or addresses. For example, Amazon Comprehend recognizes the "John Doe Organization" as an organization, and it recognizes "Jane Doe Street" as an address. |
PASSPORT_NUMBER | A US passport number. Passport numbers range from 6 - 9 alphanumeric characters. |
PASSWORD | An alphanumeric string that is used as a password, such as "*very20special#pass*". |
PHONE | A phone number. This entity type also includes fax and pager numbers. |
PIN | A 4-digit personal identification number (PIN) that allows someone to access their bank account information. |
SSN | A Social Security Number (SSN) is a 9-digit number that is issued to US citizens, permanent residents, and temporary working residents. Amazon Comprehend also recognizes Social Security Numbers when only the last 4 digits are present. |
URL | A web address, such as www.example.com. |
USERNAME | A user name that identifies an account, such as a login name, screen name, nick name, or handle. |
def detect_pii_entities(text: str):
    # Ask Comprehend for the PII entities in an English-language text
    response = client.detect_pii_entities(Text=text, LanguageCode='en')
    for entity in response.get("Entities"):
        print(f"confidence_score: {entity.get('Score')}")
        print(f"entity_type: {entity.get('Type')}")
    return response
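A common follow-up to PII detection is masking the detected spans. The sketch below assumes each PII entity also carries `BeginOffset` and `EndOffset` character positions that index into the original Python string. The sample text, the sample response, and the helper name `redact_pii` are hypothetical, and the offsets are computed locally rather than taken from a real Comprehend call.

```python
text = "My name is Jane Doe and my email is jane@example.com"


def _span(fragment: str) -> dict:
    """Build begin/end offsets for a fragment of the sample text."""
    start = text.index(fragment)
    return {"BeginOffset": start, "EndOffset": start + len(fragment)}


# Hypothetical response shaped like the output of client.detect_pii_entities();
# the scores are illustrative, not real Comprehend results.
sample_response = {
    "Entities": [
        {"Type": "NAME", "Score": 0.99, **_span("Jane Doe")},
        {"Type": "EMAIL", "Score": 0.99, **_span("jane@example.com")},
    ]
}


def redact_pii(original: str, response: dict, min_score: float = 0.5) -> str:
    """Replace each detected PII span with its entity type in brackets."""
    redacted = original
    # Work from the end of the string so earlier offsets stay valid.
    entities = sorted(
        response.get("Entities", []),
        key=lambda entity: entity["BeginOffset"],
        reverse=True,
    )
    for entity in entities:
        if entity.get("Score", 0.0) < min_score:
            continue
        begin, end = entity["BeginOffset"], entity["EndOffset"]
        redacted = redacted[:begin] + f"[{entity['Type']}]" + redacted[end:]
    return redacted


print(redact_pii(text, sample_response))
```

Replacing from the last span backwards avoids recomputing offsets after each substitution.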
Test 4: Sentiment Analysis
Sentiment analysis, as the name suggests, gauges the overall sentiment expressed in a piece of text.
At a high level, the result falls into one of four categories:
- POSITIVE
- NEGATIVE
- MIXED
- NEUTRAL
def detect_sentiment(text: str):
    # Ask Comprehend for the overall sentiment of an English-language text
    response = client.detect_sentiment(Text=text, LanguageCode='en')
    print(f"sentiment: {response.get('Sentiment')}")
    print(f"sentiment score: {response.get('SentimentScore')}")
    return response
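The `SentimentScore` field holds one confidence value per category, so the winning label can be checked against a threshold before acting on it. A minimal sketch, assuming the score keys are the capitalized category names (`Positive`, `Negative`, `Neutral`, `Mixed`); the sample response, the threshold, and the helper name `summarize_sentiment` are hypothetical.

```python
# Hypothetical response shaped like the output of client.detect_sentiment();
# the scores are illustrative, not real Comprehend results.
sample_response = {
    "Sentiment": "NEUTRAL",
    "SentimentScore": {
        "Positive": 0.03,
        "Negative": 0.12,
        "Neutral": 0.84,
        "Mixed": 0.01,
    },
}


def summarize_sentiment(response: dict, threshold: float = 0.7) -> tuple:
    """Return (label, confidence, is_confident) for a sentiment response."""
    label = response["Sentiment"]
    scores = response.get("SentimentScore", {})
    # "NEUTRAL" -> "Neutral" to match the score dictionary keys.
    confidence = scores.get(label.capitalize(), 0.0)
    return label, confidence, confidence >= threshold


print(summarize_sentiment(sample_response))
```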
Test 5: Detect Syntax
Every sentence is built on the rules of English grammar: parts of speech such as adjectives and prepositions, dependencies between words, subject-verb agreement, and more.
When building complex NLP solutions, it often becomes necessary to get the Part of Speech (POS) tags of the words in a statement. For Natural Language Querying, they play a crucial role in understanding the dependencies and context of the words within the statement.
def detect_syntax(text: str):
    # Ask Comprehend for the part-of-speech tag of each token in an English-language text
    response = client.detect_syntax(Text=text, LanguageCode='en')
    for token in response.get("SyntaxTokens"):
        pos = token.get("PartOfSpeech", {})
        print(f"TokenId: {token.get('TokenId')}")
        print(f"Text: {token.get('Text')}")
        print(f"PartOfSpeech: {pos}")
        print(f"PartOfSpeechTag: {pos.get('Tag')}")
        print(f"PartOfSpeechScore: {pos.get('Score')}")
    return response
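Once the tokens are tagged, grouping them by part of speech makes it easy to pull out, say, all the nouns. A minimal sketch, assuming the `SyntaxTokens` structure used above; the sample response, its tags, and the helper name `words_by_tag` are hypothetical and not actual Comprehend output.

```python
from collections import defaultdict

# Hypothetical response shaped like the output of client.detect_syntax();
# the tags and scores are illustrative, not real Comprehend results.
sample_response = {
    "SyntaxTokens": [
        {"TokenId": 1, "Text": "It", "PartOfSpeech": {"Tag": "PRON", "Score": 0.99}},
        {"TokenId": 2, "Text": "is", "PartOfSpeech": {"Tag": "AUX", "Score": 0.98}},
        {"TokenId": 3, "Text": "raining", "PartOfSpeech": {"Tag": "VERB", "Score": 0.99}},
        {"TokenId": 4, "Text": "today", "PartOfSpeech": {"Tag": "NOUN", "Score": 0.97}},
        {"TokenId": 5, "Text": "in", "PartOfSpeech": {"Tag": "ADP", "Score": 0.99}},
        {"TokenId": 6, "Text": "Seattle", "PartOfSpeech": {"Tag": "PROPN", "Score": 0.99}},
    ]
}


def words_by_tag(response: dict) -> dict:
    """Group token texts by their part-of-speech tag, in token order."""
    grouped = defaultdict(list)
    for token in response.get("SyntaxTokens", []):
        tag = token.get("PartOfSpeech", {}).get("Tag", "UNKNOWN")
        grouped[tag].append(token["Text"])
    return dict(grouped)


print(words_by_tag(sample_response))
```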

Every platform has its own schema for the syntactic information it provides. For AWS:
Token | Part of speech | Description |
---|---|---|
ADJ | Adjective | Words that typically modify nouns. |
ADP | Adposition | The head of a prepositional or postpositional phrase. |
ADV | Adverb | Words that typically modify verbs. They may also modify adjectives and other adverbs. |
AUX | Auxiliary | Function words that accompany the verb of a verb phrase. |
CCONJ | Coordinating conjunction | Words that link words or phrases without subordinating one to the other. |
DET | Determiner | Articles and other words that specify a particular noun phrase. |
INTJ | Interjection | Words used as an exclamation or part of an exclamation. |
NOUN | Noun | Words that specify a person, place, thing, animal, or idea. |
NUM | Numeral | |
Conclusion
Congratulations! If you have followed the steps above, you have covered many of the common use cases you are likely to face while working with Natural Language Processing.
All thanks to AWS Comprehend.
If the curious mind within you is not satisfied and you wish to explore more, detailed documentation can be found below.
I hope this article finds you well. For advanced content on Natural Language Processing, STAY TUNED 😁.