MLOps Building Blocks: Chapter 7 - Local Serving with FastAPI - Text Classification

MLOps Oct 15, 2021

In our previous articles we explored how to serve image classification models locally.

But as we are all aware, models come in many types and work with many kinds of data when it comes to interacting with the world.

So today's article is about serving a custom text classification model locally and running inference on it.

So what do we need?

As always, to run inference you first need a trained model, and for that a training script. Instead of reinventing the wheel, let's follow one of our previous blogs on the same topic, where we create a custom text classification model.

Custom NLP: Chapter 5 - Text Classification - Tensorflow
In our previous posts we explored What is Custom NLP and a sample implementation of Custom NLP for Entity Recognition using AutoML. But sometimes Cloud Native may not be the preferred solution or there might be some of us aspiring to be a Deep Learning Engineer and would prefer building

Now that we are all set with our model, let's evaluate our status quo first.

Things we can Upgrade

  1. In general, models are stored on cloud storage for easier access.
  2. In some cases the models are even packaged inside the Docker application that is built for the end user.
  3. Many open-source tools help serve models at production level, but their learning curve is usually steep.
  4. A single application will quite likely serve multiple models, each behind a different URL endpoint.

Our current situation

  1. We are running our application on our local system, so the models are also stored on and loaded from that system.
  2. We want to go step by step so that things don't feel as overwhelming as people usually portray them to be, so we decided to use FastAPI and our local system.
  3. As this is just a test script to build intuition, we have created only one endpoint for one model.

Now I think we are good to begin. Let's grab a cup of coffee and begin our journey.

Step 0: Install all the necessary Packages

pip3 install tensorflow numpy uvicorn fastapi
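
Before moving on, it can help to confirm that the packages are importable from the interpreter you plan to run the server with. Below is a minimal sketch using only the standard library; the helper name missing_packages is our own, purely for illustration. Note that json ships with Python itself, so it is not something pip needs to install.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


# Top-level modules this article imports; json is part of the standard library
required = ["tensorflow", "numpy", "uvicorn", "fastapi", "json"]

if __name__ == "__main__":
    missing = missing_packages(required)
    if missing:
        print("Install these before continuing:", ", ".join(missing))
    else:
        print("All packages are available.")
```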

Step 1: Import all the Packages

import uvicorn
import numpy as np
import json

from fastapi import FastAPI
from tensorflow.keras.models import load_model

Step 2: Read the Model Artifacts and Labels

# Give local path to your models and labels
model_path = "<path_to_your_exported_model_directory>/models_v2"
labels_path = "<path_to_your_exported_labels>/labels.json"

# Read the labels
with open(labels_path, "r") as f:
    labels = json.load(f)

# Load your model and create its instance
model = load_model(model_path)
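
The endpoint in Step 4 looks up labels with labels.get(str(index)), which implies that labels.json maps the class index, stored as a string, to a label name. Below is a minimal sketch of that structure. The label names are made up for illustration; yours will come from your own training run.

```python
import json

# Hypothetical contents of labels.json: class index (as a string) -> label name
labels_json = '{"0": "negative", "1": "neutral", "2": "positive"}'

labels = json.loads(labels_json)

# JSON object keys are always strings, so an integer index must be converted
index = 2
print(labels.get(str(index)))  # positive
print(labels.get(index))       # None, because the key 2 (an int) is not present
```

This is why the endpoint wraps the predicted index in str() before the lookup.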

Step 3: Create a FastAPI Instance

# Declare a FastAPI instance
app = FastAPI()

Step 4: Create your API End Point

As we are creating an API endpoint that accepts text as input, we will define a query parameter on our API.

You can choose any path you like for your endpoint.

Below we demonstrate a simple way of creating a POST API with text as input. If you want to explore further, take a look at the link below.

Query Parameters - FastAPI
FastAPI framework, high performance, easy to learn, fast to code, ready for production

# Create a POST route with a URL endpoint name of your choice
# The POST request carries the text in its query string as sample_text
@app.post("/test_model")
async def test_function(sample_text: str):

    # Read the Text and convert it into a numpy array
    test_input = np.asarray([sample_text])

    # Send your Text for Prediction and receive a numpy array of probabilities
    prediction = model.predict(test_input)

    # Read the corresponding label from your Labels JSON
    index = np.argmax(prediction)
    predicted_label = labels.get(str(index))

    # Return the predicted Label
    return {"Prediction": predicted_label}

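# Run the script
# Replace <your_host> with the address to bind, for example "127.0.0.1" for local-only access
if __name__ == "__main__":
    uvicorn.run(app, host="<your_host>", port=5000)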
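
To make the last two steps of the endpoint concrete, here is a small sketch of turning one row of class probabilities into a label, written in plain Python so it runs without a model. The function name top_label, the probabilities and the label names are illustrative assumptions. It also returns the winning probability, which the endpoint above does not do.

```python
def top_label(probabilities, labels):
    """Return (label, probability) for the highest-scoring class.

    `probabilities` is one row of model output; `labels` maps str(index) -> name.
    """
    # Same role as np.argmax in the endpoint: index of the largest value
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels.get(str(index)), probabilities[index]


# Hypothetical labels and model output for a single input text
labels = {"0": "negative", "1": "neutral", "2": "positive"}
row = [0.1, 0.2, 0.7]

label, score = top_label(row, labels)
print({"Prediction": label, "Confidence": score})
```

Returning the score next to the label is optional, but it makes it easier to spot inputs where the model is unsure.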

Step 5: Run your Script

Once you are all set, all that's left is to run the code, open your browser of choice and navigate to the URL your server is running on (port 5000 in the script above).

2021-10-08 15:12:22.250025: I tensorflow/compiler/mlir/] None of the MLIR optimization passes are enabled (registered 2)
2021-10-08 15:12:22.250451: W tensorflow/core/platform/profile_utils/] Failed to get CPU frequency: 0 Hz
INFO:     Started server process [25912]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on (Press CTRL+C to quit)

Step 6: Enjoy Inferencing

Now all that's left is to test our API endpoint by sending it some text.

Feel free to try out your API with various Inputs and validate if your model is performing as expected.
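
One way to call the endpoint from code rather than the browser is sketched below, using the standard library only. It assumes the server is reachable at http://127.0.0.1:5000 (the host is our assumption, the port comes from the script above) and that the route is /test_model with a sample_text query parameter, as defined in Step 4. The helper names are our own.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(base_url, sample_text):
    """Build a POST request that carries the text as a query parameter."""
    query = urlencode({"sample_text": sample_text})
    return Request(f"{base_url}/test_model?{query}", method="POST")


def classify(sample_text, base_url="http://127.0.0.1:5000"):
    """Send the text to the running server and return the decoded JSON reply."""
    with urlopen(build_request(base_url, sample_text)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no server; sending it does
request = build_request("http://127.0.0.1:5000", "great service, thank you")
print(request.get_method(), request.full_url)
```

With the server from Step 5 running, calling classify("great service, thank you") should return a dictionary of the form {"Prediction": ...}, with the label depending on your model.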


Congratulations! If you were able to follow the steps mentioned above, then you should have a working API End point on your local system.

If you want to take things to the next level, follow one of the articles from our previous chapters -

MLOps Building Blocks: Chapter 3 - Containerising with Docker
Bringing AI to You

Or, if you would rather go straight to the code for this article, you can view it at the link below -

AI-kosh/mlops/chp_7b at main · Chronicles-of-AI/AI-kosh
Archives of blogs on Chronicles of AI. Contribute to Chronicles-of-AI/AI-kosh development by creating an account on GitHub.

I hope this article finds you well. In our future articles we will be covering more implementations of AI models End to End for MLOps.



Vaibhav Satpathy

AI Enthusiast and Explorer
