Custom Vision: Chapter 6 - Object Detection

Computer Vision Aug 4, 2021

In our previous posts we covered detailed implementations of End-to-End Image Classification using both Open Source and Cloud Solutions.

But then again, why stop there?

Image Classification is a fundamental building block for many Computer Vision problems, but on its own it rarely solves real-world problems directly.

We humans have highly evolved cognitive senses. Over the course of evolution, our senses and our brain's computational ability have grown significantly.

Think about it this way: when we try to replicate, in a matter of months, what humans can do after millions of years of evolution, does it actually seem practical?

Well, different people have varying opinions on that. But let me tell you this: with the rate at which technology is advancing, it does seem practical to compress millions of years of evolution into a couple of months.

As a part of this series, we will take a deeper dive into the realm of Computer Vision and tackle a more demanding problem statement than before - Object Detection.

What is Object Detection?

Well, as the name suggests, it's identifying the precise location, or coordinates, of a particular entity within a frame, along with a label for what that entity is.

Now, the format in which you read the outputs varies based on personal and framework preferences.

For example -

  1. Some open source frameworks provide the bounding box coordinates of each entity in a normalised form, as fractions of the image width and height. The user then has to post-process them to get the exact pixel coordinates.
  2. Other frameworks return the pixel locations directly, along with the tags and additional metadata, as part of their prediction.
  3. Then again, some users don't want a bunch of numbers as a prediction, so they wrap the model so that it returns the image with the bounding boxes drawn over it.
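
To make the post-processing in the first case concrete, here is a minimal sketch of converting a normalised box into pixel coordinates. It assumes the box is given as (x_min, y_min, x_max, y_max) with every value between 0 and 1, measured from the top-left corner of the image. That layout, and the names `PixelBox` and `to_pixel_box`, are assumptions for illustration; frameworks differ in coordinate order and some use centre, width and height instead.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    """Bounding box in absolute pixel coordinates."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def to_pixel_box(norm_box, image_width, image_height):
    """Convert a normalised (x_min, y_min, x_max, y_max) box to pixels.

    Assumes each value lies in [0, 1] and is measured from the
    top-left corner of the image (an assumption, not a standard).
    """
    x_min, y_min, x_max, y_max = norm_box
    return PixelBox(
        x_min=round(x_min * image_width),
        y_min=round(y_min * image_height),
        x_max=round(x_max * image_width),
        y_max=round(y_max * image_height),
    )


# Hypothetical prediction: a box covering the centre of a 640x480 image.
box = to_pixel_box((0.25, 0.25, 0.75, 0.75), image_width=640, image_height=480)
print(box)  # PixelBox(x_min=160, y_min=120, x_max=480, y_max=360)
```

Once the box is in pixels, drawing it over the image (the third case) only needs those four numbers and the drawing routine of whichever imaging library you use.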

With that being said, the next obvious question is: how would one do it? It doesn't seem like a straightforward problem to solve.

How to Perform Object Detection?

The answer is not that simple. In this series, we will take you through each and every step involved in creating a neural model for object detection.

In our introductory chapter, we introduced the basic steps involved in developing an End-to-End Computer Vision solution.

Custom Vision: Chapter 1

So what is different from before, and why do we need this article at all?

There are certain additional nitty-gritty details that increase the complexity as the use case changes. Let's take a look at some of them -

  1. Bounding Box - As the images can't be tagged at a holistic level, we need to annotate the data very precisely, marking the location of every object within the frame. This increases the data labelling time significantly.
  2. Neural Architecture - As the model is no longer a simple classifier, one needs to build a custom loss function. It combines a categorical loss, computed between the predicted label for the object and the ground truth, with a regression loss that measures how far the predicted coordinates deviate from the ground-truth coordinates.
  3. Compute - As the model becomes heavier and the image pre-processing needed for such a pipeline grows, the training and inference times increase as well.
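
The combined loss described in the second point can be sketched for a single object as follows. This is only an illustration under stated assumptions: cross-entropy is used for the categorical part and smooth L1 for the regression part, which is a common pairing but not one this article prescribes. The function names and the `box_weight` parameter are hypothetical, and a real detector computes this over many candidate boxes per image.

```python
import math


def cross_entropy(class_probs, true_class):
    """Categorical loss: negative log-probability of the true class."""
    return -math.log(class_probs[true_class])


def smooth_l1(pred_box, true_box, beta=1.0):
    """Regression loss: smooth L1 summed over the four box coordinates."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        diff = abs(p - t)
        # Quadratic for small errors, linear for large ones.
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total


def detection_loss(class_probs, true_class, pred_box, true_box, box_weight=1.0):
    """Weighted sum of the label loss and the coordinate loss.

    box_weight is a hypothetical knob balancing the two terms.
    """
    label_loss = cross_entropy(class_probs, true_class)
    box_loss = smooth_l1(pred_box, true_box)
    return label_loss + box_weight * box_loss


# Hypothetical prediction for one object, with normalised coordinates.
loss = detection_loss(
    class_probs=[0.1, 0.7, 0.2],
    true_class=1,
    pred_box=(0.2, 0.2, 0.6, 0.6),
    true_box=(0.25, 0.25, 0.75, 0.75),
)
print(f"total loss: {loss:.4f}")
```

The loss falls to zero only when the model is fully confident in the right label and the predicted box matches the ground truth exactly, which is why both the tags and the coordinates have to be annotated carefully.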


Nothing is Impossible

What we just covered is only a preface to what's coming, meant to give an intuitive understanding of how things will unfold.

We will be covering every aspect, starting from Data Gathering to Model Training and Deployment.



Vaibhav Satpathy

AI Enthusiast and Explorer
