Video intelligence is one of the most in-demand solutions today. Performing object detection on a single image is useful, but doing it on streaming video is a solution people will pay for.
Video intelligence has many applications, from home security to spotting product defects on industrial lines. Have you ever used filters on a video call with your friends? That is the simplest example of video intelligence.
In today's post, we will take a step further and explore how we can build our own AI engine that gives us real-time insights from a video clip using MediaPipe.
MediaPipe is an open-source Python framework developed by Google that provides solutions for the major video-intelligence use cases:
- Face detection
- Face-mesh detection
- Hand gesture detection
- Pose detection
Enough theory; now let's create our own AI processors.
Step 1: Setup and Installation
Install MediaPipe and the OpenCV dependency:

```shell
pip3 install mediapipe
pip3 install opencv-python
```
Import the libraries in your Python script:

```python
import cv2
import mediapipe as mp
```
Step 2: Build your processor
Now that we have imported all the necessary libraries, let's prepare our pose-detection processor. We will use OpenCV for reading, displaying, and writing the video.
```python
import cv2
import mediapipe as mp

# Read frames from the source video
capture = cv2.VideoCapture("path/to/source/video_file")

# MediaPipe pose estimator and drawing helpers
draw = mp.solutions.drawing_utils
mpPose = mp.solutions.pose
pose = mpPose.Pose()

# Match the output size to the input frames
frame_width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
size = (frame_width, frame_height)
result = cv2.VideoWriter("path/to/target/video_file",
                         cv2.VideoWriter_fourcc(*"MJPG"), 10, size)

while True:
    success, img = capture.read()
    if not success:  # stop when the video ends
        break
    # MediaPipe expects RGB, while OpenCV reads frames as BGR
    imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = pose.process(imgRGB)
    if results.pose_landmarks:
        # print(results.pose_landmarks.landmark)
        draw.draw_landmarks(img, results.pose_landmarks, mpPose.POSE_CONNECTIONS)
    cv2.imshow("Image", img)
    result.write(img)
    cv2.waitKey(1)

capture.release()
result.release()
cv2.destroyAllWindows()
```
Let's try to understand what is done here.
- We capture frames from the input video file using OpenCV's `VideoCapture`.
- We instantiate the MediaPipe pose-estimator class.
- We convert each frame from OpenCV's default BGR color format to RGB, which MediaPipe expects.
- We process the frame to get the resulting landmarks for a human pose.
- We visualize the results and save them to the target video file using OpenCV functions.
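The landmarks MediaPipe returns use coordinates normalized to [0, 1] relative to the frame, so to work with them in pixel space you need to scale by the frame size. A minimal sketch (`to_pixel` is a name chosen here for illustration, not part of MediaPipe's API):

```python
def to_pixel(landmark_x, landmark_y, frame_width, frame_height):
    """Convert a normalized landmark coordinate pair to pixel coordinates."""
    # MediaPipe landmarks are normalized to [0, 1]; scale and truncate to ints.
    return int(landmark_x * frame_width), int(landmark_y * frame_height)

# Example: a landmark at (0.5, 0.25) on a 640x480 frame
px, py = to_pixel(0.5, 0.25, 640, 480)
print(px, py)  # -> 320 120
```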
Step 3: Process your video file
Time to enjoy the results. Provide the video-file path to the script above, then sit back and relax. Give the process some time to finish, then check out the generated output video.
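You can go beyond drawing landmarks and derive simple metrics from them. As one sketch (not part of the script above; `joint_angle` is a hypothetical helper), the angle at a joint such as the elbow can be computed from three landmark points with basic trigonometry:

```python
import math

def joint_angle(a, b, c):
    """Angle at point b (in degrees) formed by points a-b-c, each an (x, y) pair."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    # Normalize reflex angles to the inner angle
    return 360 - ang if ang > 180 else ang

# A fully extended arm: shoulder (0, 0), elbow (1, 0), wrist (2, 0)
print(joint_angle((0, 0), (1, 0), (2, 0)))  # -> 180.0
```

Feeding this helper the shoulder, elbow, and wrist landmark positions of each frame is a common building block for rep counters and posture checkers.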
Impressed by the results?
Check out the video below, which gives you an overview of all the other features available in MediaPipe.
MediaPipe also provides libraries for Android, iOS, and JavaScript, which makes it lightweight and easy to use across platforms. The applications of the library are wide-ranging:
- Surveillance cameras and drones
- Understanding sign language
- Attention monitoring
- Video games and many more...
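To give a taste of the sign-language and gesture use cases: MediaPipe's hand solution returns 21 landmarks per hand, with fingertips at indices 8, 12, 16, and 20 and the PIP joint two indices below each tip. A crude raised-finger counter can compare each tip's y coordinate to its PIP joint (image y grows downward). A minimal sketch, assuming `landmarks` is a list of 21 (x, y) pairs in MediaPipe's hand-landmark order (the thumb is ignored for simplicity):

```python
# Fingertip indices for index..pinky in MediaPipe's hand-landmark layout
FINGERTIPS = (8, 12, 16, 20)

def count_raised_fingers(landmarks):
    """Count fingers whose tip lies above its PIP joint.

    `landmarks` is a list of 21 (x, y) pairs in normalized image coordinates,
    where smaller y means higher in the image.
    """
    return sum(1 for tip in FINGERTIPS if landmarks[tip][1] < landmarks[tip - 2][1])
```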
This was a short introduction to MediaPipe, an awesome open-source framework by Google. This is not all of it: you can tweak the parameters and see how the results vary.
Check out our GitHub repository, where we have tackled hand detection, face detection, and face-mesh detection as well.
Stay tuned for more awesome content. Stay healthy and stay curious. 🤠