The world has never been the same since the arrival of GPUs
And guess who brought that change - Nvidia.
For all the folks out there who don't know Nvidia -
Nvidia Corporation is an American multinational technology company incorporated in Delaware and based in Santa Clara, California. It designs graphics processing units for the gaming and professional markets, as well as system-on-a-chip units for the mobile computing and automotive markets.
Or at least that's what they claim to be.
To me, they are the pioneers taking today's AI to a whole new level. That sounds great, but -
Why Do I Care?
Firstly, with the introduction of GPUs, gaming and AI were taken to another level altogether. With blazing-fast parallel processing, nothing seemed impossible.
For the past few years that was more than enough to leave anybody awestruck. But with the democratisation of AI and the ambition to make it available to everybody, we need to pull up our big-boy socks.
In the last year or so, a new buzzword - MLOps - has more or less taken over the market, letting people with no previous ML expertise train and deploy models and share them with their peers and organisations.
The biggest challenge they all faced was that although AutoML made training and deployment seamless, it limited users in three major ways -
- Only the limited set of use cases catered to by AutoML was supported
- There was no scope for deploying custom models
- The scope for innovation in neural architectures was significantly reduced
Obviously that made a lot of developers unhappy, as innovation and tinkering with technology are what developers are known for.
But like always, this was just the calm before the storm.
Nvidia recently released a new open-source framework called Triton, and it has since taken the AI universe by storm.
What is Nvidia Triton?
NVIDIA Triton Inference Server simplifies the deployment of AI models at scale in production. It is open-source inference-serving software that lets teams deploy trained AI models from any framework (TensorFlow, NVIDIA TensorRT, PyTorch, ONNX Runtime, or custom) from local storage or a cloud platform on any GPU- or CPU-based infrastructure (cloud, data centre, or edge).
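Concretely, Triton serves whatever it finds in a model repository - a directory tree it scans at start-up, with one sub-directory per model and numbered version folders inside. The layout and model name below are illustrative assumptions, not taken from any particular deployment:

```
model_repository/
└── resnet50_onnx/
    ├── config.pbtxt
    └── 1/
        └── model.onnx
```

Each model carries a `config.pbtxt` describing its backend and tensor shapes. A minimal sketch for a hypothetical ONNX image classifier (names, dims, and batch size are assumptions) might look like:

```
name: "resnet50_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Dropping a TensorFlow, PyTorch, or TensorRT model into the same tree works the same way - only the `platform`/backend and the model file change.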
To know more, follow the link below -
That's the long definition. In simple terms -
Triton is a dream come true for any AI developer, taking things all the way from experimentation to production.
Why is it the next Revolution?
Of the many features of Triton, let's take a look at some of the key highlights -
- Supports multiple frameworks - Triton Inference Server supports all major frameworks, including TensorFlow, TensorRT, PyTorch, ONNX Runtime, and even custom framework backends.
- High-performance inference - Triton runs models concurrently on GPUs to maximise utilisation, supports CPU-based inferencing, and offers advanced features like model ensembles and streaming inference.
- Dynamic scalability - Triton is available as a Docker container and comes with out-of-the-box integration with Kubernetes for orchestration, metrics, and auto-scaling.
- Real-time model updates - Triton can serve tens or hundreds of models through a model control API. Models can be loaded into and unloaded from the inference server as they change, or to fit the available GPU or CPU memory, without having to restart the server.
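On the Docker point, NVIDIA publishes the server image on its NGC registry. A typical invocation looks roughly like the command below - the release tag `xx.yy` and the host path are placeholders you would substitute, not real values:

```shell
# Run the Triton server container, mounting a model repository from the host.
# Ports: 8000 = HTTP, 8001 = gRPC, 8002 = Prometheus metrics.
docker run --gpus=1 --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:xx.yy-py3 \
  tritonserver --model-repository=/models
```

The same container is what you would wrap in a Kubernetes Deployment for orchestration and auto-scaling.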
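The model control API in the last point is exposed over plain HTTP. The endpoint paths below follow Triton's model repository API; the server address and model name are assumptions for illustration, and this is a sketch of the URL shapes rather than a full client:

```python
# Sketch of Triton's model-control endpoints. Endpoint paths follow the
# v2 repository API; "localhost:8000" and "resnet50_onnx" are assumed
# example values, not part of any real deployment.

def load_url(server: str, model: str) -> str:
    """Endpoint that asks Triton to load (or reload) a model."""
    return f"{server}/v2/repository/models/{model}/load"

def unload_url(server: str, model: str) -> str:
    """Endpoint that asks Triton to unload a model and free its memory."""
    return f"{server}/v2/repository/models/{model}/unload"

# With a server running, an empty POST to these URLs triggers the action:
#   import requests
#   requests.post(load_url("http://localhost:8000", "resnet50_onnx"))
print(load_url("http://localhost:8000", "resnet50_onnx"))
```

Because load/unload is just an HTTP call, model rollouts can be scripted without ever restarting the server.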
What you read above was just a peek into the vast expanse of features offered by Triton.
One of the most critical highlights is monitoring model performance. Triton exports Prometheus metrics for GPU utilisation, latency, memory usage, and inference throughput, which can then be plugged into a Grafana dashboard for visualisation.
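To give a feel for what that looks like, Triton's metrics endpoint returns the standard Prometheus text format. The snippet below parses a hand-written sample in that format - the metric names mirror the `nv_inference_*` / `nv_gpu_*` families Triton documents, but the label values and numbers are made up for illustration:

```python
# A hand-written sample in Prometheus exposition format, mimicking what
# Triton serves on its metrics port. Values here are invented.
sample = """\
# HELP nv_inference_request_success Number of successful inference requests
nv_inference_request_success{model="resnet50_onnx",version="1"} 512
nv_gpu_utilization{gpu_uuid="GPU-0"} 0.75
"""

def parse_metrics(text: str) -> dict:
    """Parse Prometheus exposition text into {metric_with_labels: float}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")  # value is the last field
        metrics[name] = float(value)
    return metrics

parsed = parse_metrics(sample)
print(parsed['nv_gpu_utilization{gpu_uuid="GPU-0"}'])  # → 0.75
```

In practice you would point Prometheus at the endpoint and let Grafana do the parsing; this just shows how simple the wire format is.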
What you just saw is the dawn of a new era in the world of MLOps, and it is our responsibility to stay ahead of the curve and keep tinkering.
I hope you enjoyed this article. In future posts we will be covering how to deploy Nvidia Triton and its other features.
STAY TUNED 😁