
Beyond CPUs: api.video’s tech tale of switching from CPUs to ASICs

Learn how api.video made a strategic pivot from CPU to ASIC-based VPU transcoding to manage high volumes of video catalogues.

Emilien Gobillot

January 2, 2024

api.video started with a mission to empower product builders to effortlessly integrate video into their applications and services. To achieve this, we built a global video backbone with an easy-to-use API for video delivery, transcoding, and live streaming. With this API, our customers are free to upload videos at will and count on us to make them quickly available for delivery. This becomes a formidable challenge, however, when customers inundate us with entire video catalogues at once.

 

To tackle this challenge, we initially relied on the most common solution: CPUs. But CPUs are inefficient at video transcoding, particularly with the advent of new video codecs.

 

To overcome this limitation, we started looking at alternatives: GPUs, FPGAs, and ASICs.

 

FPGA vs. GPU vs. ASICs: A comparison

When we started researching FPGAs, GPUs, and ASICs, we gathered information from technical marketing presentations for some products (e.g. the Intel Flex Series cards, not yet available on the market at the time) and ran benchmarks in our lab. Mixing real and marketing measurements could have been misleading. Even so, the Codensity G4 and G5 chips from NETINT turned out to be clear winners, by an order of magnitude, in power efficiency and performance per dollar.

Graph comparing power efficiency and performance per dollar of transcoding solution

So, we decided to follow up the theoretical study with real benchmarks of NETINT VPUs (video processing units) and transcoders, to validate this decision.

 

The journey of switching from CPU to ASICs

The decision to choose ASICs for our video infrastructure came after rigorous research.

 

Our journey began with the conventional use of CPUs. But as the demand for enhanced performance grew, our team delved into an extensive study, exploring and comparing transcoding solutions like FPGA, GPU, and ASICs.

 

Recognizing the need for a solution aligned with our goals, we benchmarked the NETINT chips, since their products were the clear winners of our study. The G4's performance was satisfactory, and we could have gone into production with the G4-based T408 cards, but NETINT's G5-based Quadra transcoders were even more appealing, with AV1 support and higher density. The AI engines paired with the G5 on the T1U cards earned them some bonus points as well.

 

We then reran our benchmarks with the G5-based Quadra T1U cards to assess their performance. We also tested different transcoding strategies to reach a high transcoding speed and good density per server. The results matched our expectations, and we finalized these benchmarks with the transcoding strategy best adapted to our video pipeline.

 

This decision marked a turning point in our infrastructure, opening doors to greater efficiency. With increased capacity and advanced encoding capabilities, we went into production.

 

How api.video reduced encoding costs with Quadra VPUs

api.video used the T1U cards based on the G5 chips built by NETINT Technologies; these cards are called VPUs (Video Processing Units). Let's break down the costs they entail for a 1080p30 video. To simplify the case, I will assume the source video codec is H.264. The output after transcoding will be five H.264 renditions for ABR: 1080p30, 720p30, 480p30, 360p30, and 240p30.

 

For real-time transcoding, let's sum up the megapixels per second (MP/s) this represents. The output above amounts to 62 MP/s + 28 MP/s + 12 MP/s + 7 MP/s + 3 MP/s for the 1080p30, 720p30, 480p30, 360p30, and 240p30 renditions respectively.
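The per-rendition figures are simply resolution × frame rate. Here is a quick sketch that reproduces them; note the widths for the 480p and 240p renditions are assumed to be the common 854×480 and 426×240 encode sizes, since only the heights are given above:

```python
# Megapixels per second for each rendition: width * height * fps / 1e6.
# The 854x480 and 426x240 dimensions are assumptions (common 16:9-ish
# encode sizes); the article only specifies the heights.
renditions = {
    "1080p30": (1920, 1080, 30),
    "720p30":  (1280, 720, 30),
    "480p30":  (854, 480, 30),
    "360p30":  (640, 360, 30),
    "240p30":  (426, 240, 30),
}

mps = {name: w * h * fps / 1e6 for name, (w, h, fps) in renditions.items()}
total = sum(mps.values())

for name, value in mps.items():
    print(f"{name}: {value:.0f} MP/s")
print(f"total: {total:.0f} MP/s")  # ~112 MP/s, matching the figure below
```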

 

In total, we need an encoding capacity of 112 MP/s for real-time transcoding. For the sake of simplicity, let's assume that the transcoding cost on a VPU is proportional to the number of pixels processed per second (whatever the frame resolution), and that one NETINT VPU can encode 2,000 MP/s in H.264. In reality it is a bit more complex, but these numbers give a rough idea.

 

Now, as per the above assumptions, each VPU can encode 2000 / 112 ≈ 17 minutes of video (all renditions included) per minute of wall-clock time. Below is a table summarizing the per-minute cost for a server with 4 VPUs, costing $10k and amortized over 3 years.
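Under those assumptions, the server-level throughput works out as follows (integer division matches the rounding down to 17 used above):

```python
# One VPU encodes ~2000 MP/s; the 5-rendition ladder needs 112 MP/s
# to run in real time, so each VPU handles ~17 real-time jobs at once.
VPU_CAPACITY_MPS = 2000
LADDER_MPS = 112
VPUS_PER_SERVER = 4

minutes_per_vpu = VPU_CAPACITY_MPS // LADDER_MPS        # source-minutes per wall-clock minute, per VPU
minutes_per_server = minutes_per_vpu * VPUS_PER_SERVER  # per 4-VPU server

print(minutes_per_vpu, minutes_per_server)  # 17 68
```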

 

The electricity cost is approximately $1,000 per year (420 W at $0.30/kWh) at full load. Let us see what 1 minute of 1080p30 video transcoded into 5 renditions costs.

 

(The cost is computed based on the server's average daily utilization.)

| Avg. server utilization | Minutes encoded over the amortization period | Cost per minute transcoded ((server + electricity) / minutes) in $ |
| --- | --- | --- |
| 0.5% | 536,112 | 0.020 |
| 5% | 5,361,120 | 0.002 |
| 25% | 26,805,600 | 0.0004 |
| 50% | 53,611,200 | 0.0002 |
| 90% | 96,500,160 | 0.0001 |

Transcoding cost per minute based upon varying usage percentages
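As a sanity check, the table can be reproduced from the model above. The one extra assumption made here is that electricity cost scales linearly with average utilization (the full $1,000 per year only at 100% load); with it, the computed costs match the table to the precision shown, the 0.5% row landing at ≈$0.019, which rounds to $0.02:

```python
# Reproduce the amortization table: a $10k server with 4 VPUs,
# amortized over 3 years, encoding 68 source-minutes per wall-clock
# minute when fully loaded. Electricity ($1000/year at full load) is
# assumed to scale linearly with average utilization.
SERVER_COST = 10_000
ELECTRICITY_PER_YEAR = 1_000
YEARS = 3
MINUTES_IN_PERIOD = YEARS * 365 * 24 * 60   # 1,576,800 wall-clock minutes
THROUGHPUT = 68                             # source-minutes encoded per wall-clock minute

def minutes_encoded(utilization: float) -> int:
    return round(MINUTES_IN_PERIOD * utilization * THROUGHPUT)

def cost_per_minute(utilization: float) -> float:
    total_cost = SERVER_COST + ELECTRICITY_PER_YEAR * YEARS * utilization
    return total_cost / minutes_encoded(utilization)

for u in (0.005, 0.05, 0.25, 0.50, 0.90):
    print(f"{u:>5.1%}  {minutes_encoded(u):>11,}  ${cost_per_minute(u):.4f}")
```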

If we use the server at only 0.5% per day (i.e. ~7 minutes per day fully loaded), the cost per minute is $0.020.

 

If, on average, the server is loaded at 25% (6 hours a day fully loaded), the cost per minute drops to $0.0004, a fiftieth of the previous figure.

So even at 5% utilization the cost per video is negligible, which allowed us to make the strategic decision to offer transcoding services at no charge to our customers.

 

What about video quality?

In the past, hardware encoders were associated with lower-quality video, but our tests show this isn't the case with the Quadra VPUs. Specifically, we encoded a 1080p30 version of the movie Caminandes 2 via our API and compared the output quality with a video encoded by FFmpeg using the x264 encoder and the following command:

 

ffmpeg -r 30 -y -i input.mp4 -c:v libx264 -force_key_frames 'expr:gte(t,n_forced*4)' -b:v 4400k -preset medium -c:a copy output.mp4

 

As shown in the table below, we compared the two videos using the well-known quality metrics VMAF, PSNR, and SSIM. The scores are all extremely close, with the VPU-encoded video edging out FFmpeg on VMAF and the reverse for PSNR and SSIM. Given how close the scores are, it's clear that no viewer would notice the difference.

|  | VMAF (avg) | PSNR (avg) | SSIM (avg) |
| --- | --- | --- | --- |
| api.video (VPUs) | 95.92 | 48.07 | 0.9888 |
| FFmpeg (CPUs) | 95.33 | 48.69 | 0.990 |

Quality metrics for the VPU and CPU (x264) encodes of the same source

To summarize, api.video builds its own infrastructure so we can adopt the right technology and offer the best to our customers. With these advanced VPUs in our infrastructure, we have significantly reduced our transcoding TCO: we can pack many VPUs into a single server, which increases transcoding capacity while cutting power consumption and server count by an order of magnitude.

 

In a nutshell, VPUs help us encode videos at scale without costing us too much. These VPUs are the reason why api.video was able to make the decision to offer video encoding for free to our customers.

 

To start using api.video for your videos, sign up for a free sandbox account now.
