Auto-caption a video

Videos with captions/subtitles are watched longer than videos without them. Yet adding captions is non-trivial. In this post, we'll walk through a sample application that takes an uploaded video, automatically transcribes its audio, and creates a caption file for it.

Doug Sillars

February 26, 2021

Video captions/subtitles make it easier to follow along with your video. Whether or not your viewers have hearing issues, sometimes they don't want the video's audio playing out loud, and captions let them understand the video anyway.

To add captions to a video, you must create a WebVTT file. We've written about the process of creating WebVTT files. The format is easy to understand, but the files are tedious to create manually: you need to break a transcription into segments and add start and end times for each segment. For example:

00:00:07.519 --> 00:00:11.759
The little train rumbled over the tracks.

00:00:10.880 --> 00:00:15.279
She was a happy little train for She had such

00:00:13.439 --> 00:00:16.719
a jolly load to carry.

00:00:15.599 --> 00:00:20.480
Her cars were filled full of good things for boys

00:00:18.879 --> 00:00:21.199
and girls.
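To make the segmenting step concrete, here is a minimal sketch that turns a word-level transcript into WebVTT cues. It assumes a hypothetical `TimedWord` input (each word with start and end times in seconds) and uses the simplest possible rule, a fixed number of words per cue; real captioning tools also split on pauses and line length. The names `toVttTimestamp` and `buildVtt` are illustrative, not part of any library.

```typescript
// Hypothetical word-level transcript entry; times are in seconds.
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

// Left-pad a number with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as a WebVTT timestamp: HH:MM:SS.mmm
function toVttTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Group words into cues of at most `maxWords` words and emit a WebVTT document.
// Each cue runs from its first word's start to its last word's end.
function buildVtt(words: TimedWord[], maxWords = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const group = words.slice(i, i + maxWords);
    const start = toVttTimestamp(group[0].start);
    const end = toVttTimestamp(group[group.length - 1].end);
    const text = group.map((w) => w.text).join(" ");
    cues.push(`${start} --> ${end}\n${text}`);
  }
  // A WebVTT file starts with the "WEBVTT" signature; cues are separated by blank lines.
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

Grouping by word count keeps the sketch short; swapping in a pause-based or character-count rule only changes how `group` is chosen.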

In this post, we'll show how the Authot transcription API can automatically add captions to your videos, without the fuss of creating the VTT caption file by hand.

Authot receives the video file, transcribes its audio, and returns a complete WebVTT file. api.video then ingests this file under the same videoId as the video, which adds the captions to the video.

It sounds like magic! To prove that it actually works, we've built a demo application, caption.a.video, that uses api.video and Authot to host, stream, and auto-caption your videos.

How it works

The first step is to upload your video to api.video. This demo is built on a delegated token upload: a temporary public token is created that lets the browser upload the video directly to api.video.
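Large files are usually uploaded from the browser in chunks. The sketch below shows how a client might plan those chunks for a delegated-token upload. It assumes the upload URL has the form `https://ws.api.video/upload?token=...` and that each chunk carries a `Content-Range: bytes start-end/total` header; both are assumptions to check against the current api.video upload documentation. The helpers `uploadUrl` and `planChunks` are hypothetical, not part of an api.video SDK.

```typescript
// Assumed endpoint for delegated-token uploads; verify against the api.video docs.
const UPLOAD_BASE = "https://ws.api.video/upload";

interface ChunkPlan {
  start: number;        // first byte of the chunk (inclusive)
  end: number;          // last byte of the chunk (inclusive)
  contentRange: string; // value for the (assumed) Content-Range header
}

// Build the upload URL for a delegated (public, temporary) token.
function uploadUrl(token: string): string {
  return `${UPLOAD_BASE}?token=${encodeURIComponent(token)}`;
}

// Split a file of `totalBytes` into chunks of at most `chunkSize` bytes.
function planChunks(totalBytes: number, chunkSize = 50 * 1024 * 1024): ChunkPlan[] {
  if (totalBytes <= 0 || chunkSize <= 0) throw new Error("sizes must be positive");
  const chunks: ChunkPlan[] = [];
  for (let start = 0; start < totalBytes; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalBytes) - 1;
    chunks.push({ start, end, contentRange: `bytes ${start}-${end}/${totalBytes}` });
  }
  return chunks;
}
```

In the browser, each chunk would be `file.slice(start, end + 1)`, sent as a multipart POST to `uploadUrl(token)` with its `Content-Range` header. The 50 MB default chunk size is an arbitrary illustrative value.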

Upload complete

Once the video is uploaded, api.video begins transcoding it into an HLS stream and an MP4 file. Authot needs the MP4 file to create a transcription, so the application checks the video status endpoint, which reports when each version of the video is ready for playback, to find out when the MP4 has been encoded. Once it has, the application submits the file to Authot for transcription.
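Waiting for the MP4 is a polling problem: request the status, and if the MP4 isn't ready, wait and ask again. Here is a minimal sketch. The `VideoStatus` shape and the `isMp4Ready` predicate are hypothetical, loosely modeled on a status response that lists each encoded version with a type and a status; confirm the real field names in the api.video API reference.

```typescript
// Hypothetical shape of the video status response; check the real field names.
interface VideoStatus {
  encoding: { qualities: { type: string; status: string }[] };
}

// True once at least one MP4 version reports "encoded" (assumed status value).
function isMp4Ready(status: VideoStatus): boolean {
  return status.encoding.qualities.some((q) => q.type === "mp4" && q.status === "encoded");
}

// Call `check` until it returns true, waiting `intervalMs` between attempts.
// Resolves to false if `maxAttempts` checks all fail.
async function pollUntil(
  check: () => Promise<boolean>,
  intervalMs = 5000,
  maxAttempts = 60,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await check()) return true;
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return false;
}
```

On the server, `check` would fetch the video's status and pass the parsed JSON to `isMp4Ready`. Capping the number of attempts keeps a failed encode from being polled forever.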


Once the video has been submitted to Authot, the API returns the ID of the transcription. Authot's status endpoint then tells us which state the transcription is in:

The initial state is 0: uploading.

Intermediate states include 2: transcribing.

The final state is 10: transcoded.
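Only three Authot states are named in this post, so code that interprets the status should treat any other code as unknown rather than guess what it means. A small sketch of that idea follows; the `AUTHOT_STATUS` table and the helper names are hypothetical, and Authot's own documentation is the source of truth for the full list of codes.

```typescript
// Only the Authot status codes mentioned in this post; other codes exist but are not listed.
const AUTHOT_STATUS: Record<number, string> = {
  0: "uploading",
  2: "transcribing",
  10: "transcoded",
};

// Status 10 means the transcription is finished and the VTT file can be requested.
const AUTHOT_DONE = 10;

// Human-readable label for a status code; unknown codes are reported as such.
function describeAuthotStatus(code: number): string {
  return AUTHOT_STATUS[code] ?? `unlisted status ${code}`;
}

function isTranscriptionDone(code: number): boolean {
  return code === AUTHOT_DONE;
}
```

In a polling loop, `isTranscriptionDone` would serve as the stop condition, while `describeAuthotStatus` gives the webpage something readable to display in the meantime.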

Getting the WebVTT

Once the transcription is complete (when Authot returns status 10), the Node server requests the VTT file from Authot and uploads it as a caption to the existing video at api.video.
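Before uploading, it is worth checking that the response really is a WebVTT file, which must begin with the `WEBVTT` signature. The sketch below performs that check and describes the caption upload request. The endpoint `POST https://ws.api.video/videos/{videoId}/captions/{language}` and the multipart field name `file` are assumptions to verify against the api.video captions API reference; `looksLikeWebVtt` and `buildCaptionUpload` are hypothetical helpers.

```typescript
// Assumed api.video base URL and caption endpoint; verify in the API reference.
const API_BASE = "https://ws.api.video";

interface CaptionUpload {
  method: "POST";
  url: string;
  fieldName: string; // multipart field carrying the .vtt file (assumed "file")
  fileName: string;
}

// A WebVTT file begins with an optional BOM, then "WEBVTT", followed by
// a space, tab, line break, or the end of the file.
function looksLikeWebVtt(body: string): boolean {
  return /^\uFEFF?WEBVTT(?:[ \t\r\n]|$)/.test(body);
}

// Describe the request that attaches `vtt` to `videoId` as a caption in `language`.
function buildCaptionUpload(videoId: string, language: string, vtt: string): CaptionUpload {
  if (!looksLikeWebVtt(vtt)) throw new Error("not a WebVTT file");
  return {
    method: "POST",
    url: `${API_BASE}/videos/${encodeURIComponent(videoId)}/captions/${encodeURIComponent(language)}`,
    fieldName: "file",
    fileName: `${videoId}.${language}.vtt`,
  };
}
```

The server would then send the VTT text as a multipart body to `url`, authenticated with your api.video API key. Rejecting non-VTT responses early avoids attaching an error page to the video as a caption track.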


Finally, the webpage updates its status text to "success". Clicking the video link opens the video, where the user can enable captions and watch them play back.

Automatic captions!

Of course, automatic captions are not 100% accurate, and a few words may be transcribed incorrectly. However, in my testing the Authot API performed well, and the captions were of very high quality.

On the other hand, creating captions manually either takes a lot of time or costs a lot of money (to pay someone else to do the work). The ease of use and low cost make auto-captioning a compelling choice.


We've described an application that lets you upload a video and automatically add captions to it. Try it out today at caption.a.video. If you're curious about how we built this app, the code is available on GitHub. Let us know what you think by posting on our community forum.

Add captions, and watch your videos' watch time increase!