What are Captions?
Captions are the transcription of spoken dialog into written text that is displayed during video playback. Captions can also describe actions and sounds that occur in the video. Subtitles are similar, but they are often a translation of the dialog into a different language and do not include these descriptions.
While initially introduced as an accessibility feature for those who are deaf or hard of hearing, captions are increasingly popular for video playback. Amongst the reasons:
- Autoplaying video on the web must be muted. When captions are added, users tend to linger on the autoplaying video. Facebook found that ads with captioning were watched 12% more than those without.
- Videos can be watched at a lower volume, allowing parents to watch videos while children sleep (or the other way around).
- Accents. Sometimes, the audio can be hard to understand, and the captions help resolve this issue.
Caption delivery: the VTT file
WebVTT (Web Video Text Tracks), usually saved as a .vtt file, is a standardized format for delivering captions.
The VTT file gives each caption a start time, an end time, and the text to display. For example, if you were Rick Rolled, you might see this caption:
5
00:00:44.030 --> 00:00:47.260
Never going to let you down.
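A cue like the one above has three parts: an optional identifier, a timing line, and the caption text. A complete file also starts with a `WEBVTT` header line, and cues are separated by blank lines. As a minimal sketch, a file could be generated from a list of timed strings like this. The helper names `format_timestamp` and `build_vtt` are made up for this illustration and are not part of any library:

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_vtt(cues) -> str:
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    # Every WebVTT file begins with the "WEBVTT" header line.
    blocks = ["WEBVTT"]
    for number, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        # Each cue: identifier, timing line, then the caption text.
        blocks.append(f"{number}\n{timing}\n{text}")
    # Cues are separated from the header and from each other by blank lines.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_vtt([(44.03, 47.26, "Never going to let you down.")]))
```

Here the cue identifiers are simply numbered from 1; in a real caption file they would follow the position of each cue in the video.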
Captions with api.video
Captions are added to a video using the Upload caption endpoint.
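As a rough sketch of what such an upload might look like, the snippet below builds (but does not send) a multipart request carrying a VTT file. The base URL, the `/videos/{video_id}/captions/{language}` path, the `file` form field, and the bearer-token header are all assumptions made for illustration; check them against the API reference before relying on them. The function name `build_caption_upload` is hypothetical.

```python
import urllib.request
import uuid

# Assumption: base URL of the API; confirm in the API reference.
API_BASE = "https://ws.api.video"


def build_caption_upload(video_id: str, language: str,
                         vtt_text: str, token: str) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one VTT caption file."""
    boundary = uuid.uuid4().hex
    # Assumption: the endpoint expects the caption in a form field named "file".
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="captions.vtt"\r\n'
        "Content-Type: text/vtt\r\n"
        "\r\n"
        f"{vtt_text}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    # Assumption: captions are addressed by video ID and language code.
    url = f"{API_BASE}/videos/{video_id}/captions/{language}"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumption: bearer-token auth
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; that step is left out here so the sketch can be inspected without credentials.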
You can read more in our tutorial.