api.video lets you quickly and securely deliver on-demand and live-stream videos directly from your website, software, or app. 

## What is a discrete cosine transform (DCT)?

A discrete cosine transform (DCT) is a Fourier-related transform that's used to compress video. It expresses a digital signal as a sum of cosine functions of different frequencies. A digital signal is a sampled wave, so you can take the signal's amplitude over time and turn it into a list of coefficients, where each coefficient says how much of one cosine function is needed to rebuild part of the original wave.

The question is, how do you pull apart the wave so you can compress it into a set of coefficients representing different functions? And the answer is, by using the discrete cosine transform! Cosine functions are used rather than sine functions because they're more efficient: you need fewer cosine functions to describe a typical signal. In general, a DCT takes a vector of length n that contains amplitudes, and returns a new vector of length n with the coefficients for the n cosine functions used to represent the signal. You can encode the transform as an n x n matrix, where each row represents a cosine function of a different frequency.
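As a minimal sketch of that vector-in, vector-out idea, the snippet below builds an n x n matrix whose rows are cosines of increasing frequency and multiplies it by a vector of amplitudes. It assumes the common DCT-II variant with orthonormal scaling, which the text above does not specify, and the names `dct_matrix` and `dct` are made up for this illustration.

```python
import math

def dct_matrix(n):
    """Build an n x n DCT-II matrix (orthonormal scaling, an assumption here).

    Row k holds a cosine function of frequency k sampled at n points.
    """
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def dct(signal):
    """Multiply the DCT matrix by the amplitude vector to get n coefficients."""
    matrix = dct_matrix(len(signal))
    return [sum(c * x for c, x in zip(row, signal)) for row in matrix]

amplitudes = [8.0, 8.0, 8.0, 8.0]   # a perfectly flat toy signal
coefficients = dct(amplitudes)
print(coefficients)
```

For this flat signal the first coefficient comes out as 16 and the other three are zero up to floating-point error, which is the point of the transform: a simple signal needs very few cosine functions to describe it.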

## DCT and block compression
Because an n x n matrix is used in a DCT, this technique is often called block compression. You divide the frame into blocks, transform each one, and compress the resulting DCT blocks. Blocks can be of different sizes, ranging from 4x4 to 32x32 pixels, with the most common being 8x8.
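To make the block idea concrete, here is a small sketch that cuts a frame into 8x8 tiles. It assumes a grayscale frame stored as a list of rows whose width and height are exact multiples of the block size; `split_into_blocks` is a hypothetical helper, not part of any codec.

```python
def split_into_blocks(frame, size=8):
    """Yield (top, left, block) for each size x size tile of a 2D list of pixels.

    Assumes the frame height and width are multiples of `size`.
    """
    height, width = len(frame), len(frame[0])
    for top in range(0, height, size):
        for left in range(0, width, size):
            block = [row[left:left + size] for row in frame[top:top + size]]
            yield top, left, block

# Toy 16x16 grayscale frame, so it splits into a 2 x 2 grid of 8x8 blocks.
frame = [[(x + y) % 256 for x in range(16)] for y in range(16)]
blocks = list(split_into_blocks(frame))
print(len(blocks))  # 4
```

Real encoders also have to deal with frames whose dimensions are not multiples of the block size, which this sketch leaves out.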

## How DCT works
This is a very high-level explanation of how DCT works. There are a few steps:

1. The frame in your video is divided into blocks of pixels.
2. Each row of pixels is treated as a wave that can be represented as a sum of cosine waves. Each cosine wave can be represented as a row in a matrix, so you create a matrix the size of your block, with a cosine function for each row.
3. You figure out the coefficients for each of the waves in your matrix. This is accomplished through matrix multiplication.
4. The end result is your completed matrix of coefficients, which you can now compress. You take the K most significant cosine waves and save them.

You will lose some data, but if you've chosen correctly, you'll be able to decompress your matrix and have the most important data intact. When you decompress, you add enough zeroes to your compressed matrix to recreate the original size of the matrix you compressed down from. You can then use the inverse DCT to get back an approximation of your original data.
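The steps above can be sketched for a single row of pixels. This is an illustration under stated assumptions rather than how any particular codec does it: it uses an orthonormal DCT-II, picks "most significant" as largest magnitude, and the pixel values and function names are invented. Real codecs typically apply the transform in two dimensions and quantize coefficients instead of simply dropping them.

```python
import math

def dct_matrix(n):
    """n x n orthonormal DCT-II matrix; row k is a cosine of frequency k."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def compress(signal, keep):
    """Return the `keep` largest-magnitude DCT coefficients as (index, value) pairs."""
    matrix = dct_matrix(len(signal))
    coeffs = [sum(c * x for c, x in zip(row, signal)) for row in matrix]
    ranked = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    return [(k, coeffs[k]) for k in sorted(ranked[:keep])]

def decompress(pairs, n):
    """Pad the dropped coefficients with zeroes, then apply the inverse DCT."""
    coeffs = [0.0] * n
    for k, value in pairs:
        coeffs[k] = value
    matrix = dct_matrix(n)
    # The matrix is orthonormal, so its inverse is simply its transpose.
    return [sum(matrix[k][i] * coeffs[k] for k in range(n)) for i in range(n)]

row = [52, 55, 61, 66, 70, 61, 64, 73]   # one row of 8 pixel values (toy data)
kept = compress(row, keep=3)             # K = 3 of the 8 coefficients survive
restored = decompress(kept, len(row))
print([round(v) for v in restored])
```

With `keep=8` nothing is discarded and the row is recovered exactly (up to rounding error); lowering `keep` trades accuracy for a smaller representation.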

## DCT today
The DCT compression technique has been around since the 1970s. Today it's still the most popular transform for compressing video and appears, in one form or another, in nearly every mainstream video codec, with various optimizations depending on the codec.

## What is Deep Learning?
Deep learning is a subset of machine learning that employs artificial neural networks with multiple layers, loosely inspired by the structure and function of the human brain. In the context of video and AI, deep learning algorithms can process and analyze vast amounts of visual data, enabling sophisticated tasks in video production, editing, and analysis. These neural networks learn hierarchical representations of data, with each layer building upon the previous one to recognize increasingly complex patterns and features in video content.

## Fundamentals of Deep Learning in Video Processing
At its core, deep learning for video relies on convolutional neural networks (CNNs) and recurrent neural networks (RNNs). CNNs excel at spatial analysis, detecting features like edges, textures, and objects within individual video frames. RNNs, on the other hand, are crucial for understanding temporal relationships, allowing the system to track motion and changes over time across multiple frames.
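To give a feel for how a convolutional layer picks out an edge, the sketch below slides a 3x3 kernel over a tiny frame. The kernel is a hand-picked Sobel filter; in a CNN the kernel values would be learned during training instead. The function name and the toy frame are assumptions for illustration.

```python
def convolve2d(image, kernel):
    """Slide a small kernel over a 2D list (no padding) and return the response map.

    Like most deep learning libraries, this computes a cross-correlation:
    the kernel is not flipped before it is applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(len(image) - kh + 1):
        out_row = []
        for x in range(len(image[0]) - kw + 1):
            out_row.append(sum(kernel[j][i] * image[y + j][x + i]
                               for j in range(kh) for i in range(kw)))
        out.append(out_row)
    return out

# Toy 4x6 frame: dark on the left, bright on the right.
frame = [[0, 0, 0, 9, 9, 9] for _ in range(4)]
# Vertical-edge kernel (Sobel), chosen by hand for this example.
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
response = convolve2d(frame, sobel_x)
print(response[0])  # [0, 36, 36, 0]
```

The response is zero over the flat regions and large where the brightness jumps, which is what "detecting an edge" means at this level.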

These neural networks are trained on large datasets of video content, learning to recognize patterns and make predictions. The training process involves feeding the network with labeled data, allowing it to adjust its internal parameters through backpropagation. As the network processes more data, it refines its ability to recognize and generate complex video features.
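The training loop described above can be shown at its smallest scale with a single "neuron" that has two parameters. This is only a sketch: with one layer there is nothing to propagate backwards through, so it shows the gradient-descent update that backpropagation feeds in a deeper network. The data, learning rate, and epoch count are invented for the example.

```python
# Labeled toy data: inputs x with targets y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = 0.0, 0.0            # the model's internal parameters
learning_rate = 0.05

for epoch in range(2000):
    grad_w = grad_b = 0.0
    for x, y in data:
        error = (w * x + b) - y                # forward pass, compared to the label
        grad_w += 2 * error * x / len(data)    # d(mean squared error) / dw
        grad_b += 2 * error / len(data)        # d(mean squared error) / db
    w -= learning_rate * grad_w                # step against the gradient
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

Each pass nudges the parameters in the direction that reduces the error on the labeled examples, which is the same principle a video model follows with millions of parameters.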

## Applications in Video Technology
Deep learning has revolutionized numerous aspects of video technology. Some key applications include:

1. Video Compression: Reducing file sizes while maintaining visual quality.
2. Video Enhancement: Upscaling low-resolution footage and reducing noise.
3. Object Recognition and Tracking: Enabling automated content moderation and intelligent surveillance.
4. Special Effects: Facilitating seamless integration of CGI elements and real-time face swapping.
5. AI-Powered Editing: Automating cutting, arranging, and even generating video content.
6. Content Analysis: Extracting metadata and understanding video context for better searchability and recommendation systems.  

These capabilities are pushing the boundaries of what's possible in video production, allowing for more creative and efficient workflows.

## Challenges and Considerations
Despite its power, deep learning in video faces several challenges. The computational requirements for processing high-resolution video in real-time can be substantial, often necessitating specialized hardware. There's also the issue of interpretability – the complex nature of deep neural networks can make it difficult to understand exactly how they arrive at their outputs, which can be problematic in sensitive applications.

Data quality and quantity remain crucial factors. Deep learning models require vast amounts of diverse, high-quality video data to train effectively and avoid biases. Ensuring this data is representative and ethically sourced is an ongoing challenge in the field.

## The Future of Deep Learning in Video
As deep learning techniques continue to advance, we can expect even more transformative applications in video technology. Researchers are exploring ways to make these models more efficient and capable of learning from smaller datasets. The integration of deep learning with other AI technologies, such as [Large Language Models](https://api.video/what-is/large-language-model/), promises to enable more holistic understanding and generation of video content.

Future developments may lead to fully automated video production systems capable of generating entire films or TV shows based on high-level creative direction. We might also see advancements in personalized video experiences, where content adapts in real-time based on viewer emotions or preferences, all powered by sophisticated deep learning [algorithms](https://api.video/what-is/algorithm/).

## What is dithering?

Dithering is a technique you can use to make an image with a lower bit depth appear to blend colors smoothly, by introducing random noise to the image in strategic locations. When is dithering useful? When you need to represent an image with a lower [bit depth](https://api.video/what-is/bit-depth/) than you captured it at, or when you need to use a low bit depth in general. Sometimes a lower bit depth doesn't have enough colors to express the image well. In such cases, you'll see sharp lines or bands of color in places where the color shifts from one shade to another. This makes the image look less realistic and lower in quality.

Here's an example:

bit-depth.png

Dithering is used in many different codecs to improve the efficiency and storage of images and video. This technique is not limited to video: it's also used on the audio portion of video codecs, and it can be applied to still images as well.

dithering-red-blue-samples.png

See how the high bit depth in the left image (24-bit) allows for a more seamless transition between gradients of color. In the right image (8-bit), however, you see sharp delineation between the bands of color. By dithering, or applying random noise to these areas, you can soften the delineation. For images, you can often create an approximation of a color by choosing the right colors from a palette and placing them next to each other. The human eye will see the colors mixed together as the desired color even though the palette doesn't contain this color. For example, say you have a color palette that doesn't have purple as a choice. You could approximate purple by placing very tiny red and blue pixels next to each other in the pattern of a checkerboard. If the pattern is fine enough, your eye won't notice it's created with red and blue, and will simply see purple as desired.
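The effect of adding noise before reducing the number of shades can be sketched on a one-dimensional gradient. This is a minimal illustration of random dithering, assuming a grayscale ramp and four output shades; production encoders often use ordered or error-diffusion dithering instead, and every name below is made up for the example.

```python
import random

def quantize(value, levels):
    """Snap a 0-255 value to the nearest of `levels` evenly spaced shades."""
    step = 255 / (levels - 1)
    index = max(0, min(levels - 1, round(value / step)))
    return round(index * step)

def dither(pixels, levels, rng):
    """Add random noise of up to half a quantization step, then quantize."""
    step = 255 / (levels - 1)
    return [quantize(p + rng.uniform(-step / 2, step / 2), levels) for p in pixels]

def transitions(pixels):
    """Count how often neighbouring pixels differ."""
    return sum(1 for a, b in zip(pixels, pixels[1:]) if a != b)

gradient = list(range(256))                    # smooth ramp from black to white
banded = [quantize(p, 4) for p in gradient]    # four flat bands with hard edges
dithered = dither(gradient, 4, random.Random(0))

print(transitions(banded), transitions(dithered))
```

The banded version changes shade only three times, which is where the visible lines appear. The dithered version uses the same four shades but switches between them many times, so neighbouring pixels average out to something closer to the original ramp.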

Sometimes a limited color palette is ideal because the image will take less space. Remember that with bit depth, each pixel has to reserve enough space for whatever value you might choose. So in 8-bit color, enough space for 256 possible values is stored per pixel. In 10-bit color, enough space for 1024 possible values is stored per pixel. Sometimes it's not efficient to store the higher quality option, and this is where dithering is useful.
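The arithmetic behind those numbers is short enough to check directly. The sketch below follows the per-pixel framing used above; the 1920x1080 frame size is an assumption picked for the example, and it ignores compression entirely.

```python
def values_per_pixel(bits):
    """Number of distinct values a pixel can take at a given bit depth."""
    return 2 ** bits

def raw_frame_bytes(width, height, bits_per_pixel):
    """Uncompressed storage for one frame at the given bits per pixel."""
    return width * height * bits_per_pixel // 8

print(values_per_pixel(8), values_per_pixel(10))   # 256 1024
print(raw_frame_bytes(1920, 1080, 8))               # 2073600
print(raw_frame_bytes(1920, 1080, 10))              # 2592000
```

Two extra bits per pixel give four times as many values but also make every uncompressed frame 25% larger, which is the trade-off dithering helps you avoid.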

Here's an image showing how dithering works. You can see that at first, with big patches of red and blue, you see the colors separately. But as the patches become smaller and smaller, you see more and more purple! 

## What is a DVR?

A DVR, or "Digital Video Recorder", is a device or software that enables users to record live television programs and store them digitally for later viewing. This technology has revolutionized the way people consume media, as it provides them with the ability to pause, rewind, and skip through live TV, effectively giving them greater control over their viewing experience.

One of the key features of a DVR is its recording capability. Users can record their favorite shows, movies, or any other video content to the device's internal hard drive, which can typically store hundreds of hours of recorded content depending on the model and storage capacity. This "time-shifting" functionality allows individuals to watch programs at their convenience, rather than being limited to the original broadcast schedule.

In addition to recording, DVRs often have the ability to pause and rewind live television. This can be particularly useful for taking a break during a program or rewinding to re-watch a specific segment.

Moreover, many DVRs are equipped with the capability to automatically skip over commercial breaks, providing users with a more seamless and uninterrupted viewing experience.

The scheduling feature of a DVR is another valuable aspect, as it enables users to plan and set recordings in advance, ensuring they never miss a favorite show or movie. This level of control and flexibility has contributed to the growing popularity of DVRs among television viewers.

At api.video we allow instant rewind of live streams, and we call this our DVR feature. When a stream is live, you can rewind it by up to 5 minutes to watch 'nearly live' content. Perhaps you missed what was said, and want to make sure you understand the speaker. Or perhaps there was a sports play that you'd like to watch again. Our DVR feature allows your users to rewind the live stream and see the action again!
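One way to picture a rewind window like this is as a fixed-size buffer of recent stream segments. The sketch below is purely illustrative and is not a description of how api.video implements its DVR feature: the `DvrWindow` class, the 4-second segment length, and the segment names are all assumptions.

```python
from collections import deque

class DvrWindow:
    """Keep only the most recent `window_seconds` of live-stream segments."""

    def __init__(self, window_seconds=300, segment_seconds=4):
        self.segment_seconds = segment_seconds
        # A bounded deque drops the oldest segment once the window is full.
        self.segments = deque(maxlen=window_seconds // segment_seconds)

    def push(self, segment_name):
        self.segments.append(segment_name)

    def rewind(self, seconds_back):
        """Return the segment from `seconds_back` seconds ago, clamped to the window.

        Assumes at least one segment has been pushed.
        """
        steps = min(seconds_back // self.segment_seconds, len(self.segments) - 1)
        return self.segments[-1 - steps]

window = DvrWindow()
for n in range(100):                 # 100 segments of 4 s is 400 s of live video
    window.push(f"seg{n}")

print(window.rewind(0))      # seg99, the live edge
print(window.rewind(60))     # seg84, one minute back
print(window.rewind(10000))  # seg25, the oldest segment still in the window
```

Asking for more than the window holds simply returns the oldest segment that is still available, which mirrors the "up to 5 minutes" limit.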

## What is embedding?
Embedding is the process of adding external content such as links, videos, images, code samples, GIFs, and interactive objects like buttons to a website.  The most common way to embed content on the web is through an iframe.

The iframe allows a developer to insert code from a second source on a webpage in a safe way - the code runs in a different context and cannot hijack or pollute the code on the page.

## Embedding a video
When a video is uploaded to api.video, a videoId is assigned to the video, and a player URL is created that looks like:

```
https://embed.api.video/vod/vi4blUQJFrYWbaG44NChkH27
```

This URL can be pasted into the browser and played, but since it has a JavaScript player and other dependencies, it cannot simply be added to a webpage. For that reason, every API response from api.video also includes the embeddable iframe code:

```
<iframe src="https://embed.api.video/vod/vi4blUQJFrYWbaG44NChkH27" width="100%" height="100%" frameborder="0" scrolling="no" allowfullscreen="true"></iframe>
```

The iframe tells the browser that it can take up 100% of the width and height of its container, that it fits that size (no scrolling), and that it can go full screen (so you can see the whole video).
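If you only have a videoId to hand, the same markup can be assembled from it. The helper below is a hypothetical convenience function, not part of an api.video SDK; it simply reproduces the URL pattern and attributes from the example above, and in practice you can use the iframe code returned by the API instead.

```python
from html import escape

def embed_iframe(video_id, width="100%", height="100%"):
    """Build the iframe markup shown above for a given videoId."""
    src = f"https://embed.api.video/vod/{video_id}"
    return (
        f'<iframe src="{escape(src)}" width="{escape(width)}" height="{escape(height)}" '
        'frameborder="0" scrolling="no" allowfullscreen="true"></iframe>'
    )

print(embed_iframe("vi4blUQJFrYWbaG44NChkH27"))
```

Escaping the attribute values is a small precaution in case the ID or dimensions ever come from user input.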

## What is the controls attribute in HTML5?
The controls attribute of the HTML5 video tag is one of the most regularly used attributes for the video tag. The presence of this attribute adds buttons and a playback bar to the bottom of the video, giving the viewer the ability to control video playback. The absence of the ```controls``` attribute removes the user's ability to control playback of the video. Because ```controls``` is a boolean attribute, writing ```controls="false"``` in the markup still shows the controls; to turn them off, omit the attribute or set the ```controls``` property to ```false``` from JavaScript.

The absence of controls is typically used for background videos, or autoplaying, looping and silent movies that emulate an animated GIF.

For videos where the user is expected to have the ability to start/stop/seek a video or control the volume, this attribute is required.
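The two cases can be sketched as a small function that renders the tag either way. This is an illustrative helper with an invented name and file name; the `autoplay`, `loop`, and `muted` attributes stand in for the "autoplaying, looping and silent" background use described above.

```python
from html import escape

def video_tag(src, controls=True):
    """Render a <video> tag; `controls` is either present or absent, never "false"."""
    attrs = f'src="{escape(src)}"'
    if controls:
        attrs += " controls"
    else:
        attrs += " autoplay loop muted"   # GIF-like background video
    return f"<video {attrs}></video>"

print(video_tag("movie.mp4"))
print(video_tag("movie.mp4", controls=False))
```

The first call produces a tag the viewer can play, pause, and seek; the second produces one that plays on its own with no visible controls.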

Video playback with the controls visible.

video-playback-with-controls-visible.png

## Toggling controls in api.video

* Player: you can toggle the controls for all videos used by a single player with the [enable controls](https://docs.api.video/reference#patch_players-playerid) attribute for the player.
  * This same endpoint allows you to control the colours of the playback track bar and of the icons, for additional customization.
* Video: appending [#hide-controls](https://docs.api.video/delivery/video-playback-features) to the end of the video player URL will hide the controls for playback.
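The per-video option amounts to a string operation on the player URL. The function below is a hypothetical helper that follows the URL pattern used elsewhere on this page; only the `#hide-controls` fragment itself comes from the text above.

```python
def player_url(video_id, hide_controls=False):
    """Return the player URL, optionally with the #hide-controls fragment appended."""
    url = f"https://embed.api.video/vod/{video_id}"
    return url + "#hide-controls" if hide_controls else url

print(player_url("vi4blUQJFrYWbaG44NChkH27", hide_controls=True))
```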



## What is a Content Delivery Network (CDN)?
A Content Delivery Network (CDN) is a geographically distributed network of servers that work together to provide fast and reliable delivery of digital content, such as web pages, videos, images, and applications, to users around the world. CDNs play a crucial role in enhancing the performance and accessibility of online content.

## How CDNs Work
At the core of a CDN is the concept of edge servers. These servers are strategically placed in multiple locations, often in close proximity to end-users, to minimize the physical distance between the user and the content they request. When a user accesses content, the CDN's intelligent routing system directs the user's request to the nearest edge server, which then delivers the content.

This approach offers several benefits:
1. **Reduced Latency**: By serving content from edge servers closer to the user, CDNs minimize the time it takes for the content to reach the user's device, resulting in faster load times.
2. **Increased Scalability**: CDNs can handle sudden spikes in traffic by distributing the load across multiple servers, ensuring the content remains accessible even during periods of high demand.
3. **Improved Reliability**: If an edge server experiences an outage or high traffic, the CDN can seamlessly reroute requests to other available servers, maintaining content availability.
4. **Bandwidth Optimization**: CDNs reduce the burden on the content provider's origin servers by caching and serving content directly from the edge, freeing up resources and bandwidth.
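The routing and failover behaviour described above can be sketched with a few edge locations. This is a deliberately simplified model: it picks the geographically closest healthy server, whereas real CDNs usually rely on DNS, anycast, and measured latency rather than raw distance. The server list, coordinates, and function names are invented for the example.

```python
import math

def distance_km(a, b):
    """Great-circle distance between two (latitude, longitude) pairs, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def pick_edge(user, edges):
    """Choose the closest edge server that is currently healthy."""
    healthy = [e for e in edges if e["healthy"]]
    return min(healthy, key=lambda e: distance_km(user, e["location"]))

edges = [
    {"name": "paris", "location": (48.86, 2.35), "healthy": True},
    {"name": "new-york", "location": (40.71, -74.01), "healthy": True},
    {"name": "tokyo", "location": (35.68, 139.69), "healthy": True},
]
london = (51.51, -0.13)

print(pick_edge(london, edges)["name"])   # paris
edges[0]["healthy"] = False               # simulate an outage at the nearest edge
print(pick_edge(london, edges)["name"])   # new-york
```

A viewer in London is served from Paris while it is up, and is rerouted to the next closest server when it is not, without the viewer having to do anything.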

## Benefits of Using a CDN
The advantages of leveraging a CDN extend beyond just performance. Content providers can also benefit from improved security, cost savings, and enhanced user experience.
1. **Security**: CDNs can help protect against distributed denial-of-service (DDoS) attacks and other security threats by absorbing and filtering malicious traffic before it reaches the origin servers.
2. **Cost Optimization**: By offloading content delivery to a CDN, content providers can reduce the bandwidth and infrastructure costs associated with serving high-volume or bandwidth-intensive content.
3. **Global Reach**: CDNs with a widespread network of edge servers can deliver content to users worldwide, ensuring a consistent experience for a global audience.
4. **Analytics and Insights**: Many CDN providers offer detailed analytics and reporting, allowing content owners to gain valuable insights into user behavior, traffic patterns, and content performance.

## Choosing the Right CDN

When selecting a CDN provider, content owners should consider factors such as global coverage, performance metrics, security features, pricing models, and the level of customization and integration available. Partnering with the right CDN can be a game-changer in delivering a seamless, high-performing, and secure online experience for users.  

At api.video, we understand the critical role that CDNs play in the delivery of video content. That's why we've integrated a robust and reliable CDN into our platform, ensuring your videos are served quickly and efficiently to viewers around the world. With api.video's CDN, you can focus on creating compelling content, while we handle the complexities of content distribution and optimization. [Sign up now](https://dashboard.api.video/register)!

## What are Video Containers?
In the world of digital video, a video container, also known as a file format or wrapper, is a file structure that encapsulates various media elements, such as video, audio, subtitles, and metadata, into a single, self-contained package. These containers play a crucial role in ensuring the compatibility and seamless playback of video content across different devices and media players.

## Understanding Video Containers
Video containers act as a standardized framework that organizes and stores the various components of a video file. They provide a consistent way to package the encoded video and audio streams, along with any additional data like subtitles, chapter markers, or other metadata.

The choice of video container is essential because it determines the compatibility and interoperability of the video file. Different video containers support different codecs, features, and compression methods, which can impact factors such as file size, quality, and playback capabilities.

## Common Video Container Formats
Some of the most widely used video container formats include:
1. **MP4 (MPEG-4 Part 14)**: A versatile and widely supported container that can store video encoded with various codecs, such as H.264 (AVC) and H.265 (HEVC), as well as audio in formats like AAC and MP3.
2. **AVI (Audio Video Interleave)**: A legacy container format developed by Microsoft, primarily used for storing video and audio data in a Windows-compatible manner.
3. **MKV (Matroska)**: An open-source, highly flexible container that can handle a wide range of video and audio codecs, as well as advanced features like chapter markers and multiple audio/subtitle tracks.
4. **WebM**: An open, royalty-free container format primarily associated with the VP8 and VP9 video codecs, designed for web-based video delivery.
5. **QuickTime (MOV)**: A container format developed by Apple, commonly used in professional video production and post-processing workflows.  

The choice of video container often depends on the specific requirements of the video project, the target audience, and the playback platforms being supported.
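Because a container is a file structure, you can often tell which one you have from the first few bytes of the file. The sketch below checks the well-known signatures for the formats listed above. It is a heuristic and not a full parser: for example, it labels every `ftyp`-based file other than QuickTime as MP4, and the function name is made up for this illustration.

```python
def sniff_container(header):
    """Guess the container from the first bytes of a file (heuristic only)."""
    if header[4:8] == b"ftyp":
        # ISO base media files; QuickTime MOV files use the brand b"qt  ".
        return "MOV" if header[8:12] == b"qt  " else "MP4"
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    if header[:4] == b"\x1a\x45\xdf\xa3":
        # EBML header, shared by Matroska and WebM; the DocType tells them apart.
        return "WebM" if b"webm" in header[:64] else "MKV"
    return "unknown"

print(sniff_container(b"\x00\x00\x00\x18ftypisom\x00\x00\x02\x00"))  # MP4
print(sniff_container(b"RIFF\x00\x00\x00\x00AVI LIST"))              # AVI
```

In practice you would read the first 64 bytes or so of the uploaded file and pass them in; a proper media library is still the right tool when you need to know which codecs are inside.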

## Importance of Video Container Selection
Selecting the appropriate video container is crucial for ensuring seamless playback and compatibility across different devices and media players. Factors such as the intended use case, target audience, and platform support should be carefully considered when choosing a video container format.

Incompatible or unsupported video containers can lead to playback issues, compatibility problems, and the need for additional transcoding or conversion steps. Understanding the capabilities and limitations of various video container formats is essential for content creators, video professionals, and developers working with digital video content.  

At api.video, we understand the importance of video container compatibility and optimization. Our platform supports a wide range of popular video container formats, including MP4, MKV, and WebM, ensuring your video content can be easily uploaded, processed, and delivered to your audience seamlessly. Whether you're working with a specific container format or need guidance on the best options for your project, api.video provides the flexibility and tools to handle your video needs effectively.

## What is Computer Vision?
Computer Vision is a field of [Artificial Intelligence](https://api.video/what-is/artificial-intelligence/) that enables computers to gain high-level understanding from digital images or videos. In the context of video technology, it allows machines to extract, analyze, and understand useful information from visual data, mimicking human visual processing capabilities.

## Key Components of Computer Vision
Computer Vision systems typically involve several key components:
1. Image Acquisition: Capturing or receiving visual data
2. Image Pre-processing: Enhancing images for further analysis
3. Feature Detection and Extraction: Identifying key points or patterns in images
4. Segmentation: Dividing images into meaningful regions
5. High-level Processing: Making decisions based on the extracted features
6. Output: Generating results such as object classifications or scene descriptions

## Computer Vision Techniques in Video Processing
Several techniques are particularly relevant to video processing:
- Object detection and tracking: Identifying and following specific objects across video frames.
- Action recognition: Understanding and classifying human actions or events in video sequences.
- Scene understanding: Analyzing the overall context and environment in video frames.
- Optical flow: Estimating motion between video frames.
- 3D reconstruction: Creating three-dimensional models from two-dimensional video sequences.
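As a very small taste of comparing frames to find what moved, the sketch below uses frame differencing. This is a much simpler technique than optical flow or a learned object detector and is shown only as a stand-in; the threshold, the toy frames, and the function names are assumptions.

```python
def motion_mask(previous, current, threshold=30):
    """Mark pixels whose brightness changed by more than `threshold` between frames."""
    return [[abs(c - p) > threshold for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]

def bounding_box(mask):
    """Return (top, left, bottom, right) around all changed pixels, or None."""
    points = [(y, x) for y, row in enumerate(mask)
              for x, moved in enumerate(row) if moved]
    if not points:
        return None
    ys, xs = zip(*points)
    return min(ys), min(xs), max(ys), max(xs)

frame_a = [[0] * 6 for _ in range(4)]        # empty 4x6 grayscale frame
frame_b = [row[:] for row in frame_a]
frame_b[1][2] = frame_b[2][3] = 200          # a bright object appears

print(bounding_box(motion_mask(frame_a, frame_b)))  # (1, 2, 2, 3)
```

Running the same comparison on each new pair of frames and following the box over time is a crude form of tracking; production systems replace the differencing step with far more robust detectors.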

## Applications in Video Technology
Computer Vision has transformed numerous aspects of video technology:
- Video Surveillance: Automatically detecting and tracking objects or people in security footage.
- Content-Based Video Retrieval: Enabling search and organization of video libraries based on visual content.
- Augmented Reality: Overlaying digital information onto real-world video feeds.
- Video Editing and Post-production: Automating tasks like color correction, object removal, or special effects application.
- Quality Control: Detecting defects or inconsistencies in video production.
- Gesture Control: Enabling hands-free control of devices through video-based gesture recognition.

## Challenges and Considerations
While powerful, Computer Vision in video technology faces several challenges. Variability in visual data poses a significant hurdle, requiring systems to deal with changes in lighting, perspective, and occlusion in video sequences. Real-time processing is another critical concern, as achieving low-latency performance is essential for live video applications.

Many Computer Vision models also require large amounts of labeled video data for training, which can be resource-intensive. The ability to analyze video content raises important questions about surveillance and personal privacy, necessitating careful consideration of ethical implications. Finally, ensuring robustness and generalization remains a key challenge, as systems must perform well across diverse video scenarios and conditions.

## The Future of Computer Vision in Video
As Computer Vision techniques continue to advance, we can expect several exciting developments. Closer integration with Natural Language Processing and other AI domains will likely lead to more comprehensive video understanding. We can anticipate improved ability to interpret three-dimensional scenes from two-dimensional video input, enhancing 3D understanding.

More sophisticated analysis of human behavior and emotions in video content is on the horizon, paving the way for advanced emotion and intent recognition. As AI-generated content becomes more prevalent, we'll likely see the development of advanced techniques to identify synthetic or manipulated video content. Lastly, the rise of edge computing promises more powerful Computer Vision capabilities on mobile and edge devices, enabling sophisticated real-time video processing in a wider range of environments and applications.

As Computer Vision continues to evolve, it will undoubtedly play a crucial role in shaping the future of video technology, offering new possibilities for content creation, analysis, and user experience while also presenting new challenges and ethical considerations for the industry to address.
