What is a discrete cosine transform (DCT)?
A discrete cosine transform (DCT) is a Fourier-related transform that's widely used to compress video. It expresses a digital signal as a sum of cosine functions of different frequencies. If you think of a digital signal as a wave, you can sample the signal's amplitude over time and turn those samples into a list of coefficients, where each coefficient says how much one cosine function contributes to the original wave.
The question is, how do you pull apart the wave so you can compress it into a set of coefficients representing different functions? The answer is the discrete cosine transform. Cosine functions are used rather than sine functions because they're more efficient for this purpose: you typically need fewer cosine functions to approximate a signal. In general, a DCT takes a vector of length n that contains amplitudes and returns a new vector of length n that contains the coefficients for the n cosine functions used to represent the signal. You can encode the transform as an n x n matrix, where each row represents a cosine function of a different frequency.
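
To make the matrix idea concrete, here is a minimal sketch in Python. It assumes the orthonormal DCT-II variant, which is one common choice but not the only one, and the helper names dct_matrix and dct are made up for this illustration rather than taken from any library.

```python
import math

def dct_matrix(n):
    """Build an n x n DCT matrix (orthonormal DCT-II, an assumed variant).

    Row k holds samples of a cosine function whose frequency grows with k.
    """
    rows = []
    for k in range(n):
        # Row 0 is the constant (zero-frequency) function and uses a smaller
        # scale factor so that every row ends up with unit length.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([
            scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ])
    return rows

def dct(signal):
    """Multiply the DCT matrix by a vector of amplitudes.

    Returns one coefficient per cosine function, so the output has the
    same length as the input.
    """
    matrix = dct_matrix(len(signal))
    return [sum(c * x for c, x in zip(row, signal)) for row in matrix]

# A flat signal contains no variation, so only the zero-frequency
# coefficient should be (noticeably) different from zero.
samples = [8.0, 8.0, 8.0, 8.0]
coeffs = dct(samples)
print(coeffs)
```

The input and output vectors have the same length, which matches the description above: n amplitudes go in and n coefficients come out.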
DCT and block compression
Because the DCT works on an n x n matrix of pixels, this technique is often called block compression. The frame is split into blocks, and each block is transformed and compressed on its own. Blocks can be different sizes, ranging from 4x4 to 32x32 pixels, with 8x8 being the most common.
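
As a rough illustration of the block step, the sketch below cuts a frame into square blocks. It assumes the frame is a plain list of pixel rows whose width and height are exact multiples of the block size; real codecs handle edges and color planes in their own ways. The function name split_into_blocks is hypothetical.

```python
def split_into_blocks(frame, size=8):
    """Split a 2-D frame (a list of pixel rows) into size x size blocks.

    Assumption for this sketch: the frame dimensions are exact multiples
    of the block size, so no padding is needed at the edges.
    """
    height, width = len(frame), len(frame[0])
    if height % size or width % size:
        raise ValueError("frame dimensions must be multiples of the block size")
    blocks = []
    # Walk the frame top to bottom, left to right, one block at a time.
    for top in range(0, height, size):
        for left in range(0, width, size):
            block = [row[left:left + size] for row in frame[top:top + size]]
            blocks.append(block)
    return blocks

# A made-up 16x16 frame where each pixel value encodes its own position.
frame = [[x + 16 * y for x in range(16)] for y in range(16)]
blocks = split_into_blocks(frame, 8)
print(len(blocks), "blocks of", len(blocks[0]), "x", len(blocks[0][0]))
```

Each block in the returned list can then be transformed independently of the others.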
How DCT works
Here is a very high-level explanation of how DCT works. First, the frame in your video is divided into blocks of pixels. Each row of pixels in a block is then treated as a wave that can be built from cosine waves, and each of those cosine waves can be represented as a row in a matrix. You create a matrix the size of your block, with one cosine function per row. Next, you figure out the coefficient for each of the cosine waves in your matrix. This is accomplished through matrix multiplication. The end result is a matrix of coefficients, which you can now compress: you take the K most significant cosine waves and save them, discarding the rest. You will lose some data, but if you've chosen well, you'll be able to decompress your matrix and have the most important data intact.
When you decompress, you add enough zeroes to your compressed matrix to restore the size of the matrix you compressed down from. You can then use the inverse DCT to get back a close approximation of your original data.
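
The compress and decompress steps can be sketched in one dimension, on a single row of pixels. This is a simplified illustration under stated assumptions, not how any particular codec does it: it uses the orthonormal DCT-II, it treats "most significant" as "lowest frequency" (the first K coefficients), and the names dct_matrix, forward, inverse, compress and decompress are invented for the example. Real codecs work on 2-D blocks and quantize coefficients rather than simply cutting them off.

```python
import math

def dct_matrix(n):
    """n x n orthonormal DCT-II matrix (assumed variant); one cosine per row."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([
            scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ])
    return rows

def forward(signal):
    """Matrix-vector product: amplitudes in, cosine coefficients out."""
    return [sum(c * x for c, x in zip(row, signal))
            for row in dct_matrix(len(signal))]

def inverse(coeffs):
    """Inverse DCT. The matrix is orthonormal, so its inverse is its transpose."""
    n = len(coeffs)
    m = dct_matrix(n)
    return [sum(m[k][i] * coeffs[k] for k in range(n)) for i in range(n)]

def compress(signal, keep):
    """Keep only the first `keep` coefficients (lowest frequencies, by assumption)."""
    return forward(signal)[:keep]

def decompress(kept, n):
    """Pad the kept coefficients with zeroes back to length n, then invert."""
    padded = list(kept) + [0.0] * (n - len(kept))
    return inverse(padded)

# A made-up row of 8 pixel brightness values.
row = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
kept = compress(row, 4)               # store 4 numbers instead of 8
approx = decompress(kept, len(row))   # lossy reconstruction of the row
print([round(v, 1) for v in approx])
```

Keeping all 8 coefficients reproduces the row up to floating-point rounding, while keeping fewer gives a smoother approximation. That is the trade-off between size and fidelity described above.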
The DCT compression technique has been around since the 1970s. Today it's still the most widely used transform for compressing video and appears in most major codecs, with various optimizations depending on the codec.