The History of Video Compression Standards, From 1929 Until Now

June 9, 2021 - Erikka Innes

Video compression reduces the total number of bits you need to represent an image or video. Over the years people have come up with many different algorithms for compressing video. While video compression seems very modern, it has a long history that begins with analog video. In our review of the history of video compression, we'll stop at some major milestones that brought us to where we are today. You'll notice that in the beginning, new developments are concepts, then eventually these become various video compression standards. Many of these standards are used today, and people continue to develop new and improved ones all the time.

1929: The First Appearance of Interframe Compression

Surprisingly, the first discussion of interframe compression happened way back in 1929. Interframe compression is the idea of saving a key image, and then only saving changes to that image as they occur from frame to frame. The initial frame used to compare the others to is called the keyframe. R.D. Kell proposed this concept for use with analog video, but this concept persists and is used today with digital video!
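
To make the keyframe idea concrete, here is a minimal sketch in Python. It assumes a frame is just a flat list of pixel values, and the helper names `encode_delta` and `apply_delta` are made up for this illustration. Real codecs work on blocks of pixels and are far more sophisticated, so treat this as a toy model only.

```python
def encode_delta(keyframe, frame):
    """Keep only the pixels that differ from the keyframe, as (index, value) pairs."""
    return [(i, new) for i, (old, new) in enumerate(zip(keyframe, frame)) if old != new]


def apply_delta(keyframe, delta):
    """Rebuild a frame by copying the keyframe and applying the stored changes."""
    frame = list(keyframe)
    for i, value in delta:
        frame[i] = value
    return frame


# Hypothetical 8-pixel frames: a bright object moves one pixel to the right.
keyframe = [10, 10, 10, 10, 200, 200, 10, 10]
next_frame = [10, 10, 10, 10, 10, 200, 200, 10]

delta = encode_delta(keyframe, next_frame)
print(delta)  # [(4, 10), (6, 200)]
print(apply_delta(keyframe, delta) == next_frame)  # True
```

Only two of the eight pixels changed, so only two entries need to be stored for the second frame.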

1952: Differential Pulse-Code Modulation

The next milestone for video compression occurred in 1952. Bell Labs researchers B.M. Oliver and C.W. Harrison suggested that you could use differential pulse-code modulation (DPCM) in video coding. Previously, DPCM was used for audio (and still is today). DPCM is a technique where you take samples of an image and then predict each upcoming sample value from the samples that came before it. Because you only need to store the difference between each prediction and the actual value, and those differences are usually small, you don't need to store as much of that image's data.
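
Here is a rough sketch of the DPCM idea in Python. It assumes the simplest possible predictor, where each sample is guessed to be equal to the previous one, and it stores the prediction error for each sample. The function names and the sample row are invented for this example, and practical DPCM systems usually also quantize the errors.

```python
def dpcm_encode(samples):
    """Store each sample as its difference from the predicted (previous) sample."""
    residuals = []
    prediction = 0  # assumed starting prediction before any sample is seen
    for sample in samples:
        residuals.append(sample - prediction)
        prediction = sample
    return residuals


def dpcm_decode(residuals):
    """Rebuild the samples by adding each stored difference to the running prediction."""
    samples = []
    prediction = 0
    for residual in residuals:
        sample = prediction + residual
        samples.append(sample)
        prediction = sample
    return samples


row = [100, 102, 103, 103, 101, 98]  # hypothetical brightness values along one line
residuals = dpcm_encode(row)
print(residuals)  # [100, 2, 1, 0, -2, -3]
print(dpcm_decode(residuals) == row)  # True
```

After the first value, the numbers to store are small, which is what makes them cheaper to represent than the original samples.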

1959: Predictive Interframe Video Coding with Temporal Compression

In 1959, predictive interframe video coding using temporal compression was first proposed. Temporal compression involves choosing a set of spaced out keyframes in a video, and then only encoding the changes between those frames. The keyframes are the only frames that are fully recorded as reference points that the other frames rely on. This concept was presented by NHK (The Japan Broadcasting Corporation) researchers Y. Taki, M. Hatori, and S. Tanaka.

1967: Run-length Encoding

Run-length encoding, or RLE for short, is a type of data compression where you take a data value that recurs over and over, and store it as a single value with a count. You can use this information to accurately rebuild the exact same image later! This concept was proposed in 1967 by University of London researchers A.H. Robinson and C. Cherry. It was first used to reduce transmission bandwidth of analog television signals. This concept is still used in digital video today.
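
Run-length encoding is simple enough to sketch in a few lines of Python. The example below assumes a scanline is a list of pixel values. The function names are illustrative and not taken from any particular codec.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of repeated values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


scanline = [0, 0, 0, 0, 255, 255, 0, 0, 0]  # hypothetical black and white pixels
encoded = rle_encode(scanline)
print(encoded)  # [(0, 4), (255, 2), (0, 3)]
print(rle_decode(encoded) == scanline)  # True
```

Nine values become three pairs, and decoding gives back exactly the original data, which is why RLE counts as lossless.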

1970s: Early Digital Video Algorithms

With the advent of the 70s came the introduction of digital video. It was sent using the same technique that was used for telecommunications - pulse-code modulation (PCM). You might recognize this term from a little bit earlier when we talked about DPCM. PCM is used to digitally represent a sampled analog signal. It's a standard for digital audio, and in the 70s it was also used to digitize video. It required high bitrates and wasn't a very efficient way to transmit video, but it worked.
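
As a small illustration of what PCM does, the Python sketch below maps an analog value to one of a fixed number of integer codes. It assumes the signal has already been sampled and scaled to the range -1.0 to 1.0, and it uses 8 bits per sample. Both choices, and the function names, are assumptions made for this example.

```python
def pcm_encode(value, bits=8):
    """Map an analog value in [-1.0, 1.0] to an integer code with 2**bits levels."""
    levels = 2 ** bits
    clamped = max(-1.0, min(1.0, value))
    # Scale to [0, levels) and keep the top edge inside the last level.
    return min(levels - 1, int((clamped + 1.0) / 2.0 * levels))


def pcm_decode(code, bits=8):
    """Map an integer code back to the middle of the level it represents."""
    levels = 2 ** bits
    return (code + 0.5) / levels * 2.0 - 1.0


for value in (-1.0, 0.0, 0.3, 1.0):
    code = pcm_encode(value)
    print(value, code, round(pcm_decode(code), 4))
```

Every sample costs the full 8 bits no matter how similar it is to its neighbors, which gives a feel for why plain PCM video needed such high bitrates.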

1972: The First Practical Compression of Digital Video

Around 1972, at Kansas State University, Nasir Ahmed proposed DCT coding for image compression. DCT stands for Discrete Cosine Transform. It's an image processing technique where you split images into parts composed of different frequencies. During a process called quantization, some of the frequencies are dropped. The more important frequencies are saved and used for image reconstruction later. Because some frequencies are dropped, the image won't be exactly the same, but it's often good enough that you will not notice a lot of difference.
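
The Python sketch below shows the transform-then-quantize idea on a single row of eight pixels. It assumes a one-dimensional orthonormal DCT and one uniform quantization step for every coefficient. Image codecs typically apply a two-dimensional DCT to blocks of pixels and use a different step per frequency, so this is a simplified model. The pixel values and the step size are made up.

```python
import math


def dct(samples):
    """Orthonormal DCT-II: turn samples into frequency coefficients."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(s * math.cos(math.pi * (i + 0.5) * k / n) for i, s in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct() above: turn frequency coefficients back into samples."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples


def quantize(coeffs, step):
    """Round each coefficient to a multiple of step; small ones become zero."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Scale the stored integer levels back up to approximate coefficients."""
    return [level * step for level in levels]


pixels = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical row of brightness values
step = 10                                  # assumed quantization step

levels = quantize(dct(pixels), step)
restored = idct(dequantize(levels, step))
print(levels)                         # mostly small integers, several of them zero
print([round(p) for p in restored])   # close to the original pixels, not identical
```

The coefficients that round to zero are the frequencies that get dropped. The restored row is close to the original but not identical, which is the lossy part.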

1973: DCT Technique Becomes an Image Compression Algorithm

Nasir Ahmed worked with T. Natarajan and K.R. Rao at the University of Texas to shape the DCT concept into a working algorithm for image compression. The results of their work were published in 1974.

1974: The Development of Hybrid Coding

In 1974, Ali Habibi at the University of Southern California combined predictive coding with DCT coding. As we went over earlier, predictive coding is about guessing what will come before or after a current image. Habibi's algorithm worked within each image (intraframe) rather than predicting across images (interframe).

1975: Further Development of Hybrid Coding

John A. Roese and Guner S. Robinson developed Habibi's algorithm further, making it usable across frames (interframe). They experimented with many different ways of doing this and found that Ahmed's DCT technique was the most efficient to combine with predictive coding.

1977: A Faster DCT Algorithm

The DCT algorithm was optimized for video encoding by Wen-Hsiung Chen, with C.H. Smith and S.C. Fralick. They founded Compression Labs to commercialize DCT technology for video.

1979-1981: Motion-Compensated DCT Video Compression

Anil K. Jain and Jaswant R. Jain continued developing motion-compensated DCT video compression. This kind of development continued, with Wen-Hsiung Chen using their work to create a video compression algorithm combining all the research. Work on motion-compensated DCT led to this becoming the standard compression technique used from the 1980s until now.

1984: H.120, the First Digital Video Compression Standard

All of the previous research finally led to the first video compression standard - H.120. This standard was great with individual images but didn't do as well when it came to preserving quality from frame to frame. Later revised in 1988, this standard was the first international standard for video compression. Its main use was for videoconferencing. While this was a great first effort, many inefficiencies led to various companies experimenting with ways to improve the standard.

1988: Video Conferencing with H.261

The first in the series of codecs you've probably seen or used is H.261. It's the first digital video compression algorithm that efficiently uses intraframe and interframe compression techniques. H.261 became the first commercially successful digital video coding standard. It was used around the world for video conferencing and is responsible for introducing hybrid block-based video coding, which is still used in many video standards today (MPEG-1 Part 2, H.262/MPEG-2 Part 2, H.263, MPEG-4 Part 2, H.264/MPEG-4 Part 10, and HEVC). The way people collaborated to build this standard is also still widely used. Its maximum resolution was 352x288.

While this standard was popular internationally, when it was first released it wasn't complete. The standard was revamped in 1990 and again in 1993. The standard doesn't include details for how to handle encoding - it's only for decoding video.

1992: Multimedia PC Applications with Motion JPEG

Motion JPEG was created in 1992 for use in multimedia on computers. This video compression technique compresses each video frame separately into a JPEG image.

1993: Video CDs with MPEG-1

MPEG stands for Moving Picture Experts Group. It's an alliance of working groups of ISO and IEC that set standards for media coding. Around 1988, they began collaborating on the video coding standard known today as MPEG-1. Similar to H.261, the standard doesn't specify how to encode video, though a sample implementation is provided. Because of this, MPEG-1 coding can offer widely different performance depending on how the video is encoded.

MPEG-1 is specifically designed to compress VHS-quality raw digital video, audio and metadata for use on video CDs, digital cable, satellite TV, and file sharing for reference, archiving and transcription. It is typically used at resolutions up to 352x288. You might best recognize MPEG-1 from audio - its Audio Layer III is the MP3 format.

1994: TV Broadcasts and DVDs with H.262 and MPEG-2

MPEG-2 and H.262 are interchangeable names for the same video standard. This was a video coding format developed by many companies working together. The standard supports interlaced video, which is a technique used in analog NTSC, PAL and SECAM television systems. This coding standard uses a number of interesting coding techniques. I'll go over two here.

Picture Sampling

MPEG-2 reduces data with picture sampling techniques. One is separating each frame in a video into two fields - one field which has odd numbered horizontal lines and one field that has all the even numbered ones. When displayed after decoding, the fields are shown individually one after another, with the lines from one field showing between the lines of the previous field. This is called interlaced video.
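
A short Python sketch of splitting a frame into two fields and weaving them back together is shown below. It assumes a frame is a list of horizontal lines counted from 1, so the odd numbered lines sit at list indexes 0, 2, 4 and so on. The function names are invented for the example.

```python
def split_fields(frame):
    """Split a frame into the odd numbered lines and the even numbered lines."""
    odd_field = frame[0::2]   # lines 1, 3, 5, ... (indexes 0, 2, 4, ...)
    even_field = frame[1::2]  # lines 2, 4, 6, ... (indexes 1, 3, 5, ...)
    return odd_field, even_field


def weave(odd_field, even_field):
    """Interleave the two fields back into a single full frame."""
    frame = []
    for i in range(len(odd_field) + len(even_field)):
        source = odd_field if i % 2 == 0 else even_field
        frame.append(source[i // 2])
    return frame


frame = ["line1", "line2", "line3", "line4", "line5", "line6"]
odd_field, even_field = split_fields(frame)
print(odd_field)   # ['line1', 'line3', 'line5']
print(even_field)  # ['line2', 'line4', 'line6']
print(weave(odd_field, even_field) == frame)  # True
```

Each field carries half the lines of the frame, so each one needs roughly half the data of a full picture.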

Another key strategy employed takes advantage of the fact that humans see brightness better than color. MPEG-2 uses chroma subsampling, which is a way of encoding video using less resolution for chroma (color) information than luma (brightness) information. Because humans don't see color as well, when done correctly you won't notice the information removed for compression.
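
To show how much data chroma subsampling can save, here is a minimal Python sketch. It assumes one common scheme, usually written as 4:2:0, where each 2x2 block of a color plane is reduced to a single value. Averaging the four values is an assumption for this illustration, since real encoders may filter differently. The plane values are made up.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (height and width must be even)."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# Hypothetical 4x4 chroma plane for one color channel.
chroma = [
    [100, 102, 50, 50],
    [104, 106, 50, 50],
    [0, 0, 200, 204],
    [0, 0, 208, 212],
]

small = subsample_420(chroma)
print(small)  # [[103, 50], [0, 206]]

# Sample counts for a 4x4 picture: one luma plane plus two chroma planes.
full_samples = 16 + 16 + 16
subsampled_samples = 16 + 4 + 4
print(subsampled_samples / full_samples)  # 0.5
```

The brightness plane keeps every sample while each color plane keeps a quarter of its samples, so in this scheme the picture needs half as many values overall.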

These are just a couple of aspects of picture sampling in MPEG-2. For the sake of brevity, I'll save going in-depth about this and other video standards for future articles.

I-frames, P-frames, B-frames

MPEG-2 uses different kinds of frames to efficiently compress data. I-frames are intra-coded frames, which means they are compressed using only the information within a single frame. Compression and removal of data from this kind of frame relies on the eye's inability to notice certain changes in an image. P-frames are predictive-coded frames. This kind of frame contains the difference between itself and information found in a previous I-frame or P-frame. B-frames (bidirectionally predicted frames) are similar to P-frames, but they use information from pictures both before and after them for reference. There's a lot that happens with this standard, so I'll go more into it in a future post.
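
One consequence of B-frames looking ahead is that frames can't be stored in the order they are displayed: the later reference frame has to arrive before the B-frames that depend on it. The Python sketch below reorders a sequence of frame types from display order into a workable coding order. The frame pattern and the function name are assumptions for this example, and real encoders have more freedom than this simple rule suggests.

```python
def coding_order(display_types):
    """Return frame indexes in coding order for a display-order list of frame types.

    Each I- or P-frame is emitted before the B-frames that precede it in
    display order, because those B-frames reference it.
    """
    order = []
    pending_b = []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)      # wait for the next reference frame
        else:
            order.append(index)          # reference frame goes first
            order.extend(pending_b)      # then the B-frames that needed it
            pending_b = []
    order.extend(pending_b)  # simplification: trailing B-frames with no later reference
    return order


display = ["I", "B", "B", "P", "B", "B", "P"]  # hypothetical group of pictures
order = coding_order(display)
print(order)                         # [0, 3, 1, 2, 6, 4, 5]
print([display[i] for i in order])   # ['I', 'P', 'B', 'B', 'P', 'B', 'B']
```

The decoder reads frames in this coding order and then puts them back into display order before showing them.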

1995: Store Digital Videos with DV

The first DV specification was called Blue Book. It defined common features for things like physical videocassettes, recording modulation method, magnetization and basic system data. DV uses intraframe video compression to compress video frame by frame using DCT. Similar to MPEG-2, it uses chroma subsampling for further compression.

DV was created by Sony and Panasonic for professional and broadcast users. This storage method is mostly obsolete today since we have memory cards and solid state drives now instead.

1996: A New Videoconferencing Standard with H.263

H.263 was created using H.261 as a base. It relies on discrete cosine transform compression to create low bitrate compressed video that can be used for video conferencing. Some familiar uses of this standard were on the internet in Flash Video content as well as places like YouTube and MySpace. It continued to be used all over the internet until the creation of H.264.

1999: Internet Video with MPEG-4 Part 2

MPEG-4 Part 2 (also called MPEG-4 Visual) is a standard that's compatible with H.263. It was often used for surveillance cameras as well as high definition TV broadcasting and DVDs. It's faster than MPEG-2 and uses a more efficient compression algorithm. However, it can't handle the AVC (Advanced Video Coding) format, which led to the creation of MPEG-4 AVC later on.

2003: Blu-rays, DVDs, Live Streaming and Broadcast TV with H.264/MPEG-4 AVC

H.264/MPEG-4 AVC (sometimes called MPEG-4 Part 10) was published in 2003. The goal of this type of video compression was to create high-definition, digital video that was flexible enough for use on a wide variety of different systems, networks, and devices. This is currently the most popular standard used. You'll see H.264 used for satellite, internet, telecommunications networks and cable. It's also available across many decoders, browsers and mobile devices. Some places you may recognize it from are Blu-ray discs, Netflix, Hulu, Amazon Prime Video, Vimeo, YouTube and pretty much anything you see that's video on the internet. It has a maximum resolution of 4096x2048.

This standard is based on block-oriented, motion-compensated integer-DCT coding. Integer-DCT coding is a particularly fast algorithm that implements discrete cosine transformation. In addition to this H.264 does a lot to make some of the techniques I discussed earlier more efficient. This standard allows for compression where no data is lost (lossless compression) and lossy compression (where some data is dropped out and can't be regained). It's very flexible compared to some of the earlier standards in terms of what you can use it to accomplish and where it can be used. Another plus is this technology is available for free when used to stream content across the internet.

2013: 360° Video, Augmented Reality and Virtual Reality with H.265/HEVC

H.265/HEVC (High Efficiency Video Coding) does everything that H.264 does, but better. It reduces file size by up to about 50% at comparable quality and allows for very high resolution video - up to 8K. (The maximum resolution is 8192x4320.) While you often don't need 8K or can't get it with devices and networks today, it's incredibly useful for immersive experiences like Augmented Reality (AR), Virtual Reality (VR), and 360° video. The main reason it's not in use more widely is cost. Big companies like Netflix or Amazon Prime Video and others can afford to pay to use this standard, but many places still choose to stick with H.264 when possible.

2013: VP9

VP9 is a competitor to H.265. It was developed by Google and, unlike H.265, it's free. H.265 outperforms VP9 at high bitrates. As with H.265, VP9 takes a while to encode video, which can negatively impact latency. This issue with both VP9 and H.265 led to the continued use of H.264. VP9 is becoming more popular because it's free, but it remains to be seen whether more widespread adoption occurs.

2018: High Quality Web Video with AV1

Google, Amazon, Cisco, Intel, Microsoft, Mozilla and Netflix all decided to create a new video format standard - AV1. It was to be the successor to VP9, open-source and royalty free. This format was specifically designed for real-time applications like WebRTC and higher resolutions. The goal was for it to be able to handle 8K video. It uses the typical block-based transform coding I've talked about in previous video standards, but implements it with new techniques. It has more precise ways of dividing images into blocks, can predict what's happening between frames with more precision and uses improved filtering. I will provide more detail about this standard in a future post.

2020: Commercially Viable 4K Broadcasting with H.266/VVC

H.266/VVC (Versatile Video Coding) aims to make 4K video broadcasting viable. Released in July 2020, it's the most recently released video compression standard out there. While it has goals like providing a 30-50% better compression rate than AV1, it's not doing anything especially new. It's using a block-based hybrid video coding approach and the idea is to find ways to optimize and improve existing algorithms and compression techniques. Encoding still remains slow, but this standard offers good quality improvements at lower bitrates.

We'll just have to wait and see what they come up with next! And if you felt a bit lost with some of the shorter explanations of techniques used for encoding, we'll go over those in future posts!

If you're looking to encode your videos - api.video's encoding is lightning fast and supports all the standards I talked about today.

Resources

  1. ITU-T SG 16 standardization on visual coding - the Video Coding Experts Group (VCEG) - https://www.itu.int/en/ITU-T/studygroups/2017-2020/16/Pages/video/vceg.aspx

Erikka Innes

Developer Evangelist
