Tutorials · 14 min read

Upload a big video file using Python

If you try to upload big files in Python, a common way to handle them is by breaking them into chunks. This tutorial will show you how to break your file into chunks and send the file for upload to api.video.

Erikka Innes

February 11, 2021

If you have a large video you need to upload, sometimes the server you're uploading to will not accept it in one request. Different servers have different limits. For api.video, you can upload a video file of up to 199 MiB in a single request before it is considered too big. Once your video crosses this limit, you'll need to break it into chunks and send it that way. You send each chunk, and the server reassembles them for you.

In this tutorial, we'll walk through a simple example of how to do a large video upload.


Prerequisites

  • Account with api.video
  • api.video API key
  • Python, with the requests library installed

What You'll Be Doing

This is a quick summary of what you're going to do to upload your large video.

  1. You'll authenticate with api.video using your API key and retrieve a token that lasts one hour that you can use to access all of api.video's other endpoints.
  2. You'll create a video container for your video and retrieve the videoId.
  3. You'll use the videoId to create the endpoint you'll send your video to.
  4. You'll break up your video file into chunks, then upload each chunk of the video file.

Troubleshooting / What to Watch Out For

If you decide to build your own uploader, a few things will help you out:

  • Remember that you must open the file as binary and work with it as binary. It's easy to forget and open it without adding 'rb' (or 'wb', if you're writing).
  • Headers are very important. The wrong header will prevent you from sending your content.
  • When using the Content-Range header, you must make sure you are listing the correct set of bytes being sent. If you are off by even one byte, it won't work. You will get confusing errors such as "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 52: invalid continuation byte". The fix is often simple: correct the byte range you're listing in the header.
  • When you send bytes in the Content-Range header, it's inclusive. If you want to send the first 100 bytes, you write it as 0-99. This can get confusing when you're breaking your file into chunks because you can say 'grab the first 100 bytes' and the computer knows to grab 0-99. When you say 'grab the next 100 bytes,' the computer would grab 100-199. However, it's easy to write a dynamic header where you make mistakes like listing 0-100 instead of 0-99, or starting over on the same byte, for example 0-100, then 100-200. All these kinds of mistakes will cause a decode or encoding error. If you choose to build your own uploader instead of using this sample, keep this in mind.
  • If you label what type of file you're sending, you may cause unexpected issues. This sample works by opening, reading, and sending your video file as chunks of binary data. When it arrives, it's encoded into HLS for you, no matter what kind of video it started out as.
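The inclusive byte ranges described in the list above can be sketched as a small helper. This is only an illustration: content_ranges is a hypothetical name that is not part of the sample code or of api.video, and it only builds the header values without sending anything.

```python
def content_ranges(total_size, chunk_size):
    """Yield one Content-Range header value per chunk, using inclusive byte offsets."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive number of bytes")
    start = 0
    while start < total_size:
        # The end offset is inclusive: a 100-byte chunk starting at 0 ends at 99.
        end = min(start + chunk_size, total_size) - 1
        yield "bytes %s-%s/%s" % (start, end, total_size)
        # The next chunk starts on the byte after the last one sent.
        start = end + 1


# A 250-byte file sent in 100-byte chunks:
for header_value in content_ranges(250, 100):
    print(header_value)
# bytes 0-99/250
# bytes 100-199/250
# bytes 200-249/250
```

Note that no range repeats a byte and the last range ends at the file size minus one.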

Code Sample

## How to upload a large video that is over 199 MiB to api.video. (Though this script will also work for videos under 200 MiB if you want to test it out.)

import requests
import os 

## Set up variables for endpoints (we will create the third URL programmatically later)
auth_url = "https://ws.api.video/auth/api-key"
create_url = "https://ws.api.video/videos"

## Set up headers and payload for first authentication request
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json"
}

payload = {
    "apiKey": "your API key here"
}

## Send the first authentication request to get a token. The token can be used for one hour with the rest of the API endpoints.
response = requests.request("POST", auth_url, json=payload, headers=headers)
response = response.json()
token = response.get("access_token")

## Set up headers for authentication - the rest of the endpoints use Bearer authentication.

auth_string = "Bearer " + token

headers2 = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": auth_string
}

## Create the video container payload, you can add more parameters if you like, check out the docs at https://docs.api.video
payload2 = {
    "title": "Demo Vid from my Computer",
    "description": "Video upload test."
}

## Send the request to create the container, and retrieve the videoId from the response.
response = requests.request("POST", create_url, json=payload2, headers=headers2)
response = response.json()
videoId = response["videoId"]

## Create endpoint to upload video to - you have to add the videoId into the URL
upload_url = create_url + "/" + videoId + "/source"

## Set up the chunk size. This is how much you want to read from the file every time you grab a new chunk of your file to read.
## If you're doing a big upload, the recommendation is 50 - 80 MB (50000000-80000000 bytes). It's listed at 6MB (6000000 bytes) because 
## then you can try this sample code with a small file just to see how it will work.  The minimum size for a chunk is 5 MB.

CHUNK_SIZE = 6000000

## This is our chunk reader. This is what gets the next chunk of data ready to send.
def read_in_chunks(file_object, CHUNK_SIZE):
    while True:
        data = file_object.read(CHUNK_SIZE)
        if not data:
            break
        yield data

## Upload your file by breaking it into chunks and sending each piece 
def upload(file, url):
    content_name = str(file)
    content_path = os.path.abspath(file)
    content_size = os.stat(content_path).st_size

    print(content_name, content_path, content_size)

    index = 0
    offset = 0
    headers = {"Authorization": auth_string}

    ## Open the file as binary. The with block closes it once the upload loop ends.
    with open(content_path, "rb") as f:
        for chunk in read_in_chunks(f, CHUNK_SIZE):
            ## offset is the position just past this chunk. Content-Range ends are inclusive, hence offset - 1.
            offset = index + len(chunk)
            headers['Content-Range'] = 'bytes %s-%s/%s' % (index, offset - 1, content_size)
            index = offset
            try:
                files = {"file": chunk}
                r = requests.post(url, files=files, headers=headers)
                print(r.json())
                print("r: %s, Content-Range: %s" % (r, headers['Content-Range']))
            except Exception as e:
                print(e)

## Add a path to the file you want to upload, and away we go! 
upload('your-giant-file-here.mp4', upload_url)

Code Walkthrough

Wanna know how it works in a little more detail? Great, here we go! The first couple of sections might look like some of the other blog posts. If that's the case, scroll down for a discussion of the new stuff.

You'll start by importing the requests library, so you can easily make HTTP requests, and the os library, so you can look up details about your file.

import requests
import os

Next, let's assign a variable to each endpoint we'll use, so they're a little easier to work with. We're going to assign two of them here, and we'll use the second one later to build the third one:

  • Authentication endpoint - This is where we send our API key to exchange it for a token. It's auth_url in the code sample.
  • Videos endpoint - Specifically we'll use the feature on this endpoint that's for creating a video. It's create_url in the code sample.
  • Video Upload endpoint - It looks a lot like the other endpoint. It's create_url + "/" + videoId + "/source". We'll construct this after we get a videoId to work with.

auth_url = "https://ws.api.video/auth/api-key"
create_url = "https://ws.api.video/videos"

Authentication - Get Your Token

We'll prepare our headers and payload for authentication, then send a POST request to api.video. With the api.video API, you start authentication by sending a request with your API key to retrieve a token. You can use your sandbox or your production key; the backend will figure out which you are using and handle it accordingly. If you choose to use your sandbox credentials for this walkthrough, your video will be uploaded with the watermark 'FOR DEVELOPMENT PURPOSES ONLY.':

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json"
}

payload = {
    "apiKey": "your API key here"
}
response = requests.request("POST", auth_url, json=payload, headers=headers)

Now we need to get the token from our response, so we can authenticate with the rest of api.video's endpoints when we want to do something. To get the token, we'll convert the response to JSON, then retrieve the token out of the JSON dictionary.

response = response.json()
token = response["access_token"]
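The lines above assume the authentication request succeeded. Here is a minimal sketch of failing early when it did not. The helper name extract_token is hypothetical, and it takes the already-parsed JSON dictionary so it can be tried without a live request. Only the "access_token" key comes from the sample.

```python
def extract_token(auth_body):
    """Return the access token from the parsed auth response, or raise if it is missing."""
    token = auth_body.get("access_token")
    if not token:
        # Include the whole body so whatever the server sent back is visible.
        raise ValueError("No access_token in auth response: %r" % (auth_body,))
    return token


# Tried here with a stand-in dictionary rather than a live response:
print(extract_token({"access_token": "example-token"}))
# example-token
```

In the walkthrough, this would replace the direct dictionary lookup, for example token = extract_token(response).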

Set up a Video Container

Using the token, we'll get the headers and payload ready for our next request, which will be to ask api.video to create a video container for us.

## Set up headers for authentication
auth_string = "Bearer " + token

headers_bearer = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": auth_string
}

## Create a video container
payload2 = {
    "title": "Demo Vid from my Computer",
    "description": "Video upload of Big Buck Bunny to demo how to do an upload from a folder on your computer."
}

api.video endpoints use bearer authentication. Add the authorization header for bearer authentication to your headers. For the payload we keep it simple: just a title and description. But what about the path to your file? Don't worry, we're just creating a container right now. When you upload from your computer, you create a container first and do your upload second.

Send Your Request and Grab the VideoId

For the next part of our walkthrough, we'll send our request, convert the response to JSON, and retrieve the videoId. We need the videoId so we can tell api.video what container we want to upload our video to.

response = requests.request("POST", create_url, json=payload2, headers=headers_bearer)
response = response.json()
videoId = response["videoId"]

Create the Endpoint for Uploading Your Video

Set up the endpoint you'll upload your video to, which is composed of https://ws.api.video/videos + / + videoId + /source.
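As a small sketch, the same string concatenation can be wrapped in a function. The name build_upload_url is hypothetical, and the videoId used here is the placeholder from this tutorial, not a real one.

```python
def build_upload_url(create_url, video_id):
    """Join the videos endpoint, the videoId, and /source into the upload URL."""
    return create_url + "/" + video_id + "/source"


upload_url = build_upload_url("https://ws.api.video/videos", "viyourvideoIdz")
print(upload_url)
# https://ws.api.video/videos/viyourvideoIdz/source
```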

Build a Data Chunk Reader

To break your data into chunks, you're going to want to decide how big each chunk should be, and then keep grabbing the next chunk of data from an open file. We'll set the chunk size at 6000000 bytes (6 MB), matching the code sample above. This is fairly small, just over the 5 MB minimum for a chunk, in case you want to try the sample with a smaller video file to see how it works. If you plan to use this chunk reader for big uploads, make the chunk size larger. A good size is about 50-80 MB, or 50000000-80000000 bytes.

This function assumes you've already opened your file elsewhere, and it uses yield, which makes it a generator function. Each time the for loop asks for the next value, the function runs until it hits yield and hands back that chunk. The generator 'remembers' where it left off, so each pass through the loop returns a new chunk of data.

CHUNK_SIZE = 6000000

def read_in_chunks(file_object, CHUNK_SIZE):
    while True:
        data = file_object.read(CHUNK_SIZE)
        if not data:
            break
        yield data
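To see the generator behavior without a real video, here is a small sketch that runs the same reader over an in-memory file. The io.BytesIO object and the tiny 10-byte chunk size are stand-ins chosen for the demonstration, and the parameter is written in lowercase here only to avoid shadowing the global constant.

```python
import io


def read_in_chunks(file_object, chunk_size):
    """Yield successive chunks from an open binary file until it is exhausted."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


# 25 bytes of stand-in data, read 10 bytes at a time.
fake_file = io.BytesIO(b"x" * 25)
sizes = [len(chunk) for chunk in read_in_chunks(fake_file, 10)]
print(sizes)
# [10, 10, 5]
```

The last chunk is smaller than the others because only 5 bytes remain, which is the same thing that happens with the final chunk of a real upload.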

Start Your Large Video Upload


Here's our next function. This one is the upload function, and it calls our chunk reader. Here we use the os library to grab some file details, so we know how big the file we're going to send is, the path to get to it, and the name of the file.

We open our file, and NOTE...we are opening our file in binary mode. This is important to remember, since opening a video file in text mode will probably not work.

Next we set up our index, which represents the point we're at in the file. We also set up the offset, which marks the position just past the end of the current chunk. After we build the header for a chunk, the index moves forward to the offset.

Then we start reading chunks of data. For each chunk of data, we prepare the Content-Range header so we are correctly listing what part of the file we are sending in bytes. To send our data chunk, we need to put it into a one item file dictionary. We do that, and then for readability assign it to a variable. Then we send our request! This will loop through until we're out of data chunks (the last chunk will probably be smaller than the others).

def upload(file, url):
    content_name = str(file)
    content_path = os.path.abspath(file)
    content_size = os.stat(content_path).st_size

    print(content_name, content_path, content_size)

    index = 0
    offset = 0
    headers = {"Authorization": auth_string}

    ## Open the file as binary. The with block closes it once the upload loop ends.
    with open(content_path, "rb") as f:
        for chunk in read_in_chunks(f, CHUNK_SIZE):
            ## offset is the position just past this chunk. Content-Range ends are inclusive, hence offset - 1.
            offset = index + len(chunk)
            headers['Content-Range'] = 'bytes %s-%s/%s' % (index, offset - 1, content_size)
            index = offset
            try:
                files = {"file": chunk}
                r = requests.post(url, files=files, headers=headers)
                print(r.json())
                print("r: %s, Content-Range: %s" % (r, headers['Content-Range']))
            except Exception as e:
                print(e)

You don't need any other headers.

You will know you are successful after you send the last chunk. It's possible to receive a 200 response for each chunk, then have it not work out at the very end, so make sure to watch your video upload the first time through. Success is denoted by the response to the last chunk, where you will receive something that looks like this:

{
 'videoId': 'viyourvideoIdz', 
 'title': 'Demo Vid from my Computer', 
 'description': 'Video upload test.', 
 'public': True, 
 'panoramic': False, 
 'mp4Support': True, 
 'publishedAt': '2021-02-08T06:14:53+00:00', 
 'createdAt': '2021-02-08T06:14:53+00:00', 
 'updatedAt': '2021-02-08T06:14:53+00:00', 
 'tags': [], 
 'metadata': [], 
 'source': {
	 'type': 'upload', 
	 'uri': '/videos/viyourvideoIdz/source'
	 }, 
 'assets': {
	 'iframe': '<iframe src="https://embed.api.video/vod/viyourvideoIdz" width="100%" height="100%" frameborder="0" scrolling="no" allowfullscreen="true"></iframe>', 
	 'player': 'https://embed.api.video/vod/viyourvideoIdz', 
	 'hls': 'https://cdn.api.video/vod/viyourvideoIdz/hls/manifest.m3u8',    
	 'thumbnail': 'https://cdn.api.video/vod/viyourvideoIdz/thumbnail.jpg'
	 }
}
r: <Response [201]>, Content-Range: bytes 4000000-4824145/4824146

Check your video's status in your dashboard. If you have a very large video file, wait a few minutes while api.video finishes processing it. When processing is finished, you'll be able to see a thumbnail for your video and play it from the dashboard.

If your video upload is unsuccessful, you won't receive the detailed message listed above. You might receive an additional 200 response, or a 400 response for each chunk, depending on how much went wrong with your upload. In the dashboard, you may see your video uploaded, but it stays stuck on processing. If this happens, review your code, or contact us with your issue in the community forum.
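If you'd rather check the outcome in code than by eye, here is a minimal sketch. It assumes, based only on the sample output above, that a finished upload answers the last chunk with a 201 status and a body containing videoId. The helper name upload_finished is hypothetical, and the check may need adjusting against the current API documentation.

```python
def upload_finished(last_status_code, last_body):
    """Guess whether the upload completed, from the response to the final chunk.

    Assumption: success looks like the sample output in this tutorial,
    a 201 status with the video details (including 'videoId') in the body.
    """
    return (
        last_status_code == 201
        and isinstance(last_body, dict)
        and "videoId" in last_body
    )


# Stand-in values rather than live responses:
print(upload_finished(201, {"videoId": "viyourvideoIdz"}))
# True
print(upload_finished(200, {}))
# False
```

Inside the upload loop, you could keep the most recent r.status_code and r.json() and pass them to this helper once the loop ends.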
