duet screenshot

Video duets in the browser

Doug Sillars

September 17, 2021

Video sharing apps like TikTok offer 'duets', where you record a video next to a video playing back, add a commentary, or sing along. In this demo, we'll build a duet app in the browser!

Creating a duet

For a duet, you need to have a camera, a mic and a video that you want to duet with. Since the video you are singing along with plays out loud, using headphones is recommended for best results. Due to varying support of media APIs, this demo will work in Chrome, Edge and Firefox, but will not run in Safari. :(

The default sing-along video is Nathan Evans singing "The Wellerman," a sea shanty that has been a viral duet hit on TikTok this year, but you can change this with the video URL input box on the left side of the screen.

When you open the site, it asks for access to your camera and mic, which the duet clearly requires. Once you've approved access, choose which mic/camera combo you'd like to use, and start recording. You and Nathan will appear next to each other. Sing to your heart's content, and when you are done with the duet, press "stop." The video playback will stop and your video will be uploaded to api.video. In a few seconds the video will be ready for playback at the link provided.
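To give a feel for how the mic/camera picker can be filled, here is a minimal sketch, not the app's actual code. The helper names `splitDevices` and `listInputs` are hypothetical; `getUserMedia` and `enumerateDevices` are the standard browser calls. Asking for permission first is a common pattern because device labels are usually empty until access is granted.

```javascript
// Group the entries returned by enumerateDevices() into cameras and mics.
// Each entry has a `kind` of "videoinput", "audioinput" or "audiooutput".
function splitDevices(devices) {
  const cameras = devices.filter((d) => d.kind === "videoinput");
  const mics = devices.filter((d) => d.kind === "audioinput");
  return { cameras, mics };
}

// Browser-only: request permission, release the probe stream,
// then list the available inputs.
async function listInputs() {
  const probe = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  probe.getTracks().forEach((track) => track.stop());
  return splitDevices(await navigator.mediaDevices.enumerateDevices());
}
```

The `deviceId` of the chosen entries can then be passed into the capture constraints.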

(If you notice that your video/audio is a bit delayed from the singer in the video, you can move the video delay slider; each step adds 50ms of delay to help you sync them up. We'll discuss why this is needed in the implementation section.)

I'm not the best singer. :)

How it all works

Much of this code was restructured from our record.a.video app: rather than sharing a screen, we share a video.

Routing the video

The camera feed and the duet video are both drawn to a canvas. The canvas output is then sent to a MediaRecorder. When the recording is complete, we use the videoUploader to upload the video to api.video.
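Drawing two sources onto one canvas could look roughly like the sketch below. It assumes an even left/right split, which may differ from the layout the app really uses; `sideBySideLayout`, `startDrawing` and `cameraElem` are hypothetical names, while `videoElem` is the duet video element from the snippets in this post.

```javascript
// Compute where each video goes on a canvas split into two columns.
function sideBySideLayout(canvasWidth, canvasHeight) {
  const half = Math.floor(canvasWidth / 2);
  return {
    camera: { x: 0, y: 0, width: half, height: canvasHeight },
    duet: { x: half, y: 0, width: canvasWidth - half, height: canvasHeight },
  };
}

// Browser-only: redraw both sources on every animation frame.
function startDrawing(canvas, cameraElem, videoElem) {
  const ctx = canvas.getContext("2d");
  const layout = sideBySideLayout(canvas.width, canvas.height);
  function drawFrame() {
    const c = layout.camera;
    const d = layout.duet;
    ctx.drawImage(cameraElem, c.x, c.y, c.width, c.height);
    ctx.drawImage(videoElem, d.x, d.y, d.width, d.height);
    requestAnimationFrame(drawFrame);
  }
  drawFrame();
}
```

Keeping the layout math in its own function makes it easy to swap in a different arrangement for browsers that cannot deliver portrait video.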

video flow for app

Interesting features:

When grabbing the video from the camera, you can specify the aspect ratio:

var videoOptions = {
    deviceId: cameraId,
    aspectRatio: {ideal: 9/16},
    frameRate: {ideal: cameraFR}
};

To keep this 'TikToky', we ask for portrait video capture (9/16) as the ideal aspect ratio. This is supported in Chromium browsers, but not in Firefox (so we lay out the video differently for Firefox users):

screenshot of my duet
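One way to handle the browser difference is to add the aspect ratio only when the browser says it understands that constraint. This is a sketch under that assumption: it relies on `getSupportedConstraints()` reporting support accurately, and `buildVideoConstraints` and `openCamera` are hypothetical helpers rather than functions from the app.

```javascript
// Build the video constraints, adding the portrait aspect ratio only
// when the browser lists "aspectRatio" as a supported constraint.
function buildVideoConstraints(cameraId, frameRate, supported) {
  const constraints = {
    deviceId: cameraId,
    frameRate: { ideal: frameRate },
  };
  if (supported && supported.aspectRatio) {
    constraints.aspectRatio = { ideal: 9 / 16 };
  }
  return constraints;
}

// Browser-only: open the chosen camera with those constraints.
async function openCamera(cameraId, frameRate) {
  const supported = navigator.mediaDevices.getSupportedConstraints();
  const video = buildVideoConstraints(cameraId, frameRate, supported);
  return navigator.mediaDevices.getUserMedia({ video, audio: false });
}
```

Because `ideal` is a preference and not a requirement, the browser may still return a landscape stream, so the layout code should check the size of the video it actually receives.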

The video from the canvas is captured and sent to the MediaRecorder. When recording is complete, the video is then uploaded to api.video.
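The hand-off to the MediaRecorder might be sketched as follows. The list of container/codec strings is an assumption for illustration, and `pickMimeType` and `startRecording` are hypothetical names; `MediaRecorder.isTypeSupported`, `ondataavailable` and `start(timeslice)` are standard.

```javascript
// Return the first candidate type the recorder accepts,
// or "" to let the browser choose its default.
function pickMimeType(isSupported, candidates) {
  return candidates.find((type) => isSupported(type)) || "";
}

// Browser-only: record a stream and pass each chunk to onChunk.
// `stream` would be the canvas output, e.g. canvas.captureStream(frameRate),
// with the mixed audio track added before recording starts.
function startRecording(stream, onChunk) {
  const mimeType = pickMimeType(
    (type) => MediaRecorder.isTypeSupported(type),
    ["video/webm;codecs=vp9", "video/webm;codecs=vp8", "video/webm"]
  );
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) onChunk(event.data);
  };
  recorder.start(1000); // emit a chunk roughly every second
  return recorder;
}
```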

Routing the audio

In record.a.video, there is just one audio input source - the mic. In duet.a.video, we have 2 audio sources, the mic and the video. We cannot simply add a second audio track to the MediaRecorder - it only supports one audio track. So, we look to the Web Audio APIs to help us 'mix' the audio.

In the code below, we create audio source nodes for the streams from the mic and from the video:

//audio track from mic "audioStream"
micAudioIn = audioContext.createMediaStreamSource(audioStream);

//audio track from video
if(videoElem.captureStream){
    videostream = videoElem.captureStream();
}else{
    videostream = videoElem.mozCaptureStream();
}
videoAudioIn = audioContext.createMediaStreamSource(videostream);
videoAudioin2 = audioContext.createMediaStreamSource(videostream);

micAudioIn is the audio source node from the microphone, and videoAudioIn is the audio extracted from the video. captureStream() is considered experimental, and Firefox exposes it under the prefixed name mozCaptureStream(). (This is where Safari fails us: there is no support for captureStream in Safari.)

MDN support table for captureStream
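If you want to fail gracefully where neither method exists, a small feature check helps. This is a sketch; `captureElementStream` is a hypothetical helper, and returning `null` is just one way to signal "unsupported" so the page can show a message.

```javascript
// Try the standard method first, then the Firefox-prefixed one.
// Returns null when the browser offers neither.
function captureElementStream(videoElem) {
  if (typeof videoElem.captureStream === "function") {
    return videoElem.captureStream();
  }
  if (typeof videoElem.mozCaptureStream === "function") {
    return videoElem.mozCaptureStream();
  }
  return null;
}
```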

You may have noticed that we create two source nodes for the audio from the video. In Chrome, when you extract the audio from a video into an audio context, the audio still goes to the speaker. In Firefox, the audio is extracted to the audio context and does NOT get routed to the speaker.

audio context schematic

The solution is to create 2 source nodes for the video's audio, and send the second one to the speakers (audioContext.destination is the default destination of the audio):

videoAudioin2 = audioContext.createMediaStreamSource(videostream);
//I want to send videoAudioin2 to the speakers
videoAudioin2.connect(audioContext.destination);

Going back to the mic and the audio from the video, we must combine them into one stream and connect them to the MediaRecorder. This is relatively easy, but first we want to do some audio 'mixing.'

We will perform 2 operations on the video's audio stream: a volume control and a delay control.

Volume mixing

When mixing 2 audio tracks, one may be louder than the other. The createGain() method of the audio context creates a gain node that lets you raise or lower the volume of an audio track. Since there are only 2 tracks being mixed, we just need to be able to adjust one of them, so we modify the volume of the video. In the case of the "Wellerman" video, I found 50% to be sufficient.
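As a rough illustration of the volume step, the sketch below turns a 0-100 slider value into a gain and applies it to a gain node. The clamping to the 0-1 range and the fallback to full volume on bad input are assumptions, and `percentToGain` and `applyVolume` are hypothetical helpers.

```javascript
// Convert a 0-100 slider value (number or string) into a gain between 0 and 1.
function percentToGain(percent) {
  const value = Number(percent) / 100;
  if (!Number.isFinite(value)) return 1; // unreadable input: leave volume unchanged
  return Math.min(1, Math.max(0, value));
}

// Apply the gain at the context's current time, so it also works
// while audio is already flowing through the node.
function applyVolume(gainNode, percent, audioContext) {
  gainNode.gain.setValueAtTime(percentToGain(percent), audioContext.currentTime);
}
```

With the 50% setting mentioned above, `percentToGain(50)` gives a gain of 0.5.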

I also found that the microphone audio was delayed compared to the audio from the video (try clapping along with a video when the delay is set to zero - the results are disconcerting). We can add a delay node, created with createDelay(), to the video's audio to re-sync the 2 streams. You can set this value with the slider in the app (each step is 50ms of delay). For my computer, I found that 250ms helped to sync the two streams.
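Since a delay node's delayTime is expressed in seconds, the slider steps need converting. Here is a minimal sketch, assuming the slider reports a step count; `sliderToDelaySeconds` is a hypothetical helper, and the 1 second cap matches the default maximum of a node created with `createDelay()` and no argument.

```javascript
const STEP_MS = 50; // each slider step is 50ms, as described above
const MAX_DELAY_SECONDS = 1; // default maxDelayTime of createDelay()

// Convert slider steps into seconds for delay.delayTime.value.
function sliderToDelaySeconds(steps) {
  const seconds = (Number(steps) * STEP_MS) / 1000;
  if (!Number.isFinite(seconds) || seconds < 0) return 0;
  return Math.min(seconds, MAX_DELAY_SECONDS);
}
```

The 250ms setting corresponds to 5 steps, so `sliderToDelaySeconds(5)` returns 0.25.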

We then combine the two streams and add them to the mediaRecorder we created with the video.

//change the volume of the video in
var gainNode = audioContext.createGain();
var volume = document.getElementById("volume").value/100;
console.log("vol", volume);
videoElem.volume = volume;
gainNode.gain.value = volume;

//delay the video in a bit (delayTime is in seconds)
var delay = audioContext.createDelay();
delay.delayTime.value = micDelay;

//route the video audio through the delay, then through the gain
videoAudioIn.connect(delay);
delay.connect(gainNode);

//here's the destination for the combined audio
audiocontextDest = audioContext.createMediaStreamDestination();

//add the audio to the destination
micAudioIn.connect(audiocontextDest);
gainNode.connect(audiocontextDest);

//ok so now the audio is in the audiocontextDest stream
//grab the stream
var audiocontextDestStream = audiocontextDest.stream;

//grab the audio track from the stream
var audiocontextDestStreamAudioTracks = audiocontextDestStream.getAudioTracks();
//add the audio track to the canvas output stream (the video)
stream.addTrack(audiocontextDestStreamAudioTracks[0]);
console.log("audio stream added!");

Now that both video and audio are added to the MediaRecorder, we can use the api.video video uploader JS library to upload our duet to api.video.
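Before uploading, the recorded chunks have to be joined into a single file-like object. The sketch below shows that step only; `chunksToBlob` and `finishRecording` are hypothetical helpers, and `upload` is a placeholder for whatever function wraps the uploader library, since its exact API is not shown in this post.

```javascript
// Join the chunks collected from the MediaRecorder into one Blob.
function chunksToBlob(chunks, mimeType) {
  return new Blob(chunks, { type: mimeType });
}

// Build the Blob and hand it to an upload function supplied by the caller.
// `upload` is assumed to accept a Blob and return a value or a promise.
async function finishRecording(chunks, mimeType, upload) {
  const blob = chunksToBlob(chunks, mimeType);
  return upload(blob);
}
```

In the browser this would typically run from the recorder's stop handler, once the last chunk has arrived.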

audio flow schematic

Conclusion

That's all there is to it! Along the way, we've gotten to play with some fun web audio and video APIs to manipulate and mix the audio and video in the browser into the recording we create.

Try out duet.a.video for yourself. The code is on GitHub if you'd like to reuse and create your own duet app.

If you have any questions or suggestions, please share them on our community forum. If you don't have an api.video account yet, you can register in just a few moments by following this link. Happy building!
