Learning should always be fun

Audio streaming from your browser

Hello good people! It's been a long time since I wrote a new post. Today we will dig into one of my favorite topics: audio streaming over the internet. The topic may seem intimidating, and its scope is so broad that one post is not enough to cover it. However, we will look into some of the basics.

Before we begin…

 

Before we begin, I would like you to know what you can expect from this post:

 

  1. Build a simple application that works like an audio call
  2. Basics of the Web Audio API
  3. Basics of audio codecs
  4. Problems that you may face while streaming audio
  5. Everything in this post runs in the browser only

 

Let's get started

 

When we dive into audio streaming, the first question that pops into our mind is how to get the audio stream. To do that, put the following code in index.html:

 

<html>

<head>
    <meta content="text/html;charset=utf-8" http-equiv="Content-Type">
    <meta content="utf-8" http-equiv="encoding">
    <title>Audio Record</title>
</head>

<body style="background:#303030;color:#fff;">
    <div id="content"></div>
</body>

<script src="./audio.js"></script>

<script>

    AudioApi.startToRecord({
        onNewPayload: (buff) => {

            // stream this buff to the internet

            // ...

            // ...

            // let's say we received data from the internet, then play it from here
            AudioApi.playStream(buff)
        }
    })

</script>

</html>

Similarly, write the following code in the audio.js file:

 

let AudioApi = (() => {

    navigator.getUserMedia = (navigator.getUserMedia ||
        navigator.webkitGetUserMedia ||
        navigator.mozGetUserMedia ||
        navigator.msGetUserMedia);


    let
        channels = 1,
        BUFF_SIZE = 2048,
        samplingRate = 48000,
        frameSize = 4800


    var session = {
        audio: true,
        video: false
    };

    let
        recorder_context = null,
        player_context = null,
        LISTENER = null // callbacks passed to startToRecord

    function initializeRecorder(stream) {
        // fall back to the prefixed constructor on older WebKit browsers
        var audioContext = window.AudioContext || window.webkitAudioContext;
        recorder_context = new audioContext();
        var audioInput = recorder_context.createMediaStreamSource(stream);
        var recorder = recorder_context.createScriptProcessor(BUFF_SIZE, 1, 1);
        recorder.onaudioprocess = onRecordData;
        audioInput.connect(recorder);
        recorder.connect(recorder_context.destination);
    }



    function onRecordData(e) {
        LISTENER.onNewPayload(
            convertFloat32ToInt16(e.inputBuffer.getChannelData(0))
        )
    }


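    // convert the float32 samples to int16
    function convertFloat32ToInt16(buffer) {
        let l = buffer.length;
        let buf = new Int16Array(l);
        while (l--) {
            // clamp to [-1, 1] so that loud samples do not wrap around
            let s = Math.max(-1, Math.min(1, buffer[l]));
            buf[l] = s < 0 ? s * 0x8000 : s * 0x7FFF;
        }
        return buf.buffer;
    }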

    // inputArray is the array buffer
    // first view the array buffer as int16
    // then convert the int16 samples to float32
    // return float32array
    function int16ToFloat32(inputArray) {

        let int16arr = new Int16Array(inputArray)
        var output = new Float32Array(int16arr.length);
        for (var i = 0; i < int16arr.length; i++) {
            var sample = int16arr[i];
            // Int16Array values are already signed (-32768..32767)
            output[i] = (sample < 0) ? sample / 0x8000 : sample / 0x7FFF;
        }
        return output;
    }


    // the buff is the array buffer
    // this buff needs to be converted to float32
    function makeSomeNoise(buff) {
        if (player_context == null)
            player_context = new window.AudioContext()
        var samples = int16ToFloat32(buff)
        // size the audio buffer to the received chunk instead of a fixed frame size;
        // samplingRate is assumed to match the rate the audio was recorded at
        var buffer = player_context.createBuffer(channels, samples.length, samplingRate)
        buffer.getChannelData(0).set(samples) // copy the samples into the channel
        var source = player_context.createBufferSource() // create a buffer source
        source.buffer = buffer
        source.connect(player_context.destination)
        source.start()

    }


    function onError(e) {
        console.log("onError", e)
    }


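    function startToRecord(listener) {
        LISTENER = listener
        if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
            // promise-based API, preferred in current browsers
            navigator.mediaDevices.getUserMedia(session).then(initializeRecorder).catch(onError);
        } else {
            // legacy callback-based API
            navigator.getUserMedia(session, initializeRecorder, onError);
        }
    }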

    function playStream(buff) {
        makeSomeNoise(buff)
    }


    function terminate() {

    }





    return {
        startToRecord,
        playStream,
        terminate
    }


})()

 

 

Now open index.html in your browser.

If everything goes fine and you give permission to access the microphone, you should hear your own voice played back through the speaker.

 

The Audio Context

 

If you look at the initializeRecorder function, you will see something called an audio context. In simple words, the audio context handles audio processing: it is responsible for creating audio nodes and for running the audio data through them. An audio node has inputs and outputs that can be connected to other audio nodes. Say we need to add gain to our audio signal: we connect our source node to a gain node and read the amplified signal from the gain node's output. The details can be found here. In our case we need input from the microphone, then we need to process it, and finally play it. Hence the process goes like this:

 

  1. Create the source node with createMediaStreamSource.
  2. Connect this node to an audio processing node so that the microphone data can be shaped as we need. We did this with a ScriptProcessorNode, created by calling createScriptProcessor. We are currently specifying only one input channel and one output channel.
  3. Process the audio in the script processor callback. The script processor calls onaudioprocess whenever a buffer of audio data (BUFF_SIZE samples here) is ready for processing.
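
To make the idea of connecting nodes concrete, here is a minimal sketch of the gain example mentioned above: microphone source, then a gain node, then the speakers. The helper name connectWithGain and its parameters are my own for illustration; createMediaStreamSource, createGain and connect are the Web Audio API calls doing the work.

```javascript
// Sketch only: connectWithGain is a hypothetical helper, not part of the post's audio.js.
// audioContext: an AudioContext, stream: the MediaStream returned by getUserMedia.
function connectWithGain(audioContext, stream, gainValue) {
    const source = audioContext.createMediaStreamSource(stream);
    const gainNode = audioContext.createGain();

    // 1.0 leaves the signal unchanged, 0.5 halves the amplitude
    gainNode.gain.value = gainValue;

    // source -> gain -> speakers
    source.connect(gainNode);
    gainNode.connect(audioContext.destination);
    return gainNode;
}
```

In the browser you would call it from the getUserMedia success callback, for example connectWithGain(recorder_context, stream, 0.5).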

 

The Audio Data

 

Finally we have the raw data from the microphone. This is raw PCM data in float32 format, with each sample in the range -1 to 1. This is where we process it. To stream the data we convert it to int16, which halves the number of bytes per sample; that is the job of our convertFloat32ToInt16 function.

 

Once the conversion is done we can stream this data to the internet.
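
To get a feel for the numbers, here is a small worked calculation of how much data raw PCM produces, assuming the values used in audio.js (48000 Hz, one channel, 2048 samples per callback). The function name pcmBytesPerSecond is just for illustration.

```javascript
// bytes of raw PCM produced per second of audio
function pcmBytesPerSecond(sampleRate, channels, bytesPerSample) {
    return sampleRate * channels * bytesPerSample;
}

const float32Rate = pcmBytesPerSecond(48000, 1, 4); // 192000 bytes/s as float32
const int16Rate = pcmBytesPerSecond(48000, 1, 2);   // 96000 bytes/s as int16
const int16Kbps = (int16Rate * 8) / 1000;           // 768 kbit/s on the wire

// one onaudioprocess callback with BUFF_SIZE = 2048
const chunkBytesFloat32 = 2048 * 4; // 8192 bytes
const chunkBytesInt16 = 2048 * 2;   // 4096 bytes
```

So even after the int16 conversion, a single mono stream still needs 768 kbit/s, which is why compression comes up later in this post.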

 

Playing the received data

 

The data received from the internet now needs to be converted back to float32, which int16ToFloat32 does. Playback also goes through an audio context. First we create an audio buffer and fill it with the samples, then we create a buffer source for it, and finally we connect the source to the context destination and start it. Remember that we can add fading effects and filters here too to improve the sound quality.
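
One thing to watch when chunks arrive from the network: calling start() immediately for every chunk can make chunks overlap or leave gaps. A common approach is to schedule each chunk right after the previous one using the time argument of start(). The sketch below shows that idea; createChunkPlayer is a hypothetical helper of mine, not part of audio.js, and it assumes mono Float32Array chunks recorded at the given sample rate.

```javascript
// Sketch only: schedules chunks back to back on the audio context clock.
function createChunkPlayer(audioContext, sampleRate) {
    let nextStartTime = 0;

    return function playChunk(samples) {
        const buffer = audioContext.createBuffer(1, samples.length, sampleRate);
        buffer.getChannelData(0).set(samples);

        const source = audioContext.createBufferSource();
        source.buffer = buffer;
        source.connect(audioContext.destination);

        // never schedule in the past; if we fell behind, restart from "now"
        nextStartTime = Math.max(nextStartTime, audioContext.currentTime);
        const startedAt = nextStartTime;
        source.start(startedAt);

        // the next chunk begins exactly where this one ends
        nextStartTime += buffer.duration;
        return startedAt;
    };
}
```

With this, playStream could keep one player around and call it for every received chunk after converting it with int16ToFloat32.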

 

 

Challenges that may be faced

 

  1. The main challenge in audio streaming is that the JS environment is single threaded and sound processing is a heavy task. A worker thread might solve this problem to some extent.
  2. Raw PCM data is large, so we need to encode and compress it before streaming. This adds more load to the already busy single thread. Again, a worker can be used to take this off the main thread. I highly suggest going for the Opus codec, as it is open source and some great projects like opus-recorder have implemented it.
  3. createScriptProcessor is deprecated, and AudioWorklet should be used instead.
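
A practical detail if you go the Opus route: Opus works on fixed-duration frames (2.5, 5, 10, 20, 40 or 60 ms), while our recorder hands us 2048 samples at a time. So the chunks have to be regrouped before they reach the encoder. Below is a minimal sketch of such a regrouping step, assuming 20 ms frames at 48 kHz (960 samples). createFramer is a hypothetical helper of mine and the encoder itself is not shown.

```javascript
// Sketch only: regroups arbitrary-size Int16Array chunks into fixed-size frames.
// onFrame is called once per complete frame; leftovers wait for the next chunk.
function createFramer(frameSize, onFrame) {
    let pending = new Int16Array(0);

    return function push(chunk) {
        // append the new chunk to whatever was left over last time
        const merged = new Int16Array(pending.length + chunk.length);
        merged.set(pending, 0);
        merged.set(chunk, pending.length);

        let offset = 0;
        while (merged.length - offset >= frameSize) {
            onFrame(merged.slice(offset, offset + frameSize));
            offset += frameSize;
        }
        pending = merged.slice(offset);
    };
}
```

In onNewPayload you could wrap the received ArrayBuffer in an Int16Array, push it into the framer, and hand each 960-sample frame to whichever encoder you choose, ideally inside a worker.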

 

Feature Image Credit:

Photo by Claus Grünstäudl on Unsplash