Typically, the longer the buffer of audio, the more processing you can do. The reason for this is that most of the audio buffering happens in hardware (silicon I2S controllers): when you make fewer software calls for each buffer, there is less processing overhead.
If you capture a larger amount of audio with each time slice, you will be able to record audio without dropping frames or causing overruns.
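As a rough illustration, you can ask ALSA for a large hardware buffer through the hw_params API; the device name, rate, format, channel count and 500 ms buffer figure below are placeholder values for the sketch, not recommendations:

#include <alsa/asoundlib.h>

/* Minimal sketch: open a capture PCM and request a large (~500 ms)
 * hardware buffer, so fewer software calls are needed per buffer. */
int main(void)
{
    snd_pcm_t *pcm;
    snd_pcm_hw_params_t *params;
    unsigned int rate = 16000;
    unsigned int buffer_time = 500000; /* 500 ms, in microseconds */
    int dir = 0;

    if (snd_pcm_open(&pcm, "hw:0,0", SND_PCM_STREAM_CAPTURE, 0) < 0)
        return 1;

    snd_pcm_hw_params_alloca(&params);
    snd_pcm_hw_params_any(pcm, params);
    snd_pcm_hw_params_set_access(pcm, params, SND_PCM_ACCESS_RW_INTERLEAVED);
    snd_pcm_hw_params_set_format(pcm, params, SND_PCM_FORMAT_S32_LE);
    snd_pcm_hw_params_set_channels(pcm, params, 2);
    snd_pcm_hw_params_set_rate_near(pcm, params, &rate, &dir);
    /* Ask for as large a buffer as the hardware will allow near 500 ms. */
    snd_pcm_hw_params_set_buffer_time_near(pcm, params, &buffer_time, &dir);

    if (snd_pcm_hw_params(pcm, params) < 0)
        return 1;

    /* ... read and process in large chunks with snd_pcm_readi() ... */

    snd_pcm_close(pcm);
    return 0;
}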
If your preprocessing code and NN are thread safe, then you can run one thread per CPU and chop your audio buffer into smaller blocks for processing in real time.
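A minimal sketch of that chopping, using POSIX threads; the process_block worker and the block layout are hypothetical stand-ins for your preprocessing and NN inference:

#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4   /* e.g. one per CPU core */

/* Hypothetical per-block worker: in a real system this would run your
 * preprocessing code and NN inference on its slice of the buffer. */
typedef struct {
    const float *block;   /* start of this thread's slice */
    size_t frames;        /* frames in this slice */
} work_t;

static void *process_block(void *arg)
{
    work_t *w = (work_t *)arg;
    /* ... preprocess + run the NN on w->block here ... */
    (void)w;
    return NULL;
}

/* Chop one captured buffer into NUM_THREADS contiguous blocks and
 * process them concurrently. Assumes the processing is thread safe. */
static void process_buffer(const float *buf, size_t frames)
{
    pthread_t tid[NUM_THREADS];
    work_t work[NUM_THREADS];
    size_t per = frames / NUM_THREADS;

    for (int i = 0; i < NUM_THREADS; i++) {
        work[i].block = buf + i * per;
        work[i].frames = (i == NUM_THREADS - 1) ? frames - i * per : per;
        pthread_create(&tid[i], NULL, process_block, &work[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
}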
One library already set up to do this is the nuclear processing code. The nuclear processing library has an ALSA plugin example, where you can insert your preprocessing code and NN execution at this point in the code. As you aren't doing full-duplex processing (no output), you can remove the wait-for-fusion blocking call.
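For reference, the skeleton of a generic ALSA external (extplug) plugin looks roughly like the following. This is a pass-through sketch built on alsa-lib's extplug API, not the nuclear processing library's actual code, and the nn_preproc name is made up for illustration:

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <alsa/asoundlib.h>
#include <alsa/pcm_external.h>

/* Pass input straight through to output; a real plugin would run its
 * preprocessing and NN inference on the frames here instead. */
static snd_pcm_sframes_t nn_transfer(snd_pcm_extplug_t *ext,
    const snd_pcm_channel_area_t *dst_areas, snd_pcm_uframes_t dst_offset,
    const snd_pcm_channel_area_t *src_areas, snd_pcm_uframes_t src_offset,
    snd_pcm_uframes_t size)
{
    snd_pcm_areas_copy(dst_areas, dst_offset, src_areas, src_offset,
                       ext->channels, size, ext->format);
    return size;
}

static const snd_pcm_extplug_callback_t nn_callback = {
    .transfer = nn_transfer,
};

/* The plugin name here must match the "type" field in ~/.asoundrc. */
SND_PCM_PLUGIN_DEFINE_FUNC(nn_preproc)
{
    snd_config_iterator_t i, next;
    snd_config_t *slave = NULL;
    snd_pcm_extplug_t *ext;
    int err;

    /* Find the mandatory slave definition in the plugin's config. */
    snd_config_for_each(i, next, conf) {
        snd_config_t *n = snd_config_iterator_entry(i);
        const char *id;
        if (snd_config_get_id(n, &id) < 0)
            continue;
        if (!strcmp(id, "comment") || !strcmp(id, "type") || !strcmp(id, "hint"))
            continue;
        if (!strcmp(id, "slave")) {
            slave = n;
            continue;
        }
        SNDERR("Unknown field %s", id);
        return -EINVAL;
    }
    if (!slave) {
        SNDERR("No slave defined for nn_preproc");
        return -EINVAL;
    }

    ext = calloc(1, sizeof(*ext));
    if (!ext)
        return -ENOMEM;
    ext->version = SND_PCM_EXTPLUG_VERSION;
    ext->name = "NN preprocessing plugin";
    ext->callback = &nn_callback;

    err = snd_pcm_extplug_create(ext, name, root, slave, stream, mode);
    if (err < 0) {
        free(ext);
        return err;
    }
    *pcmp = ext->pcm;
    return 0;
}
SND_PCM_PLUGIN_SYMBOL(nn_preproc);

Such a plugin is built as a shared object (e.g. gcc -shared -fPIC ... -o libasound_module_pcm_nn_preproc.so) and installed where alsa-lib searches for plugins; the exact directory varies by distribution.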
You can then run the plugin using arecord, with the ALSA plugin as the device. To use the plugin you need to define a suitable ALSA slave chain in your ~/.asoundrc file, like the following:
pcm.NN {
    type lfloat
    slave.pcm "NNin"
    slave.format FLOAT_LE
}
pcm.NNin {
    type NuclearALSAExtPluginTest
    slave.pcm "floatOut"
}
pcm.floatOut {
    type lfloat
    slave.pcm "linearOut"
    slave.format S32_LE
}
pcm.linearOut {
    type linear
    slave.pcm "hw:0,0"
    slave.format S32_LE
}
Here NN converts samples to float for the external plugin (NNin), and floatOut/linearOut convert the plugin's float output back to 32-bit linear samples for the hardware (hw:0,0). This means you would execute arecord like so:
arecord -r 16000 -B 100000 -D hw:0 | aplay -r 16000 -B 100000 -D NN
You can also rework the code to use the ALSA external plugin with arecord alone.
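In that case arecord captures directly through the plugin chain; a plausible (untested) invocation, assuming the ~/.asoundrc above:
arecord -r 16000 -f S16_LE -B 100000 -D NN test.wav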