Typically, the longer the buffer of audio, the more processing you can do. The reason for this is that most of the audio buffering happens in hardware (silicon I2S controllers): when you make fewer software calls for each buffer, there is less processing overhead.
If you capture a larger amount of audio with each time slice, you will be able to record audio without dropping frames or causing overruns.
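As a rough illustration, you can ask ALSA for a large hardware buffer through the hw_params API; the device name, rate, format, channel count and 500 ms buffer figure below are placeholder values for the sketch, not recommendations:

#include <alsa/asoundlib.h>

/* Minimal sketch: open a capture PCM and request a large (~500 ms)
 * hardware buffer, so fewer software calls are needed per buffer. */
int main(void)
{
    snd_pcm_t *pcm;
    snd_pcm_hw_params_t *params;
    unsigned int rate = 16000;
    unsigned int buffer_time = 500000; /* 500 ms, in microseconds */
    int dir = 0;

    if (snd_pcm_open(&pcm, "hw:0,0", SND_PCM_STREAM_CAPTURE, 0) < 0)
        return 1;

    snd_pcm_hw_params_alloca(&params);
    snd_pcm_hw_params_any(pcm, params);
    snd_pcm_hw_params_set_access(pcm, params, SND_PCM_ACCESS_RW_INTERLEAVED);
    snd_pcm_hw_params_set_format(pcm, params, SND_PCM_FORMAT_S32_LE);
    snd_pcm_hw_params_set_channels(pcm, params, 2);
    snd_pcm_hw_params_set_rate_near(pcm, params, &rate, &dir);
    /* Ask for as large a buffer as the hardware will allow near 500 ms. */
    snd_pcm_hw_params_set_buffer_time_near(pcm, params, &buffer_time, &dir);

    if (snd_pcm_hw_params(pcm, params) < 0)
        return 1;

    /* ... read and process in large chunks with snd_pcm_readi() ... */

    snd_pcm_close(pcm);
    return 0;
}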
If your preprocessing code and NN are thread safe, then you can run one thread per CPU and chop your audio buffer into smaller blocks for processing in real time.
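A minimal sketch of that chopping, using POSIX threads; the process_block worker and the block layout are hypothetical stand-ins for your preprocessing and NN inference:

#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4   /* e.g. one per CPU core */

/* Hypothetical per-block worker: in a real system this would run your
 * preprocessing code and NN inference on its slice of the buffer. */
typedef struct {
    const float *block;   /* start of this thread's slice */
    size_t frames;        /* frames in this slice */
} work_t;

static void *process_block(void *arg)
{
    work_t *w = (work_t *)arg;
    /* ... preprocess + run the NN on w->block here ... */
    (void)w;
    return NULL;
}

/* Chop one captured buffer into NUM_THREADS contiguous blocks and
 * process them concurrently. Assumes the processing is thread safe. */
static void process_buffer(const float *buf, size_t frames)
{
    pthread_t tid[NUM_THREADS];
    work_t work[NUM_THREADS];
    size_t per = frames / NUM_THREADS;

    for (int i = 0; i < NUM_THREADS; i++) {
        work[i].block = buf + i * per;
        work[i].frames = (i == NUM_THREADS - 1) ? frames - i * per : per;
        pthread_create(&tid[i], NULL, process_block, &work[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
}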
One library already set up to do this is the nuclear processing code. The nuclear processing library has an ALSA plugin example, where you can insert your preprocessing code and NN execution at this point in the code. As you aren't doing full-duplex processing (no output), you can remove the wait-for-fusion blocking call.
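For reference, the skeleton of a generic ALSA external (extplug) plugin looks roughly like the following. This is a pass-through sketch built on alsa-lib's extplug API, not the nuclear processing library's actual code, and the nn_preproc name is made up for illustration:

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <alsa/asoundlib.h>
#include <alsa/pcm_external.h>

/* Pass input straight through to output; a real plugin would run its
 * preprocessing and NN inference on the frames here instead. */
static snd_pcm_sframes_t nn_transfer(snd_pcm_extplug_t *ext,
    const snd_pcm_channel_area_t *dst_areas, snd_pcm_uframes_t dst_offset,
    const snd_pcm_channel_area_t *src_areas, snd_pcm_uframes_t src_offset,
    snd_pcm_uframes_t size)
{
    snd_pcm_areas_copy(dst_areas, dst_offset, src_areas, src_offset,
                       ext->channels, size, ext->format);
    return size;
}

static const snd_pcm_extplug_callback_t nn_callback = {
    .transfer = nn_transfer,
};

/* The plugin name here must match the "type" field in ~/.asoundrc. */
SND_PCM_PLUGIN_DEFINE_FUNC(nn_preproc)
{
    snd_config_iterator_t i, next;
    snd_config_t *slave = NULL;
    snd_pcm_extplug_t *ext;
    int err;

    /* Find the mandatory slave definition in the plugin's config. */
    snd_config_for_each(i, next, conf) {
        snd_config_t *n = snd_config_iterator_entry(i);
        const char *id;
        if (snd_config_get_id(n, &id) < 0)
            continue;
        if (!strcmp(id, "comment") || !strcmp(id, "type") || !strcmp(id, "hint"))
            continue;
        if (!strcmp(id, "slave")) {
            slave = n;
            continue;
        }
        SNDERR("Unknown field %s", id);
        return -EINVAL;
    }
    if (!slave) {
        SNDERR("No slave defined for nn_preproc");
        return -EINVAL;
    }

    ext = calloc(1, sizeof(*ext));
    if (!ext)
        return -ENOMEM;
    ext->version = SND_PCM_EXTPLUG_VERSION;
    ext->name = "NN preprocessing plugin";
    ext->callback = &nn_callback;

    err = snd_pcm_extplug_create(ext, name, root, slave, stream, mode);
    if (err < 0) {
        free(ext);
        return err;
    }
    *pcmp = ext->pcm;
    return 0;
}
SND_PCM_PLUGIN_SYMBOL(nn_preproc);

Such a plugin is built as a shared object (e.g. gcc -shared -fPIC ... -o libasound_module_pcm_nn_preproc.so) and installed where alsa-lib searches for plugins; the exact directory varies by distribution.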
You can then run the plugin using arecord, with the ALSA plugin as the device. To use the plugin you need to define a suitable ALSA slave chain in your ~/.asoundrc file, like the following:
pcm.NN {
    type lfloat
    slave.pcm "NNin"
    slave.format FLOAT_LE
}
pcm.NNin {
    type NuclearALSAExtPluginTest
    slave.pcm "floatOut"
}
pcm.floatOut {
    type lfloat
    slave.pcm "linearOut"
    slave.format S32_LE
}
pcm.linearOut {
    type linear
    slave.pcm "hw:0,0"
    slave.format S32_LE
}
Here NN converts samples to float for the external plugin (NNin), and floatOut/linearOut convert the plugin's float output back to 32-bit linear samples for the hardware (hw:0,0). This means you would execute arecord like so:
arecord -r 16000 -B 100000 -D hw:0 | aplay -r 16000 -B 100000 -D NN
You can also rework the code to use the ALSA external plugin with arecord alone.
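In that case arecord captures directly through the plugin chain; a plausible (untested) invocation, assuming the ~/.asoundrc above:
arecord -r 16000 -f S16_LE -B 100000 -D NN test.wav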