I don't think there is any point diving into the complexity of the DFT / FFT, IIR / FIR filters and wavelets without first understanding what audio fundamentally is, and the various ways it can be represented digitally.
What is audio in general (in air, not water or other materials):
- Audio is composed of sound pressure waves
- They cause compression and rarefaction of the air
- These waves propagate outwards from their source
- Waves can interfere with each other, causing peaks and troughs (see the sketch after this list)
- Waves can be absorbed and reflected by materials
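If you want to see the interference point in numbers rather than words, here is a minimal Python sketch (assuming NumPy is installed; the 50 Hz frequency and 1 kHz sample rate are arbitrary choices) that sums two sine waves, once in phase and once half a cycle apart:

```python
import numpy as np

# One second of "time" at an arbitrary 1 kHz sample rate.
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate

# Two 50 Hz waves: one pair in phase, one pair half a cycle apart.
a = np.sin(2 * np.pi * 50 * t)
b_in_phase = np.sin(2 * np.pi * 50 * t)
b_out_of_phase = np.sin(2 * np.pi * 50 * t + np.pi)

# Constructive interference: the peaks add, roughly doubling the amplitude.
print(np.max(a + b_in_phase))              # ~2.0
# Destructive interference: the waves cancel almost completely.
print(np.max(np.abs(a + b_out_of_phase)))  # ~0.0
```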
How is audio represented electrically:
- A microphone and pre-amplifier convert the sound pressure waves into an electrical signal
- Typically this signal has both a positive and negative voltage (like AC voltages)
- Magnetic tape stores these voltage variations continuously, as a direct analogue of the original signal, hence the term analogue
- Saturation occurs when the input signal's strength reaches the limits of the system (any further increase in voltage cannot be accurately represented)
- Clipping occurs when the input signal is higher than the system can represent, so the signal becomes clipped, or capped at the extremities (see the sketch after this list)
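To make the saturation and clipping bullets concrete, here is a small sketch (again assuming NumPy; the ±1.0 full-scale range and the 440 Hz tone are just assumptions for illustration) that drives a sine wave past the limits of the "system" and hard-clips it:

```python
import numpy as np

sample_rate = 48000
t = np.arange(sample_rate) / sample_rate

# A 440 Hz sine driven to twice the assumed full-scale range of +/-1.0.
hot_signal = 2.0 * np.sin(2 * np.pi * 440 * t)

# Hard clipping: anything beyond the limits is simply capped at the extremities.
clipped = np.clip(hot_signal, -1.0, 1.0)

# The flattened tops add harmonics that were not present in the original pure tone.
print("peak before:", round(hot_signal.max(), 3))             # ~2.0
print("peak after :", round(clipped.max(), 3))                # 1.0
print("samples flattened:", int(np.sum(np.abs(hot_signal) > 1.0)))
```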
How is audio represented digitally:
- Audio must first be sampled using an ADC (analogue-to-digital converter)
- Sampling consists of electrically measuring the audio signal at regular intervals
- The number of measurements per second is called the sample rate, and it determines the highest frequency that can be represented (the Nyquist limit)
- The Nyquist limit is the sample rate / 2 (the closer you get to the limit, the more poorly the signal is represented)
- The bit depth determines the noise floor (roughly -96 dB for 16-bit vs -48 dB for 8-bit)
- A single 16-bit audio sample is a signed value between -32768 and 32767 (which can represent both the negative and positive swing of the analogue signal)
- A byte holds only 8 bits (in terms of computer storage), so a 16-bit sample must be stored as at least 2 bytes
- The order in which these bytes are stored is referred to as their endianness (big or little)
- Stereo audio requires a separate sample for each channel, one for the left and another for the right (the sketch after this list walks through these last few points)
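A short sketch may help tie the last few bullets together: the Nyquist limit, the roughly 6 dB of noise floor per bit, the signed 16-bit range, little- vs big-endian byte order, and interleaved stereo. It assumes NumPy and a 44.1 kHz sample rate; none of the specific numbers matter beyond illustration:

```python
import numpy as np

sample_rate = 44100                              # samples per second
print("Nyquist limit:", sample_rate / 2, "Hz")   # 22050.0 Hz

# The quantisation noise floor is roughly 6 dB per bit:
for bits in (8, 16):
    print(bits, "bit noise floor ~", round(20 * np.log10(2.0 ** -bits), 1), "dB")
    # 8 bit  -> ~ -48.2 dB
    # 16 bit -> ~ -96.3 dB

# A 16-bit signed sample spans -32768 .. 32767; here is one frame of stereo,
# interleaved as left then right.
frame = np.array([32767, -32768], dtype=np.int16)

# Each 16-bit sample needs two bytes; "<i2" is little-endian, ">i2" big-endian.
print(frame.astype("<i2").tobytes().hex())   # ff7f0080
print(frame.astype(">i2").tobytes().hex())   # 7fff8000
```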
What different ways are used to store digital audio:
- PCM (pulse code modulation) is the most common uncompressed way of storing audio digitally
- Many compression schemes exist to reduce the amount of data used; some are lossless, some are lossy
- WAV files typically contain uncompressed PCM and can be mono or stereo (stereo samples are interleaved); see the sketch after this list
- MP3 files are compressed, lossy and employ psychoacoustics to achieve very high data compression rates
- Even the lowest bit depth (1 bit) can be useful depending on the application, for example gift cards that play back audio stored as 1-bit samples
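As a concrete example of the PCM / WAV bullets above, the following sketch writes one second of interleaved 16-bit stereo PCM to a WAV file using Python's standard-library wave module (NumPy assumed; the tone frequencies and the file name "tone.wav" are arbitrary):

```python
import wave

import numpy as np

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate            # one second of samples

# A 440 Hz tone on the left and a 660 Hz tone on the right,
# scaled to the signed 16-bit range.
left = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
right = (0.5 * np.sin(2 * np.pi * 660 * t) * 32767).astype(np.int16)

# Interleave the channels: L, R, L, R, ...
stereo = np.column_stack((left, right)).ravel()

with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(2)               # stereo
    wav.setsampwidth(2)               # 2 bytes per sample = 16 bit
    wav.setframerate(sample_rate)
    wav.writeframes(stereo.astype("<i2").tobytes())   # little-endian PCM
```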
How to become more familiar with audio in the digital realm:
- Do, do and do more! Download a program such as Audacity and create different audio files using different sample rates and bit depths
- Create sine / triangle / square and sawtooth tones and hear the differences (the first sketch after this list generates a few of these)
- Learn to hear the difference between formats, such as an 8-bit 10 kHz file and a 16-bit 44.1 kHz file (CD quality)
- Experiment with high-pass / low-pass / band-pass filters and hear the differences
- Push signals beyond their saturation limit to understand how clipping affects the audio signal
- Apply envelopes to signals if your software has this capability
- There is a difference between harmonic and inharmonic distortion; experiment with both
- Use a spectrogram (FFT) to view these and other signals and become familiar with how they look
- Use both linear and logarithmic plots to see the differences
- Downsample and upsample signals and hear how this affects the audio
- Use different dithering methods (when reducing the bit depth) and hear the differences (the second sketch after this list covers downsampling and dither)
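If you prefer to poke at these ideas in code as well as in an editor, here is a first sketch (assuming NumPy; 440 Hz and 44.1 kHz are arbitrary) that generates sine, square and sawtooth tones and compares their spectra using a linear magnitude and a logarithmic (dB) view:

```python
import numpy as np

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
freq = 440.0

# Only the sine is a single pure frequency; the square and sawtooth
# contain a whole series of harmonics, which is why they sound brighter.
sine = np.sin(2 * np.pi * freq * t)
square = np.sign(sine)
saw = 2.0 * ((freq * t) % 1.0) - 1.0

for name, tone in (("sine", sine), ("square", square), ("saw", saw)):
    spectrum = np.abs(np.fft.rfft(tone)) / len(tone)   # linear magnitude
    spectrum_db = 20 * np.log10(spectrum + 1e-12)      # logarithmic (dB) view
    freqs = np.fft.rfftfreq(len(tone), d=1.0 / sample_rate)
    # Count how many frequency bins sit within 40 dB of the strongest one.
    strong = int(np.sum(spectrum_db > spectrum_db.max() - 40))
    print(f"{name:6s} peak at {freqs[np.argmax(spectrum)]:.0f} Hz, "
          f"{strong} bins within 40 dB of the peak")
```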
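A second sketch covers the last two bullets: naive downsampling (which aliases anything above the new Nyquist limit if you skip the low-pass filter) and reducing 16-bit material to 8-bit with and without TPDF dither. Again NumPy is assumed and the numbers are illustrative only:

```python
import numpy as np

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440 * t)

# Naive downsampling by 4 (44.1 kHz -> ~11 kHz): keep every 4th sample.
# Without low-pass filtering first, content above the new Nyquist limit aliases.
downsampled = tone[::4]
print("downsampled to", len(downsampled), "samples at", sample_rate // 4, "Hz")

# Reduce 16-bit samples to 8-bit, with and without TPDF dither
# (triangular noise of +/-1 least-significant-bit added before rounding).
samples_16 = np.round(tone * 32767).astype(np.int16)
plain_8 = np.round(samples_16 / 256.0)
tpdf = (np.random.uniform(-0.5, 0.5, len(samples_16))
        + np.random.uniform(-0.5, 0.5, len(samples_16)))
dithered_8 = np.round(samples_16 / 256.0 + tpdf)

# Dither raises the noise floor slightly, but the error is no longer
# correlated with the signal, so it sounds like hiss rather than distortion.
error_plain = samples_16 / 256.0 - plain_8
error_dithered = samples_16 / 256.0 - dithered_8
print("plain error RMS   :", round(float(np.sqrt(np.mean(error_plain ** 2))), 3))
print("dithered error RMS:", round(float(np.sqrt(np.mean(error_dithered ** 2))), 3))
```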
This will hopefully give you a sense of what digitally represented audio is and what the differences sound like before you attempt any DSP. It is much easier to tell that something is wrong with your FFT analysis if you can recognise, for example, that you have fed in an 8-bit signal rather than a 16-bit one, or that the sample rate has been corrupted by a miscalculation in a transform.