How to Average Complex Responses (and Justification)?

Question

I am developing software that calculates the response of a system by comparing the FFT of input and output signals. The input and output signals are divided into windows and, for each window, the signals are median-subtracted and multiplied by a Hann function. The instrument response for that window is then the ratio of the FFTs of the processed data.

I believe the above is standard procedure, although I may be describing it poorly. My problem comes in how to combine the responses from the multiple windows.

As far as I can see, the correct approach is to average the complex values, across all windows. The amplitude and phase response are then the amplitude and phase of the average, complex value at each frequency:

av_response = sum_windows(response) / n
av_amplitude = sqrt(real(av_response)**2 + imag(av_response)**2)
av_phase = atan2(imag(av_response), real(av_response))

with implicit loops over frequency bins.

But I have been asked to change this to calculate amplitude and phase in each window first, and then average the amplitudes and phases across all windows:

amplitude = sqrt(real(response)**2 + imag(response)**2)
av_amplitude = sum_windows(amplitude) / n
phase = atan2(imag(response), real(response))
av_phase = sum_windows(phase) / n

I have argued that this is incorrect because averaging angles is "just wrong" - the average of 0 and 360 degrees is 180, for example, but the people I am working with responded by saying "OK, we will only display amplitude".
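The wrap-around problem is easy to check numerically. Below is a minimal Python sketch (the function names are illustrative, not from any library) that compares the plain arithmetic mean of two phases with the phase of the averaged unit phasors:

```python
import cmath
import math


def arithmetic_mean_phase(phases_deg):
    """Naive average of phase angles given in degrees."""
    return sum(phases_deg) / len(phases_deg)


def circular_mean_phase(phases_deg):
    """Phase (in degrees) of the mean of unit phasors with the given angles."""
    phasors = [cmath.exp(1j * math.radians(p)) for p in phases_deg]
    mean_phasor = sum(phasors) / len(phasors)
    return math.degrees(cmath.phase(mean_phasor))


# Two almost identical directions that sit on either side of the +/-180 wrap.
phases = [179.0, -179.0]
naive = arithmetic_mean_phase(phases)    # points the opposite way
circular = circular_mean_phase(phases)   # stays next to the wrap point
print(naive, circular)
```

The naive mean comes out at 0 degrees, opposite to both inputs, while the phase of the averaged phasor stays at the wrap point (the sign of 180 depends on rounding).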

So my questions are:

Am I correct in thinking that the second approach is generally incorrect for amplitudes too?
If so, are there any exceptions that may be relevant, and which may explain why the people I am working with prefer the second method? For example, it looks like the two approaches will agree as the noise becomes small, so perhaps this is an accepted approximation for low noise?
If the second approach is incorrect, are there any convincing, authoritative references that I can use to show this?
If the second approach is incorrect, are there any good, easy to understand examples that show this for amplitude (as the average of 0 and 360 degrees does for phase)?
Alternatively, if I am incorrect, what would be a good book for me to educate myself better?

I have tried to argue that the average of -1 1 1 -1 1 -1 -1 should be zero rather than 1, but that was unconvincing. And while I think I could, with time, construct an argument based on max likelihood estimation given a particular noise model, it is not the kind of reasoning that the people I am working with will listen to. So, if I am not wrong, I need either a powerful argument from authority or an "obvious" demonstration.

[I tried to add more tags, but can't find relevant ones and can't define new ones as a new user - sorry]

the response looks smoother when plotted with the second method. i think this is because, for the cases looked at, there is no significant signal (at higher f), while the second approach forces a signal "to appear" from the noise. also, various political/communication issues as you might guess. — andrew cooke, Apr 27 '12 at 19:02
Have you tried providing some test cases? Take random data and filter it through some filters with known frequency response. Verify that the transfer function estimate converges to the known transfer function. — nibot, Apr 27 '12 at 19:05
no. i haven't. that's a good suggestion. thanks. if presented well, i could see that being convincing. — andrew cooke, Apr 27 '12 at 19:06

nibot · Accepted Answer · 2012-04-27T20:22:39.510

Transfer function estimation is usually implemented slightly differently than the method you describe.

Your method computes

$$\left\langle \frac{\mathcal{F}[y]}{\mathcal{F}[x]} \right\rangle$$

where $\langle$angle brackets$\rangle$ represent averages taken over data segments, and a windowing function is applied to each data segment before taking the Fourier transform ($\mathcal{F}$).

A more typical implementation will compute the cross spectral density of x and y divided by the power spectral density of x:

$$\frac{\langle \mathcal{F}[y] \cdot \mathcal{F}[x]^* \rangle}{\langle|\mathcal{F}[x]|^2\rangle} = \frac{\langle \mathcal{F}[y] \cdot \mathcal{F}[x]^* \rangle}{\langle\mathcal{F}[x]\cdot\mathcal{F}[x]^*\rangle}$$

Where $\cdot$ represents a pointwise product, and $*$ the complex conjugate.

I believe this is to reduce the effect of data segments where bins of $\mathcal{F}[x]$ are excessively small.
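As a rough illustration of the estimator above, here is a minimal NumPy sketch. It assumes non-overlapping segments and a Hann window; the names `segment_ffts` and `h1_estimate` are made up for this example, and the median subtraction described in the question is left out for brevity.

```python
import numpy as np


def segment_ffts(signal, nperseg):
    """Split into non-overlapping segments, apply a Hann window, FFT each one."""
    signal = np.asarray(signal, dtype=float)
    nseg = len(signal) // nperseg
    segments = signal[:nseg * nperseg].reshape(nseg, nperseg)
    return np.fft.rfft(segments * np.hanning(nperseg), axis=1)


def h1_estimate(x, y, nperseg=256):
    """Averaged cross spectrum of (x, y) divided by averaged power spectrum of x."""
    X = segment_ffts(x, nperseg)
    Y = segment_ffts(y, nperseg)
    cross = np.mean(Y * np.conj(X), axis=0)   # <F[y] . F[x]*>
    power = np.mean(np.abs(X) ** 2, axis=0)   # <|F[x]|^2>
    return cross / power


# Sanity check with a pure gain of 0.5 and no noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = 0.5 * x
H = h1_estimate(x, y)
print(np.abs(H).min(), np.abs(H).max())
```

Because the averages are taken before the division, a segment in which a bin of the input happens to be tiny contributes very little instead of producing a huge ratio.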

Incoherent estimation

Your employer has suggested that you estimate the transfer function using

$$\frac{\langle |\mathcal{F}[y]| \rangle}{\langle |\mathcal{F}[x]| \rangle}$$

This will work, but has two big disadvantages:

You don't get any phase information.
If your measurements of the input $x$ and output $y$ have any additional noise, then the transfer function estimation will not be correct.

Your method and the method I described circumvent these problems by using coherent averaging.
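The second disadvantage can be seen in a small simulation. This is only a sketch under assumed conditions: a true gain of 0.5 at every frequency, white Gaussian input, and independent unit-variance noise added to the output. The segment count and length are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
nseg, nperseg = 400, 128

# One row per data segment.
x = rng.standard_normal((nseg, nperseg))
noise = rng.standard_normal((nseg, nperseg))
y = 0.5 * x + noise  # true gain 0.5 plus uncorrelated measurement noise

window = np.hanning(nperseg)
X = np.fft.rfft(x * window, axis=1)
Y = np.fft.rfft(y * window, axis=1)

# Coherent: average complex products, then take the magnitude.
coherent = np.abs(np.mean(Y * np.conj(X), axis=0) / np.mean(np.abs(X) ** 2, axis=0))

# Incoherent: average magnitudes, then take the ratio.
incoherent = np.mean(np.abs(Y), axis=0) / np.mean(np.abs(X), axis=0)

print(coherent.mean(), incoherent.mean())
```

With these assumptions the coherent estimate stays near 0.5, while the incoherent one is pulled up towards sqrt(0.5**2 + 1**2), about 1.12, because the noise magnitude never averages away.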

References

The general idea of using overlapped, averaged segments to compute power spectral densities is known as Welch's method. I believe the extension to estimating transfer functions is also often called Welch's method, although I'm not sure whether it is mentioned in Welch's paper, which may still be a valuable resource. A useful monograph on the subject is Bendat and Piersol's book, Random Data: Analysis and Measurement Procedures.

Validation

To validate your software, I suggest applying several test cases, where you generate Gaussian white noise and feed it through a digital filter with a known transfer function. Feed the inputs and outputs into your transfer function estimation routine and verify that the estimate converges to the known value of the transfer function.
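A test case of this kind might look like the following sketch. The three-tap FIR filter, the segment length, and the helper name `h1_estimate` are assumptions chosen for illustration; detrending is skipped because the synthetic input is zero-mean.

```python
import numpy as np


def h1_estimate(x, y, nperseg):
    """Cross spectrum over input power spectrum, Hann-windowed segments."""
    nseg = len(x) // nperseg
    window = np.hanning(nperseg)
    X = np.fft.rfft(x[:nseg * nperseg].reshape(nseg, nperseg) * window, axis=1)
    Y = np.fft.rfft(y[:nseg * nperseg].reshape(nseg, nperseg) * window, axis=1)
    return np.mean(Y * np.conj(X), axis=0) / np.mean(np.abs(X) ** 2, axis=0)


rng = np.random.default_rng(1)
nperseg = 256
x = rng.standard_normal(nperseg * 500)      # Gaussian white noise input

taps = np.array([0.25, 0.5, 0.25])          # hypothetical low-pass FIR filter
y = np.convolve(x, taps)[:len(x)]           # causal filtering, same length as x

H_true = np.fft.rfft(taps, n=nperseg)       # known response at the same bins
H_est = h1_estimate(x, y, nperseg)

max_error = np.max(np.abs(H_est - H_true))
print(max_error)
```

Repeating the run with more or fewer segments gives a feel for how quickly the estimate converges, and the same harness can be pointed at each candidate averaging scheme.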

You can use any length. The length determines the resolution, and, implicitly (given a fixed amount of data to work with), the number of averages. Longer fft = better resolution but also larger errors due to having fewer averages. — nibot, Apr 28 '12 at 22:00
ok, another difference is that you have <F(y)F(x)>/<F(x)F(x)> while Phonon has <F(y)><F(x)>/(<F(x)><F(x)>) afaict :o( — andrew cooke, May 04 '12 at 16:49
There's no point to calculating <F(y)><F(x)>/(<F(x)><F(x)>), since the <F*(x)>'s will cancel immediately. I think it's correct as I've written it. — nibot, May 04 '12 at 17:08
http://isti.bitbucket.org/2012/05/11/instrument-response.html — andrew cooke, May 11 '12 at 22:47

score 13 · Answer 2 · edited Jul 17 '19 at 11:30

Welcome to Signal Processing!

You're absolutely right. You cannot simply average DFT magnitudes and phases separately, especially phases. Here's a simple demonstration:

Let $z = a+bi$. By definition, magnitude $|z|$ and phase $\angle z$ of $z$ are:

$$|z| = \sqrt{a^2 + b^2}$$ $$\angle z = \tan ^{-1} \left( \frac{b}{a} \right)$$

Average $z$ of two complex values $z_1$ and $z_2$ is

$$z = \frac{ z_1 + z_2 } {2} = \frac{ a_1+b_1i + a_2 + b_2i } {2} = \frac{ (a_1+a_2) + (b_1 + b_2)i } {2}$$

In this case,

$$|z| = \sqrt{\frac{(a_1+a_2)^2}{4} + \frac{(b_1+b_2)^2}{4}} = \frac{1}{2}\sqrt{(a_1+a_2)^2 + (b_1+b_2)^2}\neq \frac{\sqrt{a_1^2 + b_1^2}+\sqrt{a_2^2 + b_2^2}}{2}$$

Also,

$$\angle z = \tan ^{-1} \left( \frac{b_1+b_2}{a_1+a_2} \right) \neq \frac{\tan ^{-1} \left( \frac{b_1}{a_1} \right) + \tan ^{-1} \left( \frac{b_2}{a_2} \right)}{2}$$

If you compare the degree to which these inequalities hold, you can say that the approximation for $|z|$ is off by a quadratic term, while the approximation for $\angle z$ is completely meaningless.
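A quick numerical check of the magnitude inequality, as a minimal Python sketch (the function names and the three test cases are illustrative):

```python
def magnitude_of_mean(values):
    """Coherent: average the complex values, then take the magnitude."""
    return abs(sum(values) / len(values))


def mean_of_magnitudes(values):
    """Incoherent: take each magnitude, then average."""
    return sum(abs(v) for v in values) / len(values)


cases = {
    "aligned": [1 + 0j, 1 + 0j],       # same phase
    "quadrature": [1 + 0j, 0 + 1j],    # 90 degrees apart
    "opposed": [1 + 0j, -1 + 0j],      # 180 degrees apart
}

for name, values in cases.items():
    print(name, magnitude_of_mean(values), mean_of_magnitudes(values))
```

The two quantities agree only when the values share the same phase. For the opposed pair the magnitude of the mean is 0 while the mean of the magnitudes is 1, which is the amplitude counterpart of the 0 and 360 degree example for phase.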

Now, in order to do what you're trying to do I suggest the following. Theoretically, you can find the frequency response of a system (the DFT of its impulse response) by dividing the DFT of the output by the DFT of the input. However, in the presence of noise you're going to get very strange results. A slightly better way to do it would be to use dual-channel FFT estimation, which goes as follows (derivation not provided here, but you can find it online).

Let $G_i(f) = \dfrac{ F^1_i(f) + F^2_i(f) + \cdots + F^N_i(f) }{N}$, where $F^k_i(f)$ is the DFT of the $k$-th (hence superscript $k$) windowed chunk of the input signal (hence subscript $i$ for input). Similarly, for the output signal, let $G_o(f) = \dfrac{ F^1_o(f) + F^2_o(f) + \cdots + F^N_o(f) }{N}$. You can see that the $G$ signals are simply the averages of the windowed DFTs. Then the statistical dual-channel FFT approximation $\hat{H}(f)$ for the frequency response $H(f)$ is given by

$$\hat{H}(f) = \frac{G_o(f)G_i^*(f)}{|G_i(f)|^2}$$

where the $(\cdot)^*$ stands for complex conjugation (flip the sign of all your imaginary parts).

answered Apr 27 '12 at 19:55 by Phonon · edited Jul 17 '19 at 11:30 by Alex

thanks; i wasn't sure whether to vote this one or nibot's as best answer - i think they are advocating the same process, so went with the book recommendation, but if i had two votes would have included this too... – andrew cooke Apr 28 '12 at 11:54
@andrewcooke Yes, they both are advocating exactly the same thing. I hope this clears things up for you and your colleagues. – Phonon Apr 28 '12 at 16:25
it's been a huge help for me (thanks again). on monday i will suggest that i (1) implement the method suggested and (2) do comparisons with known (synthetic) data for all three. then hopefully the best approach will win :o) – andrew cooke Apr 28 '12 at 16:26
@Phonon What FFT lengths are we using to compute the FFTs here? length_of_signal + max_length_of_channel + 1? – Spacey Apr 28 '12 at 21:58
@Mohammad It has to be at least twice the length of the delay you're expecting to find. This is due to circular symmetry of the DFT, so you will get both causal and non-causal delay values in your result. – Phonon Apr 30 '12 at 13:23
sorry, one more question. your expression differs from that of nibot, in that you are taking the conjugate of the output, while nibot takes the conjugate of the input. my sense of symmetry says that you're wrong, but that's only a guess and the book is still in the post, so a clarification would be appreciated... thanks! ps implementing this now; hope to post a plot of the results later today. – andrew cooke May 04 '12 at 13:08
ok, another difference is that you have <Go><Gi>/(<Gi><Gi>) while nibot has <GoGi>/<GiGi> afaict - your version reduces to <Go>/<Gi>. – andrew cooke May 04 '12 at 16:50
@andrewcooke Yes, you're right. I switched the two variables. – Phonon May 04 '12 at 19:34
http://isti.bitbucket.org/2012/05/11/instrument-response.html – andrew cooke May 11 '12 at 22:47
@andrewcooke Great material! – Phonon May 14 '12 at 12:44

score 3 · Answer 3 · answered Apr 27 '12 at 20:04

This is a difference between coherent and incoherent averaging of FFT spectra. Coherent averaging is more likely to reject random noise in the analysis. Incoherent is more likely to accentuate random noise magnitudes. Which of these is more important to your result report?

answered Apr 27 '12 at 20:04 by hotpaw2

if they give different results i guess i want an unbiased estimate. is either unbiased? – andrew cooke Apr 27 '12 at 20:12
