Filtering vs Compression Paradox

Question

Filtered signal leads to bigger compressed files.

#1. Original Situation

I have an original signal stored as a column-wise data matrix x of size m x n (single), with m=120019 the number of samples and n=15 the number of channels.

I also have the filtered signal, stored as a column-wise data matrix of the same size m x n (single).

The original data is mainly random, centered at zero, from sensor pickups.

In MATLAB, I am using save with no options, butter for the highpass filter, and single for casting after filtering.

save essentially applies GZIP level-3 compression over a binary HDF5 format, hence we could assume the file size is a good estimator of the information content, i.e. maximum for a random signal, and close to zero for a constant signal.
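As a minimal sketch of that assumption, a constant matrix and a random matrix of the same size can be saved and their file sizes compared. The variable names and temporary file names below are illustrative only and are not part of my data set.

```matlab
% Sketch: compare saved file sizes for a constant and a random matrix.
% Variable and file names are illustrative assumptions.
m = 120019; n = 15;
xconst = ones(m, n, 'single');          % constant signal
xrand  = single(randn(m, n));           % random signal
fconst = [tempname '.mat'];
frand  = [tempname '.mat'];
save(fconst, 'xconst');                 % default save options
save(frand,  'xrand');
dconst = dir(fconst);
drand  = dir(frand);
fprintf('constant: %.1f kB, random: %.1f kB\n', ...
        dconst.bytes/1024, drand.bytes/1024);
delete(fconst); delete(frand);          % remove the temporary files
```

The constant matrix should come out far smaller than the random one, which is the behavior the file-size argument relies on.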

Saving the original signal creates a 2MB file,
Saving the filtered signal creates a 5MB file (?!).

#2. Question

How is it possible that the filtered signal has a bigger size, considering that the filtered signal has less information, part of it having been removed by the filter?

#3. Simple Example

A simple example:

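n=120019; m=15;                  % samples x channels
x=single(randn(n,m));            % pure Gaussian noise, cast to single
[b,a]=butter(2,10/200,'high');   % 2nd-order highpass, normalized cutoff 10/200
xf=filter(b,a,x);                % filters each column
save('x','x'); save('xf','xf');  % default save options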

creates 6MB files, both for the original and for the filtered signal, which is bigger than the previous values because the data is purely random.

In a sense, this indicates that the filtered signal is no less random than the original signal (?!).

#4. Evaluative Example

Consider the following:

A signal created from a random signal $x_r$ of Gaussian noise $\sim N(0,1)$, and a constant signal $x_c$ equal to $1$.
Disregard the data type, i.e. let's use only double.
Disregard the data sizes, i.e. let's use one column data vector of 1MB, $n=125000$, $m=1$.
Let's consider the $\alpha$ parameter as the Randomness Index for testing: $x=\alpha x_r+(1-\alpha)x_c$, meaning $\alpha=1$ is fully random and $\alpha=0$ is fully constant.
Consider a highpass Butterworth filter with $w_n=0.5$.

The following code:

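%% Data
n=125000;m=1;                    % one column of doubles, about 1MB uncompressed
[hb,ha]=butter(2,0.5,'high');    % 2nd-order highpass, normalized cutoff 0.5
d=100;
a=logspace(-6,0,d);              % randomness index alpha, from 1e-6 to 1
xr=randn(n,m);xc=ones(n,m);      % random and constant components
b=zeros(d,2);                    % file sizes in kB: [original, filtered]
for i=1:d
    x=a(i)*xr+(1-a(i))*xc;       % mix of random and constant signals
    xf=filter(hb,ha,x);
    save('x1.mat','x'); save('x2.mat','xf');
    b1=dir('x1.mat'); b2=dir('x2.mat');
    b(i,1)=b1.bytes/1024;
    b(i,2)=b2.bytes/1024;
    fprintf('%d/%d\n',i,d);      % progress
end
%% Plot
semilogx(a,b);
title('Data Size for Filtered Signals');
legend({'original','filtered'},'location','southeast');
xlabel('Random Index \alpha');
ylabel('File Size [kB]');
grid on;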

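With the following chart as result (figure not reproduced here: file size in kB versus random index \alpha, for the original and the filtered signals):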

This simulation reproduces the condition: the filtered signal always has a noticeably bigger size than the original signal, which contradicts the idea that a filtered signal has less information, part of it having been removed by the filter.

I think your question is more about the compression algorithm than anything else. Save the two files with the -nocompression option, then go check the bit patterns you are unwittingly generating. My guess is that your random signal actually contains significant repetitions which compress well, while the filtered version does not. Interesting nonetheless :) – zeFrenchy, Aug 12 '17 at 08:32
With no compression, all signals have the same size, 1MB, as the lengths and data types are all the same. I will check though. I am blindly assuming that the compressed size works as a measure of information, so I will add another evaluative example to check this "information" aspect... – Brethlosze, Aug 12 '17 at 12:47
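For reference, the uncompressed save mentioned in these comments could be sketched as below. This is only an illustration: it assumes a MATLAB release that supports the '-nocompression' flag (R2017a or later, as far as I know), and it passes '-v7.3' explicitly as an assumption about the file version. The variable and file names are made up for the example.

```matlab
% Sketch: save the same variable with and without compression.
% Assumes the '-nocompression' flag is available (R2017a or later).
x  = randn(125000, 1);                        % 1e6 bytes of raw double data
fc = [tempname '.mat'];
fu = [tempname '.mat'];
save(fc, 'x', '-v7.3');                       % compressed
save(fu, 'x', '-v7.3', '-nocompression');     % uncompressed
dc = dir(fc);
du = dir(fu);
fprintf('compressed: %.1f kB, uncompressed: %.1f kB\n', ...
        dc.bytes/1024, du.bytes/1024);
delete(fc); delete(fu);                       % remove the temporary files
```

The uncompressed file holds the raw samples plus container overhead, so its size depends only on the length and data type, not on the signal content.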

doubleE · Answer 1 · 2017-08-12T06:30:11.847

5

+1 for a very interesting and insightful experiment.

Some thoughts:

1. It's not true that a filtered signal has less information. It depends on your input signal, filter type, and cut-off frequency.
2. When you high-pass the noisy signal, you're removing the slowly changing components. That leaves your signal composed of 'more frequently changing random numbers', thus more random. Of course, that depends on whether your input signal contains high frequencies or not. Your input is noise, so it contains every high frequency. But if your input is a more ordered signal, it will lose much of its energy above a certain HP cut-off frequency: the output becomes near zero, less random, and smaller in size. I think if you raise the cut-off frequency of your HP filter high enough, after a certain point the file size will decrease.
3. One other experiment would be to pass the signal through an LP filter with a low cutoff frequency and see the difference.
4. Based on the same theory as in 1., you're high-passing your signal, essentially removing the DC part, xc, and leaving only the noise, xr.
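The low-pass experiment suggested above could be sketched roughly as follows. The cutoff value wn and all variable names are assumptions chosen for illustration, and butter requires the Signal Processing Toolbox. No particular outcome is claimed here; the sketch only prints the three file sizes so they can be compared.

```matlab
% Sketch: file sizes of a noise signal, its low-passed and high-passed versions.
% wn and the variable names are illustrative assumptions.
n  = 125000;
x  = randn(n, 1);
wn = 0.05;                                   % assumed low normalized cutoff
[bl, al] = butter(2, wn, 'low');
[bh, ah] = butter(2, wn, 'high');
sig   = {x, filter(bl, al, x), filter(bh, ah, x)};
names = {'original', 'lowpass', 'highpass'};
kb    = zeros(1, 3);
for k = 1:3
    v = sig{k};                              % variable actually written to disk
    f = [tempname '.mat'];
    save(f, 'v');                            % default save options
    s = dir(f);
    kb(k) = s.bytes/1024;
    delete(f);                               % remove the temporary file
    fprintf('%-9s %8.1f kB\n', names{k}, kb(k));
end
```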

edited Aug 12 '17 at 06:30

answered Aug 12 '17 at 06:08


Information-theoretically, your 1. is at least half-wrong. The filtered signal must contain less (or at most, the same) information as the unfiltered. – Marcus Müller Aug 12 '17 at 09:22
@MarcusMüller I definitely agree with you on this (obvious statement), but I have the following concern: had you interchanged the roles of the filter's impulse response and the input random signal (i.e., the (deterministic) impulse response becomes the input to a filter with a random impulse response), could we still say that the information at the output is less than the information at the input? – Fat32 Aug 12 '17 at 10:09
@Fat32 that is an interesting angle! True point. In this particular case I'd argue that if we consider the LPF as the information-bearing signal, then we'd find it contains very little info at all (being very correlated, by design, and rather short). – Marcus Müller Aug 12 '17 at 10:19
@Arash For 1., having only noise, I would expect the lowpass and highpass signal data sizes to be "additive", and if my cutoff is 0.5, I would even dare to believe I am "halving" the sizes! Both assumptions are clearly false now. I will check with an LP filter and paste the results. – Brethlosze Aug 12 '17 at 12:56
@Fat32 this is a nice suggestion. If I compress the FT of the signal, for a lossless case, I should still get the same size! (disregarding the fact that some parts of the spectrum could hold less valuable, easy to discard, information). If not, we would have discovered a better compression algorithm, which I honestly doubt :). So I will prepare a second evaluative example with this approach. – Brethlosze Aug 12 '17 at 13:02
@MarcusMüller I will check a BPF case too, and see again what happens in frequency... The same example run with the LPF option reduces the sizes, but only by a very minor percentage. – Brethlosze Aug 12 '17 at 13:13
@MarcusMüller It can't be generalized that simply. You should define things to make that statement true. Assume you have a system that randomizes its input. When the input is DC, it makes it random, so your output has more entropy. (It's not difficult to imagine that; a communication channel that adds noise to the input would do this.) For LTI systems, we know how they treat noise, so that's a different topic. Information theory doesn't base its results on whether a system is LTI. Still, I am not enough of an expert to be sure, but I think that is not true. – doubleE Aug 13 '17 at 03:06
@hyprfrcb that would be true if information were distributed uniformly over the bandwidth of the signal... which I don't think is true for all signals. As more rapid change leads to more entropy, I think the information distribution is not symmetric over the bandwidth. Just a thought. – doubleE Aug 13 '17 at 03:13
Remember all this comes from real data, so the randomness is real. And the noise is more or less flat over this portion of the spectrum, because the sensors have a flat spectral response. Are we then confirming that the compression algorithm is, let's say, perfectible? Or is there something else we are not understanding well? – Brethlosze Aug 13 '17 at 05:01
@Arash I'm sorry, information-theoretically, the cross-info between random source and sink is always greater or equal without filter than with. That's a fundamental (if not the fundamental) statement of information theory. Entropy, which makes things impossible to compress, has to come from somewhere – either the original signal, or the system filtering it (Fat32's angle). So, yes it can be generalized that simply :) – Marcus Müller Aug 13 '17 at 10:02
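The statement in the comment above is usually written as the data processing inequality. As a sketch, take $S$ as the random source, $X$ as the unfiltered observation, and $Y=f(X)$ as the output of a deterministic filter $f$. These symbols are introduced here only for illustration and do not appear in the thread.

```latex
% S: random source, X: unfiltered observation, Y = f(X): filter output
S \;\to\; X \;\to\; Y = f(X)
\quad\Longrightarrow\quad
I(S;Y) \;\le\; I(S;X),
\qquad
H(Y) \;\le\; H(X)
\ \text{ for discrete } X \text{ and deterministic } f .
```

Equality holds when $f$ is invertible, so a deterministic filter can preserve information but cannot add any. Extra entropy at the output has to come from a random element in the system itself.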

Royi · Answer 2 · 2017-08-12T13:26:32.700

3

I would check 2 things:

1. Whether the filter applied is a Low Pass Filter or a different filter. If it is a filter which amplifies the noise, the result is reasonable.
   It seems that you use butter() in a form which generates a High Pass Filter. Since the input signal is composed of noise, the High Pass Filter amplifies it and yields a less compressible file. For instance, try [hb, ha] = butter(2, 0.5, 'low'); which should allow better compression of the data (suppression of noise). If you want to go even further, use [hb, ha] = butter(2, 0.1, 'low');.
2. Verify that the output of the filter command is single as well. I think that since your filter is double, the output is double, hence the size of the signal increases. In your code, replace xf = filter(hb, ha, x); with xf = single(filter(hb, ha, x));. What are the results now?
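A quick way to run the data type check is sketched below. The variable names follow the question's code, the signal length is an arbitrary assumption, and butter requires the Signal Processing Toolbox. The sketch only reports the classes and sizes; it does not assume which class filter returns.

```matlab
% Sketch: inspect the class and memory size of the filter output.
x  = single(randn(1000, 1));           % single-precision input (length is arbitrary)
[hb, ha] = butter(2, 0.5, 'high');     % coefficients are returned as double
xf  = filter(hb, ha, x);               % output class is what we want to inspect
xfs = single(xf);                      % explicit cast, as suggested above
fprintf('class of filter output: %s\n', class(xf));
fprintf('class after casting:    %s\n', class(xfs));
whos x xf xfs                          % shows bytes used by each variable
```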

edited Aug 12 '17 at 13:26

answered Aug 12 '17 at 10:47


If the filter is lowpass, the size is only slightly less. In the evaluative example everything is double. I will update the results with the different filter cases. – Brethlosze Aug 12 '17 at 12:33
@hyprfrcb, Try the same with butter(2, 0.5, 'low');. What happens then? – Royi Aug 12 '17 at 13:19
The sizes decrease but in a very low percentage – Brethlosze Aug 12 '17 at 13:20
So the problem is solved. You're using High Pass filter which amplifies the noise hence you got larger file as noise is less compressible. Enjoy... – Royi Aug 12 '17 at 13:25
You can try [hb, ha] = butter(2, 0.1, 'low'); to see the file size gets even smaller. – Royi Aug 12 '17 at 13:27
What does not make sense to me is that the HPF gives a bigger size, meaning it is "creating" more information. If this is wrong, then the compression algorithm is very imperfect for signals, and a "big" improvement should be made to account for these effects. I will make a new explanatory example. Clearly, either 1. the information in a random signal is finite, or 2. there is plenty of room to improve the compression algorithm. – Brethlosze Aug 12 '17 at 13:38
A high-pass filter applied to noise usually deteriorates the SNR of a signal. This is what you did above. The high-pass filter amplified the noise energy, which means the data is less compressible. – Royi Aug 12 '17 at 13:42
Note that in the first example, both the pure random and the HPF random signals are 6MB. There is no "random" amplification. We could say that they are at "maximum randomness" (?!). If a signal is "below max. randomness", an HPF could "enhance" the "randomness" and "create" more information, i.e. "numerical randomness" (?!). This is from the compression point of view. Hence the compression simply doesn't handle random data well. Note that if I decompose a 2MB signal, it turns into a 5MB HPF part + a 1.8MB LPF part (?!). – Brethlosze Aug 12 '17 at 14:08
A random signal filtered by an LPF gains correlation between its samples, hence it is compressible to a higher degree. That doesn't happen with an HPF operating on a random signal. – Royi Aug 12 '17 at 14:14
@Royi I'm having my problems with the distinction you make between HPF and LPF; mathematically: Multiply your HPF'ed signal with [+1,-1,+1,-1,…]; tadaaaah, same information content, but low pass signal. – Marcus Müller Aug 12 '17 at 16:54
@Brethlosze, Could you please mark my answer? – Royi Sep 19 '22 at 11:47
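The modulation argument from the comments above (multiplying the high-passed signal by [+1,-1,+1,-1,…]) can be sketched as follows. The variable names and signal length are illustrative assumptions, and butter requires the Signal Processing Toolbox. The sketch checks that the operation is exactly reversible and saves both versions so their file sizes can be compared; no particular size outcome is claimed.

```matlab
% Sketch: modulate a high-passed signal by an alternating +1/-1 sequence.
n  = 125000;
x  = randn(n, 1);
[hb, ha] = butter(2, 0.5, 'high');
xh = filter(hb, ha, x);                 % high-pass signal
s  = ones(n, 1);
s(2:2:end) = -1;                        % alternating sequence [+1,-1,+1,-1,...]'
xm = xh .* s;                           % spectrum shifted by half the sampling rate
xb = xm .* s;                           % modulating again undoes the shift
fprintf('recovered exactly: %d\n', isequal(xb, xh));
fh = [tempname '.mat'];
fm = [tempname '.mat'];
save(fh, 'xh'); save(fm, 'xm');         % default save options
dh = dir(fh);  dm = dir(fm);
fprintf('high-pass: %.1f kB, modulated: %.1f kB\n', dh.bytes/1024, dm.bytes/1024);
delete(fh); delete(fm);                 % remove the temporary files
```

Because multiplying by ±1 is exact in floating point and its own inverse, xm carries exactly the same information as xh even though its energy now sits at low frequencies.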
