Fast and memory efficient moving average calculation

Question

I'm looking for a time and memory efficient solution to calculate a moving average in C. I need to avoid dividing because I'm on a PIC 16 which has no dedicated division unit.

At the moment, I just store all values in a ring buffer and simply store and update the sum each time a new value arrives. This is really efficient, but unfortunately uses most of my available memory...

I don't think there's any more space efficient way to do this. — Rocketmagnet, Apr 20 '12 at 07:44
@JobyTaffey well, it's a quite widely used algorithm on control systems, and it requires dealing with limited hardware resources. So I think he'll find more help here than on SO. — clabacchio, Apr 20 '12 at 11:52
@Joby: There are some wrinkles about this question that are relevant to small resource-limited systems. See my answer. You'd do this very differently on a large system like the SO folks are used to. This has come up a lot in my experience of designing electronics. — Olin Lathrop, Apr 20 '12 at 12:35
I agree. This is quite appropriate for this forum, as it relates to embedded systems. — Rocketmagnet, Apr 20 '12 at 12:48

score 62 · Accepted Answer · edited Mar 29 '22 at 01:44

As others have mentioned, you should consider an IIR (infinite impulse response) filter rather than the FIR (finite impulse response) filter you are using now. There is more to it, but at first glance FIR filters are implemented as explicit convolutions and IIR filters as recursive equations.

The particular IIR filter I use a lot in microcontrollers is a single pole low pass filter. This is the digital equivalent of a simple R-C analog filter. For most applications, these will have better characteristics than the box filter that you are using. Most uses of a box filter that I have encountered are a result of someone not paying attention in digital signal processing class, not as a result of needing their particular characteristics. If you just want to attenuate high frequencies that you know are noise, a single pole low pass filter is better. The best way to implement one digitally in a microcontroller is usually:

FILT <-- FILT + FF(NEW - FILT)

FILT is a piece of persistent state. This is the only persistent variable you need to compute this filter. NEW is the new value that the filter is being updated with this iteration. FF is the filter fraction, which adjusts the "heaviness" of the filter. Look at this algorithm and see that for FF = 0 the filter is infinitely heavy since the output never changes. For FF = 1, it's really no filter at all since the output just follows the input. Useful values are in between. On small systems you pick FF to be 1/2^N so that the multiply by FF can be accomplished as a right shift by N bits. For example, FF might be 1/16 and the multiply by FF therefore a right shift of 4 bits. Otherwise this filter needs only one subtract and one add, although the numbers usually need to be wider than the input value (more on numerical precision in a separate section below).
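A minimal C sketch of the update above, assuming a 32 bit signed state and N = 4 (FF = 1/16). The names `filt`, `filter_update` and `FILT_SHIFT` are illustrative, not from the answer, and the input is fed in unscaled; in practice you would usually give FILT extra fraction bits, as the numerical precision section of this answer explains.

```c
#include <stdint.h>

/* Hypothetical shift count N: FF = 1/2^N = 1/16 */
#define FILT_SHIFT 4

/* Persistent filter state: the only variable the filter has to keep. */
static int32_t filt = 0;

/* One update: FILT <-- FILT + FF(NEW - FILT), with the multiply by FF done
   as a right shift. Assumes >> on a negative int32_t is an arithmetic shift,
   which C leaves implementation-defined but common compilers provide. */
int32_t filter_update(int32_t new_value)
{
    filt += (new_value - filt) >> FILT_SHIFT;
    return filt;
}
```

Calling `filter_update(1600)` once from a zeroed state returns 100 (1600/16). Repeated calls climb toward 1600 but stop 15 counts short, because the shift discards the low bits of the difference; that residual is what the extra fraction bits are for.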

I usually take A/D readings significantly faster than they are needed and apply two of these filters cascaded. This is the digital equivalent of two R-C filters in series, and attenuates by 12 dB/octave above the rolloff frequency. However, for A/D readings it's usually more relevant to look at the filter in the time domain by considering its step response. This tells you how fast your system will see a change when the thing you are measuring changes.
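For two cascaded poles that both use the same FF = a (write r = 1 − a), both starting at zero, with a unit step applied from iteration 1, the step response can be written in closed form. This is a derivation added here, not from the answer, and it assumes the second pole is fed the first pole's freshly updated output in the same iteration:

```latex
\begin{aligned}
y_1[k] &= y_1[k-1] + a\,\bigl(1 - y_1[k-1]\bigr) = 1 - r^{k} \\
y_2[k] &= y_2[k-1] + a\,\bigl(y_1[k] - y_2[k-1]\bigr) = 1 - r^{k}\,(1 + k\,a)
\end{aligned}
\qquad r = 1 - a,\quad y_1[0] = y_2[0] = 0
```

The second line follows by induction: substituting y₂[k−1] = 1 − r^(k−1)(1 + (k−1)a) and y₁[k] = 1 − r^k into the recurrence gives 1 − r^k(1 + ka). With a = 1/16 this gives y₂[26] ≈ 0.51, y₂[60] ≈ 0.90 and y₂[74] ≈ 0.95, in line with the settling times quoted for "PLOTFILT 4 4".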

To facilitate designing these filters (which only means picking FF and deciding how many of them to cascade), I use my program FILTBITS. You specify the number of shift bits for each FF in the cascaded series of filters, and it computes the step response and other values. Actually I usually run this via my wrapper script PLOTFILT. This runs FILTBITS, which makes a CSV file, then plots the CSV file. For example, here is the result of "PLOTFILT 4 4":

The two parameters to PLOTFILT mean there will be two filters cascaded of the type described above. The values of 4 indicate the number of shift bits to realize the multiply by FF. The two FF values are therefore 1/16 in this case.

The red trace is the unit step response, and is the main thing to look at. For example, this tells you that if the input changes instantaneously, the output of the combined filter will settle to 90% of the new value in 60 iterations. If you care about 95% settling time then you have to wait about 73 iterations, and for 50% settling time only 26 iterations.

The green trace shows you the output from a single full amplitude spike. This gives you some idea of the random noise suppression. It looks like no single sample will cause more than a 2.5% change in the output.
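To sanity-check the numbers quoted for "PLOTFILT 4 4" without FILTBITS itself, here is an independent sketch that simulates two cascaded poles with FF = 1/16 in double precision. It assumes both poles start at zero, that iterations are counted from 1, and that each pole is fed the just-updated output of the previous one. The function names are hypothetical.

```c
/* Two cascaded single-pole filters, FF = 1/2^SHIFT each, simulated in
   double precision as an analysis aid (not as firmware). */
#define SHIFT 4
#define N_POLES 2

/* One iteration of the cascade; state[] holds each pole's FILT. */
static double cascade_step(double state[N_POLES], double input)
{
    const double ff = 1.0 / (1 << SHIFT);
    for (int p = 0; p < N_POLES; p++) {
        state[p] += ff * (input - state[p]);
        input = state[p];   /* this pole's output is the next pole's NEW */
    }
    return input;
}

/* First iteration at which the unit step response reaches `level`. */
int step_settling(double level)
{
    double state[N_POLES] = {0};
    for (int k = 1; k <= 10000; k++)
        if (cascade_step(state, 1.0) >= level)
            return k;
    return -1;
}

/* Largest output caused by a single unit spike at iteration 1. */
double spike_peak(void)
{
    double state[N_POLES] = {0};
    double peak = 0.0;
    for (int k = 1; k <= 1000; k++) {
        double y = cascade_step(state, k == 1 ? 1.0 : 0.0);
        if (y > peak)
            peak = y;
    }
    return peak;
}
```

`step_settling(0.50)`, `step_settling(0.90)` and `step_settling(0.95)` come out at 26, 60 and 74 iterations (after 73 iterations the response is at 94.998%, hence "about 73"), and `spike_peak()` is about 0.024, just under the 2.5% mentioned above.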

The blue trace is to give a subjective feeling of what this filter does with white noise. This is not a rigorous test, since there is no guarantee about exactly what the random numbers used as the white noise input for this run of PLOTFILT contained. It's only to give you a rough feeling of how much the noise will be squashed and how smooth the result is.

PLOTFILT, maybe FILTBITS, and lots of other useful stuff, especially for PIC firmware development, are available in the PIC Development Tools software release at my Software downloads page.

In addition, a web-based port of PLOTFILT can be found here.

Added about numerical precision

I see from the comments, and now a new answer, that there is interest in discussing the number of bits needed to implement this filter. Note that the multiply by FF will create Log₂(1/FF) new bits below the binary point. On small systems, FF is usually chosen to be 1/2^N so that this multiply is actually realized by a right shift of N bits.

FILT is therefore usually a fixed point integer. Note that this doesn't change any of the math from the processor's point of view. For example, if you are filtering 10 bit A/D readings and N = 4 (FF = 1/16), then you need 4 fraction bits below the 10 bit integer A/D readings. On most processors, you'd be doing 16 bit integer operations due to the 10 bit A/D readings. In this case, you can still do exactly the same 16 bit integer operations, but start with the A/D readings left shifted by 4 bits. The processor doesn't know the difference and doesn't need to. Doing the math on whole 16 bit integers works whether you consider them to be 12.4 fixed point or true 16 bit integers (16.0 fixed point).
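A C sketch of the 12.4 scheme for a 10 bit A/D reading, assuming a 16 bit signed state and arithmetic right shifts. `filter_adc` and `filt_12_4` are hypothetical names.

```c
#include <stdint.h>

#define FRAC_BITS 4   /* N = 4, so FF = 1/16 */

/* Filter state in 12.4 fixed point: a 10 bit reading shifted left 4 bits
   fits comfortably in a signed 16 bit word (1023 << 4 = 16368). */
static int16_t filt_12_4 = 0;

/* Update with one raw 10 bit A/D reading (0..1023) and return the filtered
   value in whole A/D counts. Assumes >> on negative values is an arithmetic
   shift, as on common compilers. */
uint16_t filter_adc(uint16_t adc_reading)
{
    int16_t new_12_4 = (int16_t)(adc_reading << FRAC_BITS);  /* scale to 12.4 */
    filt_12_4 += (int16_t)((new_12_4 - filt_12_4) >> FRAC_BITS);
    return (uint16_t)(filt_12_4 >> FRAC_BITS);  /* drop the fraction bits */
}
```

Fed a steady 1000, the state settles at 15985 (999.94 in 12.4), so the integer part reads 999: the arithmetic shift rounds toward minus infinity, which leaves up to 15/16 of a count of residual on rising inputs. Falling inputs settle exactly.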

In general, you need to add N bits for each filter pole if you don't want to add noise due to the numerical representation. In the example above, the second filter of two would have to have 10+4+4 = 18 bits to not lose information. In practice, on an 8 bit machine that means you'd use 24 bit values. Technically only the second pole of two would need the wider value, but for firmware simplicity I usually use the same representation, and thereby the same code, for all poles of a filter.

Usually I write a subroutine or macro to perform one filter pole operation, then apply that to each pole. Whether a subroutine or macro depends on whether cycles or program memory are more important in that particular project. Either way, I use some scratch state to pass NEW into the subroutine/macro, which updates FILT, but also loads that into the same scratch state NEW was in. This makes it easy to apply multiple poles since the updated FILT of one pole is the NEW of the next one. When using a subroutine, it's useful to have a pointer point to FILT on the way in, which is updated to just after FILT on the way out. That way the subroutine automatically operates on consecutive filters in memory if called multiple times. With a macro you don't need a pointer since you pass in the address to operate on each iteration.
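A C sketch of the subroutine approach, with hypothetical names: `filter_pole` updates one state and returns it so it can be passed on as NEW, and `filter_cascade` walks a pointer through consecutive states. It assumes two poles with FF = 1/16 and uses `int32_t` where the answer uses 24 bit values on an 8 bit PIC.

```c
#include <stdint.h>

#define FF_SHIFT 4   /* hypothetical N for every pole: FF = 1/16 */
#define N_POLES  2

/* One pole: updates *filt and returns it, so the result can be passed
   straight on as NEW for the next pole. Assumes arithmetic >>. */
static int32_t filter_pole(int32_t *filt, int32_t new_value)
{
    *filt += (new_value - *filt) >> FF_SHIFT;
    return *filt;
}

/* Filter states laid out consecutively in memory. */
static int32_t poles[N_POLES];

/* Apply every pole in order, advancing a pointer through the states.
   The reading gets FF_SHIFT fraction bits per pole (10 + 4 + 4 = 18 bits
   for a 10 bit reading and two poles), which int32_t easily holds. */
int32_t filter_cascade(uint16_t adc_reading)
{
    int32_t value = (int32_t)adc_reading << (FF_SHIFT * N_POLES);
    int32_t *p = poles;
    for (int i = 0; i < N_POLES; i++)
        value = filter_pole(p++, value);
    return value;   /* still carries FF_SHIFT * N_POLES fraction bits */
}
```

Here the result has 8 fraction bits; with a steady 1000 input, `filter_cascade(1000) >> 8` settles at 999, for the same rounding-toward-minus-infinity reason as a single pole.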

Code Examples

Here is an example of a macro as described above for a PIC 18:

////////////////////////////////////////////////////////////////////////////////
//
//   Macro FILTER filt
//
//   Update one filter pole with the new value in NEWVAL.  NEWVAL is updated to
//   contain the new filtered value.
//
//   FILT is the name of the filter state variable.  It is assumed to be 24 bits
//   wide and in the local bank.
//
//   The formula for updating the filter is:
//
//     FILT <-- FILT + FF(NEWVAL - FILT)
//
//   The multiply by FF is accomplished by a right shift of FILTBITS bits.
//
/macro filter
/write
     dbankif lbankadr
     movf    [arg 1]+0, w ;NEWVAL <-- NEWVAL - FILT
     subwf   newval+0
     movf    [arg 1]+1, w
     subwfb  newval+1
     movf    [arg 1]+2, w
     subwfb  newval+2

/write
/loop n filtbits             ;once for each bit to shift NEWVAL right
     rlcf    newval+2, w ;shift NEWVAL right one bit
     rrcf    newval+2
     rrcf    newval+1
     rrcf    newval+0
/endloop

/write
     movf    newval+0, w ;add shifted value into filter and save in NEWVAL
     addwf   [arg 1]+0, w
     movwf   [arg 1]+0
     movwf   newval+0

     movf    newval+1, w
     addwfc  [arg 1]+1, w
     movwf   [arg 1]+1
     movwf   newval+1

     movf    newval+2, w
     addwfc  [arg 1]+2, w
     movwf   [arg 1]+2
     movwf   newval+2

/endmac

And here is a similar macro for a PIC 24 or dsPIC 30 or 33:

////////////////////////////////////////////////////////////////////////////////
//
//   Macro FILTER ffbits
//
//   Update the state of one low pass filter.  The new input value is in W1:W0
//   and the filter state to be updated is pointed to by W2.
//
//   The updated filter value will also be returned in W1:W0 and W2 will point
//   to the first memory past the filter state.  This macro can therefore be
//   invoked in succession to update a series of cascaded low pass filters.
//
//   The filter formula is:
//
//     FILT <-- FILT + FF(NEW - FILT)
//
//   where the multiply by FF is performed by an arithmetic right shift of
//   FFBITS.
//
//   WARNING: W3 is trashed.
//
/macro filter
/var new ffbits integer = [arg 1] ;get number of bits to shift

/write
/write " ; Perform one pole low pass filtering, shift bits = " ffbits
/write " ;"

     sub     w0, [w2++], w0 ;NEW - FILT --> W1:W0
     subb    w1, [w2--], w1

     lsr     w0, #[v ffbits], w0 ;shift the result in W1:W0 right
     sl      w1, #[- 16 ffbits], w3
     ior     w0, w3, w0
     asr     w1, #[v ffbits], w1

     add     w0, [w2++], w0 ;add FILT to make final result in W1:W0
     addc    w1, [w2--], w1

     mov     w0, [w2++]  ;write result to the filter state, advance pointer
     mov     w1, [w2++]

/write
/endmac

Both these examples are implemented as macros using my PIC assembler preprocessor, which is more capable than either of the built-in macro facilities.

+1 -- right on the money. The only thing I'd add, is that moving average filters do have their place when performed synchronously to some task (like producing a drive waveform to drive an ultrasound generator) so that they filter out harmonics of 1/T where T is the moving average time. — Jason S, Apr 20 '12 at 12:45
You write "The red trace is the unit impulse response [...]". Shouldn't it be "step response" like the labeling of your graph? -- Anyway, nice answer +1 — PetPaulsen, Apr 20 '12 at 13:15
@Jason: Yes, I agree that box filters have their uses due to the comb nature of the frequency response. But most of the uses I see are as simple low pass filters to reduce high frequencies in general. I don't know why, but a box filter seems to be the knee-jerk reaction of those that didn't pay attention in digital signal processing class. Hearing words like "moving average" with no mention of "convolution", "box filter", or "FIR" is usually a warning of this case. — Olin Lathrop, Apr 20 '12 at 13:21
@Olin -- yup, I'm with you there. I don't want to know how much of the circuitry/software in products I rely on has been designed that way, e.g. just use method X without really understanding why. — Jason S, Apr 20 '12 at 13:35
Nice answer, but just two things. First: it's not necessarily lack of attention that leads to the choice of a wrong filter; in my case, I've never been taught about the difference, and the same applies to non-graduated people. So sometimes it's just ignorance. But the second: why do you cascade two first-order digital filters instead of using a higher order one? (just to understand, I'm not criticizing) — clabacchio, Apr 20 '12 at 13:51
@clabacchio: Two cascaded single pole filters is a higher order one. Write the algorithm for what you consider higher order, and you'll see the same number of operations. — Olin Lathrop, Apr 20 '12 at 14:00
Fair enough; but I was seeing it as two cascaded IIR, and in my mind it implies more complexity because of redundancy...where am I wrong? — clabacchio, Apr 20 '12 at 14:01
two cascaded single pole IIR filters are more robust to numerical issues, and easier to design, than a single 2nd-order IIR filter; the tradeoff is that with 2 cascaded stages you get a low Q (= 1/2?) filter, but in most cases that's not a huge deal. — Jason S, Apr 20 '12 at 14:45
@clabacchio: Another issue I should have mentioned is firmware implementation. You can write a single pole low pass filter subroutine once, then apply it multiple times. In fact I usually write such a subroutine to take a pointer in memory to the filter state, then have it advance the pointer so that it can be called in succession easily to realize multi-pole filters. — Olin Lathrop, Apr 20 '12 at 15:03
@OlinLathrop I think there is a problem with your description. With the shift inside the FILT calculation, any time there is an inflection, the output will stay constant until (NEW-FILT) > 16 or < -16. Won't this leave odd flat spots at inflections? I think the calculation and FILT should be left scaled by 16 and the shift done after the calculation to get output. Something like FILT <- FILT + FF(FILT)-NEW and then output FF(FILT) — C. Towne Springer, Dec 31 '13 at 20:39
@user: I implement it as written, but add fraction bits to FILT as appropriate. I never said FILT is integer. It is usually fixed point. To not lose any data, you have to add Log2(1/FF) bits each filter pole. In practice you can make a tradeoff with where the noise floor is or what accuracy you care about. Note that the error you mention when using too few fraction bits is not accumulating since NEW-FILT gives you the complete error each time. — Olin Lathrop, Dec 31 '13 at 20:55
@OlinLathrop To be precise, you wrote shift right by 4 bits on small systems, but I get your point. It works fine on fixed point, and certainly much smaller and faster than a circular buffer. — C. Towne Springer, Dec 31 '13 at 22:59
@user: I glossed over this detail by saying "the numbers usually need to be wider than the input value". In any case, it seems we basically agree. Note that fixed point doesn't necessarily mean using more bits. On a 16 bit machine you can shift a 10 bit A/D reading left 5 bits to view a word as 11.5 fixed point. All the math operations work regardless of where you think the binary point is. — Olin Lathrop, Jan 01 '14 at 00:08
@mars: So sorry you're annoyed. I'll refund your full payment. Oh, wait... In any case, I just checked and both PLOTFILT and CSVPLOT are included in the Full Runtime release on the page linked to above. From your description I can't tell what exactly went wrong. Did the installation program succeed? Obviously you can't expect the software to run correctly if it didn't install correctly or you failed to follow any of the directions. — Olin Lathrop, Dec 02 '14 at 14:49
I have rewritten the filter analysis routine in Python. Anyone interested can get the code here: https://github.com/Miceuz/plotfilt — miceuz, Mar 10 '15 at 12:22

score 20 · Answer 2 · answered Apr 20 '12 at 07:45


If you can live with the restriction of a power-of-two number of items to average (i.e. 2, 4, 8, 16, 32, etc.), then the divide can be done easily and efficiently on a low-performance micro with no dedicated divide unit, because it can be done as a bit shift. Each right shift divides by another factor of two, e.g.:

avg = sum >> 2; //divide by 2^2 (4)

or

avg = sum >> 3; //divide by 2^3 (8)

etc.

Martin

how does that help? The OP says the main problem is keeping around past samples in memory. – Jason S Apr 20 '12 at 12:41
This does not address the OP's question at all. – Rocketmagnet Apr 20 '12 at 12:48

The OP thought he had two problems, dividing in a PIC16 and memory for his ring buffer. This answer shows that the dividing is not difficult. Admittedly it does not address the memory problem, but the SE system allows partial answers, and users can take something from each answer for themselves, or even edit and combine others' answers. Since some of the other answers require a divide operation, they are similarly incomplete since they do not show how to efficiently achieve this on a PIC16. – Martin Apr 20 '12 at 13:01

score 9 · Answer 3 · edited Apr 13 '17 at 12:47

There's some in-depth analysis of the math behind using the first order IIR filter that Olin Lathrop has already described over on the Digital Signal Processing stack exchange (includes lots of pretty pictures.) The equation for this IIR filter is:

y[n]=αx[n]+(1−α)y[n−1]

This can be implemented using only integers and no division using the following code (might need some debugging as I was typing from memory.)

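#include <stdint.h>

/**
*  @details    Implement a first order IIR filter to approximate a K sample 
*              moving average.  This function implements the equation:
*
*                  y[n] = alpha * x[n] + (1 - alpha) * y[n-1]
*
*  @param      *filter - a Signed 15.16 fixed-point value.
*  @param      sample - the 16-bit value of the current sample.
*/

#define BITS 2      ///< This is roughly = log2( 1 / alpha )

int16_t IIR_Filter(int32_t *filter, int16_t sample)
{
    // Convert to 16.16 by multiplying instead of shifting: left-shifting a
    // negative value, or a 16 bit int by 16 places, is undefined in C.
    int32_t local_sample = (int32_t)sample * 65536L;

    // The difference only fits in 32 bits if the samples stay within roughly
    // +/-16384 counts; >> on a negative value is assumed to be arithmetic.
    *filter += (local_sample - *filter) >> BITS;

    return (int16_t)((*filter + 0x8000) >> 16);     ///< Round by adding .5 and truncating.
}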

This filter approximates a moving average of the last K samples by setting the value of alpha to 1/K. Do this in the preceding code by defining BITS as LOG2(K), i.e. for K = 16 set BITS to 4, for K = 4 set BITS to 2, etc.

(I'll verify the code listed here as soon as I get a chance and edit this answer if needed.)

Jason S · Answer 4 · 2012-04-20T13:00:15.863

There is an answer for a true moving average filter (aka "boxcar filter") with lower memory requirements, if you don't mind downsampling. It's called a cascaded integrator-comb (CIC) filter. The idea is that you keep a running integrator (the sum of all inputs) and take differences of it over a time period; the key memory-saving device is that, by downsampling, you don't have to store every value of the integrator. It can be implemented along these lines:

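#include <stdint.h>

enum { decimationFactor = 8,   /* 2 or 4 or 8 or whatever */
       statesize = 4 };        /* whatever */

int32_t filterInput(int16_t in)
{
   /* Unsigned so the integrator may wrap around safely; differences of
      wrapped values stay correct as long as each window sum fits. */
   static uint32_t integrator = 0;
   static int downsample_count = 0;
   /* Static storage starts zeroed, matching integrator = 0, so early
      outputs ramp up as though all earlier inputs were 0. */
   static uint32_t ringbuffer[statesize];
   static int ringbuffer_ptr = 0;
   static int32_t outstate = 0;

   integrator += (uint32_t)in;
   if (++downsample_count >= decimationFactor)
   {
     downsample_count = 0;   /* restart the decimation count */
     uint32_t oldintegrator = ringbuffer[ringbuffer_ptr];
     ringbuffer[ringbuffer_ptr] = integrator;
     ringbuffer_ptr = (ringbuffer_ptr + 1) % statesize;
     /* Converting the wrapped difference back to signed is
        implementation-defined, but two's complement on common compilers. */
     outstate = (int32_t)(integrator - oldintegrator) / (statesize * decimationFactor);
   }
   return outstate;
}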

Your effective moving average length is decimationFactor*statesize but you only need to keep around statesize samples. Obviously you can get better performance if your statesize and decimationFactor are powers of 2, so that the division and remainder operators get replaced by shifts and mask-ands.

Postscript: I do agree with Olin that you should always consider simple IIR filters before a moving average filter. If you don't need the frequency-nulls of a boxcar filter, a 1-pole or 2-pole low-pass filter will probably work fine.

On the other hand, if you are filtering for the purposes of decimation (taking a high-sample-rate input and averaging it for use by a low-rate process) then a CIC filter may be just what you're looking for. (especially if you can use statesize=1 and avoid the ringbuffer altogether with just a single previous integrator value)
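With statesize = 1 the ring buffer holds only the integrator value from the previous decimation point, so the output is just the sum of the last decimationFactor inputs divided by decimationFactor. A sketch of that special case, which resets a block sum instead of subtracting the old integrator (equivalent here); `decimate_average` and `DECIMATION_SHIFT` are hypothetical, and the shift assumes decimationFactor is a power of two.

```c
#include <stdint.h>

#define DECIMATION_SHIFT  3                        /* hypothetical: 8 samples */
#define DECIMATION_FACTOR (1 << DECIMATION_SHIFT)

/* statesize = 1 case: sum DECIMATION_FACTOR inputs, then output their
   average once per block. Returns 1 and stores the average in *out when a
   new decimated output is ready; otherwise returns 0 and leaves *out alone. */
int decimate_average(uint16_t in, uint16_t *out)
{
    static uint32_t sum = 0;
    static uint8_t count = 0;

    sum += in;
    if (++count >= DECIMATION_FACTOR) {
        *out = (uint16_t)(sum >> DECIMATION_SHIFT);  /* divide by 2^3 */
        sum = 0;
        count = 0;
        return 1;
    }
    return 0;
}
```

Feeding 1 through 8 returns 0 seven times, then 1 with `*out` set to 4 (36/8, truncated).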

Patrick · Answer 5 · 2012-05-15T01:29:33.027

Here's a single-pole low-pass filter (an exponential moving average, with cutoff frequency = CutoffFrequency). Very simple, very fast, works great, and almost no memory overhead.

Note: All variables have scope beyond the filter function, except the passed in newInput

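#include <math.h>

// Example rates (hypothetical values; set these for your system)
static const double SampleRate = 1000.0;     // Hz
static const double CutoffFrequency = 10.0;  // Hz

// Persistent state, kept outside the filter function
static double DecayFactor, AmplitudeFactor, MovingAverage = 0.0;

// One-time calculations (can be pre-calculated at compile-time and loaded with constants)
void FilterInit(void)
{
   const double PI = 3.14159265358979323846;
   DecayFactor = exp(-2.0 * PI * CutoffFrequency / SampleRate);
   AmplitudeFactor = (1.0 - DecayFactor);
}

// Filter Loop Function ----- THIS IS IT -----
double Filter(double newInput)
{
   MovingAverage *= DecayFactor;
   MovingAverage += AmplitudeFactor * newInput;
   return (MovingAverage);
}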

Note: This is a single stage filter. Multiple stages can be cascaded together to increase the sharpness of the filter. If you use more than one stage, you'll have to adjust DecayFactor (as relates to the Cutoff-Frequency) to compensate.

And obviously all you need is those two lines placed anywhere, they don't need their own function. This filter does have a ramp-up time before the moving average represents that of the input signal. If you need to bypass that ramp-up time, you can just initialize MovingAverage to the first value of newInput instead of 0, and hope the first newInput isn't an outlier.

(CutoffFrequency/SampleRate) has a range of between 0 and 0.5. DecayFactor is a value between 0 and 1, usually close to 1.

Single-precision floats are good enough for most things; I just prefer doubles. If you need to stick with integers, you can convert DecayFactor and AmplitudeFactor into fixed-point fractions, in which the numerator is stored as the integer and the denominator is an integer power of 2 (so you can bit-shift to the right instead of dividing during the filter loop). For example, if DecayFactor = 0.99 and you want to use integers, you can set DecayFactor = 0.99 * 65536 = 64881. Then any time you multiply by DecayFactor in your filter loop, just shift the result >> 16.
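A sketch of the integer version described here, assuming 16 bit unsigned inputs and a 16.16 fixed-point MovingAverage. `FilterQ16` and the macro names are hypothetical. It uses 64 bit intermediates for clarity, which an 8 bit PIC compiler may not offer or may make slow, so treat the widths as illustrative.

```c
#include <stdint.h>

/* DecayFactor = 0.99 stored as 0.99 * 65536 = 64881 (Q16). */
#define DECAY_Q16      64881u
/* AmplitudeFactor = 1 - DecayFactor, so the two gains sum to exactly 65536. */
#define AMPLITUDE_Q16  (65536u - DECAY_Q16)

/* MovingAverage in 16.16 fixed point, so fractional progress is kept. */
static uint32_t moving_average_q16 = 0;

uint16_t FilterQ16(uint16_t newInput)
{
    /* A Q16 state times a Q16 factor needs up to 48 bits; each multiply by
       a factor is followed by >> 16 instead of a divide. */
    uint64_t acc = (uint64_t)moving_average_q16 * DECAY_Q16
                 + ((uint64_t)newInput << 16) * AMPLITUDE_Q16;
    moving_average_q16 = (uint32_t)(acc >> 16);

    /* Round to the nearest whole input unit. */
    return (uint16_t)(((uint64_t)moving_average_q16 + 0x8000u) >> 16);
}
```

AmplitudeFactor is taken as 65536 − 64881 = 655 rather than rounded separately, so the gains sum to exactly 1.0; the first call with 1000 returns 10, and a steady 1000 reads back as 1000 once the transient has died out.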

For more information on this, an excellent book that's online, chapter 19 on recursive filters: http://www.dspguide.com/ch19.htm

P.S. For the moving-average paradigm, here is a different way to set DecayFactor and AmplitudeFactor that may be more relevant to your needs. Say you want roughly the previous 6 items averaged together. Done discretely, you'd add 6 items and divide by 6, so you can set AmplitudeFactor to 1/6 and DecayFactor to (1.0 - AmplitudeFactor).

score 4 · Answer 6 · answered Apr 20 '12 at 08:55

You can approximate a moving average for some applications with a simple IIR filter.

weight is a value from 0 to 255; higher values give a shorter averaging timescale.

Value = (newvalue*weight+value*(256-weight))/256

To avoid rounding errors, value would normally be a long, of which you only use the higher-order bytes as your 'actual' value.
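A sketch of this, assuming 16 bit unsigned inputs and value stored multiplied by 256 in a 32 bit variable (the "long" in the text). `ApproxMovingAverage` and `value_x256` are hypothetical names.

```c
#include <stdint.h>

/* value kept scaled by 256: the upper bytes hold the 'actual' average and
   the low byte holds fraction bits that absorb rounding. */
static uint32_t value_x256 = 0;

/* weight: 0..255, higher weight = shorter averaging timescale. */
uint16_t ApproxMovingAverage(uint16_t newvalue, uint8_t weight)
{
    /* Value = (newvalue*weight + value*(256-weight)) / 256, with the /256
       done as >> 8. The sum fits in 32 bits because it is a weighted average
       of two terms that are each at most 65535 * 256. */
    value_x256 = ((uint32_t)newvalue * 256u * weight
                  + value_x256 * (256u - weight)) >> 8;
    return (uint16_t)(value_x256 >> 8);   /* the integer part */
}
```

With weight = 16, the first call from zero returns 62 (1000 × 16/256 = 62.5, truncated), and repeated calls with 1000 settle just below 1000, so the truncated integer part reads 999.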

Stephen Collings · Answer 7 · 2013-08-29T13:40:36.167


Everyone else has commented thoroughly on the utility of IIR vs. FIR, and on power-of-two division. I'd just like to give some implementation details. The below works well on small microcontrollers with no FPU. There's no multiplication, and if you keep N a power of two, all the division is cheap bit-shifting.

Basic FIR ring buffer: keep a running buffer of the last N values, and a running SUM of all the values in the buffer. Each time a new sample comes in, subtract the oldest value in the buffer from SUM, replace it with the new sample, add the new sample to SUM, and output SUM/N.

#define N 16   /* buffer length; keeping it a power of two makes SUM/N a shift */

unsigned int Filter(unsigned int sample){
    static unsigned int buffer[N];
    static unsigned char oldest = 0;
    static unsigned long sum;

    sum -= buffer[oldest];
    sum += sample;
    buffer[oldest] = sample;
    oldest += 1;
    if (oldest >= N) oldest = 0;

    return sum/N;
}

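First-order IIR (no buffer needed): keep a running SUM that stands in for the sum of the last N values. Each time a new sample comes in, SUM -= SUM/N, add in the new sample, and output SUM/N. SUM is an exponentially weighted sum, so this is not a true N-sample average.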

#define N 16   /* keep N a power of two so SUM/N is a shift */

unsigned int Filter(unsigned int sample){
    static unsigned long sum;

    sum -= sum/N;
    sum += sample;

    return sum/N;
}

answered Aug 28 '13 at 13:45

If I'm reading you right, you're describing a first-order IIR filter; the value you're subtracting isn't the oldest value which is falling out, but is instead the average of the previous values. First-order IIR filters can certainly be useful, but I'm not sure what you mean when you suggest that the output is the same for all periodic signals. At a 10 kHz sample rate, feeding a 100Hz square wave into a 20-stage box filter will yield a signal that rises uniformly for 20 samples, sits high for 30, drops uniformly for 20 samples, and sits low for 30. A first-order IIR filter... – supercat Aug 28 '13 at 15:31
...will yield a wave which sharply starts rising and gradually levels off near (but not at) the input maximum, then sharply starts falling and gradually levels off near (but not at) the input minimum. Very different behavior. – supercat Aug 28 '13 at 15:32
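A small simulation of supercat's example, assuming a 0/1000 square wave with a 100 sample period (100 Hz at 10 kHz) and, for the IIR side, a hypothetical FF of 1/16 computed in double precision. The function names are illustrative.

```c
#define PERIOD  100   /* 100 Hz square wave sampled at 10 kHz */
#define BOX_LEN 20    /* 20-stage box filter */

/* Square wave: 1000 for the first half of each period, else 0. */
static double square(int n) { return (n % PERIOD) < PERIOD / 2 ? 1000.0 : 0.0; }

/* Box filter output at sample n: mean of the last BOX_LEN inputs
   (computed directly for clarity, not with a ring buffer). */
double box_output(int n)
{
    double sum = 0.0;
    for (int i = 0; i < BOX_LEN; i++)
        sum += (n - i >= 0) ? square(n - i) : 0.0;
    return sum / BOX_LEN;
}

/* Samples in one steady-state period where the box output is at 1000. */
int box_flat_high_count(void)
{
    int count = 0;
    for (int n = 10 * PERIOD; n < 11 * PERIOD; n++)
        if (box_output(n) == 1000.0)
            count++;
    return count;
}

/* Peak of a first-order IIR (FF = 1/16) over one period after settling. */
double iir_peak(void)
{
    double filt = 0.0, peak = 0.0;
    for (int n = 0; n < 11 * PERIOD; n++) {
        filt += (square(n) - filt) / 16.0;
        if (n >= 10 * PERIOD && filt > peak)
            peak = filt;
    }
    return peak;
}
```

The box output ramps over 20 samples, reaching 1000 on the 20th, then holds it for 30 more (31 samples at exactly 1000 per period). The IIR output peaks at about 962 and never reaches the input maximum, as described.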
You're right, I was confusing two kinds of filter. This is indeed a first-order IIR. I'm changing my answer to match. Thanks. – Stephen Collings Aug 29 '13 at 13:05
One issue is that a simple moving average may or may not be useful. With an IIR filter, you can get a nice filter with relatively few calcs. The FIR you describe can only give you a rectangle in time -- a sinc in freq -- and you can't manage the side lobes. It may be well worth it to throw in a few integer multiplies to make it a nice symmetric tunable FIR if you can spare the clock ticks. – Scott Seidman Aug 29 '13 at 13:50
@ScottSeidman: No need for multiplies if one simply has each stage of the FIR either output the average of the input to that stage and its previous stored value, and then store the input (if one has the numeric range, one could use the sum rather than average). Whether that's better than a box filter depends on the application (the step response of a box filter with a total delay of 1ms, for example, will have a nasty d2/dt spike when the input changes, and again 1ms later, but will have the minimum possible d/dt for a filter with a total 1ms delay). – supercat Aug 29 '13 at 15:25
@ScottSeidman: The amount of computation required for a box filter is independent of the length, while the computation for the Gaussian FIR filter I described is proportional to the length. I suspect one could make a Gaussian FIR filter with per-element computation proportional to the log of the length, but I've not worked out the details. – supercat Aug 29 '13 at 15:29
@supercat -- I'm not following. What would an implementation of a simple symmetric FIR filter of the pattern a b b a, all integer, look like? (not to mention, how do you know a delay w/o a sample rate?) – Scott Seidman Aug 29 '13 at 15:34
I get the part about the box filter complexity having nothing to do w/ length – Scott Seidman Aug 29 '13 at 15:36
Feels sorta like treating an N-width FIR as N 1-point FIRs. Is that line of thought moving me in the right direction? – Scott Seidman Aug 29 '13 at 15:39
@ScottSeidman: You got it. I've done such things on a number of platforms; the machine code can sometimes work out very nicely. – supercat Aug 29 '13 at 16:37
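One reading of the scheme supercat describes, sketched with hypothetical names: each stage outputs the average of its input and its previously stored input, then stores the input. Cascading STAGES of these gives binomial (approximately Gaussian) FIR weights using only adds and shifts.

```c
#include <stdint.h>

#define STAGES 4   /* hypothetical number of two-tap averaging stages */

/* Previous input seen by each stage; static storage starts at zero. */
static int32_t prev_input[STAGES];

/* Each stage outputs the average of its current and previous input, then
   stores the current input. >> on negative values is assumed to be an
   arithmetic shift. */
int32_t BinomialFilter(int32_t x)
{
    for (int s = 0; s < STAGES; s++) {
        int32_t avg = (x + prev_input[s]) >> 1;
        prev_input[s] = x;
        x = avg;   /* this stage's output is the next stage's input */
    }
    return x;
}
```

A single impulse of 256 comes out as 16, 64, 96, 64, 16: the binomial weights 1, 4, 6, 4, 1 scaled by 256/16. Each halving truncates, so small signals lose low bits; passing sums instead of averages, as supercat mentions, avoids that if you have the headroom.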

Telaclavo · Answer 8 · 2012-04-20T10:05:20.840


As mikeselectricstuff said, if you really need to reduce your memory needs, and you don't mind your impulse response being an exponential (instead of a rectangular pulse), I would go for an exponential moving average filter. I use them extensively. With that type of filter, you don't need any buffer. You don't have to store N past samples. Just one. So, your memory requirements get cut down by a factor of N.

Also, you don't need any division for that. Only multiplications. If you have access to floating-point arithmetic, use floating-point multiplications. Otherwise, do integer multiplications and shifts to the right. However, we are in 2012, and I would recommend using compilers (and MCUs) that allow you to work with floating-point numbers.

Besides being more memory efficient and faster (you don't have to update items in any circular buffer), I would say it is also more natural, because an exponential impulse response better matches the way nature behaves, in most cases.

answered Apr 20 '12 at 09:59


I don't agree with your recommendation of using floating point numbers. The OP probably uses an 8-bit microcontroller for a reason. Finding an 8-bit microcontroller with hardware floating-point support could be a difficult task (do you know any?). And using floating-point numbers without hardware support will be a very resource intensive task. – PetPaulsen Apr 20 '12 at 10:22

Saying you should always use a processor with floating point capability is just silly. Besides, any processor can do floating point, it's just a question of speed. In the embedded world, a few cents in build cost can be meaningful. – Olin Lathrop Apr 20 '12 at 11:58
@Olin Lathrop and PetPaulsen: I never said he should use an MCU with hardware FPU. Re-read my answer. By "(and MCUs)" I mean MCUs powerful enough to work with software floating-point arithmetic in a fluid way, which is not the case for all MCUs. – Telaclavo Apr 20 '12 at 12:05
@PetPaulsen I just guessed you work with PICs :-) (I went to your profile and saw other questions of yours). And yes, I know of many MCUs with hardware FPU, like the ones with Cortex-M4 (http://www.arm.com/products/processors/cortex-m/cortex-m4-processor.php?tab=Specifications), which are many, today (NXP, Freescale, Atmel...), but I was NOT telling the OP to use a hardware FPU. – Telaclavo Apr 20 '12 at 12:10

No need to use floating-point (hardware OR software) just for a 1-pole low-pass filter. – Jason S Apr 20 '12 at 12:40

If he had floating point operations he wouldn't object to division in the first place. – Federico Russo Apr 20 '12 at 13:04
@Frederico Yes, he'd object because everything was slow, not just divisions :) – AngryEE Apr 20 '12 at 13:07
@Telaclavo - You are right, I never worked with ARM MCUs and such, but all(?) the MCUs you listed are 32-bit. Again, the OP is using an 8-bit PIC. Another point: why do you think the OP wants to do software floating-point processing, when he wants to avoid division in the first place ... if the MCU is powerful enough to do this in software, division probably wouldn't be a problem either. – PetPaulsen Apr 20 '12 at 13:07

Chris · Answer 9 · 2014-11-24T05:08:39.827

One issue with the IIR filter, which @olin and @supercat almost touched on but others apparently disregarded, is that rounding down introduces some imprecision (and potentially bias/truncation): assuming that N is a power of two and only integer arithmetic is used, the right shift systematically eliminates the LSBs of the new sample. That means that no matter how long the series is, the average will never take those bits into account.

For example, suppose a slowly decreasing series (8,8,8,...,8,7,7,7,...7,6,6,) and assume the average is indeed 8 at the beginning. The first "7" sample will bring the average to 7, whatever the filter strength, after just one sample. Same story for 6, etc. Now think of the opposite: the series goes up. The average will stay at 7 forever, until the sample is big enough to make it change.

Of course, you can correct for the "bias" by adding 1/2^N/2, but that won't really solve the precision problem: in that case the decreasing series will stay at 8 forever until the sample is 8-1/2^(N/2). For N=4, for example, any sample above zero will keep the average unchanged.

I believe a solution for that would involve holding an accumulator of the lost LSBs. But I didn't get far enough to have code ready, and I'm not sure it wouldn't harm the IIR filter for some other kinds of series (for example, whether 7,9,7,9 would still average to 8).
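A common remedy, in line with the accepted answer's precision section and its comments, is to keep extra fraction bits in FILT and round only when reading it out. This is not an implementation of the "accumulator of lost LSBs" idea itself. A sketch assuming non-negative samples, FF = 1/16, and 8 fraction bits, deliberately more than the 4 shift bits so the leftover error stays well under half a count; `filter_reset` and `filter_step` are hypothetical names.

```c
#include <stdint.h>

#define SHIFT     4   /* FF = 1/16 */
#define FRAC_BITS 8   /* more fraction bits than shift bits */

/* Filter state in fixed point with FRAC_BITS fraction bits. */
static int32_t filt_fx;

void filter_reset(int32_t value) { filt_fx = value << FRAC_BITS; }

/* Update, then return the filtered value rounded to the nearest integer.
   Assumes non-negative samples and an arithmetic >> for negative diffs. */
int32_t filter_step(int32_t sample)
{
    int32_t new_fx = sample << FRAC_BITS;
    filt_fx += (new_fx - filt_fx) >> SHIFT;
    return (filt_fx + (1 << (FRAC_BITS - 1))) >> FRAC_BITS;
}
```

Starting from 8, the first 7 still reads back as 8; a run of 7s settles at 7, a following run of 8s settles back at 8, and an alternating 7, 9, 7, 9 sequence reads 8 on every step.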

@Olin, your two-stage cascade would also need some explanation. Do you mean holding two average values, with the result of the first fed into the second in each iteration? What's the benefit of this?
