
First off, I'm not a mathematician and never took statistics in college; what I know about standard deviation I've learned in the past few weeks, so be gentle...

I'm working on a piece of software that calculates the standard deviation of the concentration of oxygen in a sample of water over time. I'm using a rolling numerical array of 600 doubles taken at 1 second intervals (10 minutes of data). By rolling array I mean that over time as the array fills, I increment a counter until it hits the limit of the array (600 elements), and then I reset the counter to zero to begin overwriting previous elements in the array. In this manner, as the percent oxygen settles out, the standard deviation drops over time. When the σ gets to the required level, the oxygen sensor takes a reading on the concentration of oxygen and then moves on to the next gas point.
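The rolling array described above is a ring buffer. A minimal Python sketch of the idea follows; it is not the VB code in use, and the class name `RollingWindow` and its methods are hypothetical. It tracks how many slots have been filled so far, so unfilled slots never enter the calculation.

```python
import statistics


class RollingWindow:
    """Fixed-size ring buffer holding the most recent `size` readings.

    Hypothetical helper for illustration; not part of the original VB program.
    """

    def __init__(self, size):
        self.size = size
        self.values = [0.0] * size
        self.index = 0      # slot the next reading is written to
        self.filled = 0     # number of slots holding real readings

    def add(self, reading):
        self.values[self.index] = reading
        # wrap back to slot 0 once the end of the array is reached
        self.index = (self.index + 1) % self.size
        self.filled = min(self.filled + 1, self.size)

    def stdev(self):
        """Sample standard deviation of the readings currently held."""
        if self.filled < 2:
            return None  # not enough data yet
        return statistics.stdev(self.values[:self.filled])
```

With `window = RollingWindow(600)`, calling `window.add(reading)` once per second and then `window.stdev()` would reproduce the behaviour described, with the window length as the only tunable.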

My question is: am I taking too many points? It can take a very long time for the σ to drop to the level requested for the experiment before taking a calibration reading at that gas point. Because it takes so long, I'm actually allowing the software to move to the next point when the σ is an order of magnitude greater than requested (σ = 0.02 vs σ = 0.002).

If I reduce the number of elements in the array, I think the calculation would move much faster because there are simply fewer elements to calculate and the resultant dataset is smaller to manipulate. The software doesn't care what size the array is; it just processes whatever array is passed to it. Would reducing the number of elements significantly reduce the accuracy of the calculation? In other words, would I be trading accuracy for speed?
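As a rough guide to the accuracy side of that trade-off: assuming the readings in the window are independent and approximately normally distributed about a steady level (an assumption; consecutive 1-second sensor readings are often correlated, which makes the real uncertainty larger), the standard error of a sample standard deviation $s$ computed from $n$ points is approximately

$$\mathrm{SE}(s) \approx \frac{\sigma}{\sqrt{2(n-1)}}$$

Under that assumption the relative uncertainty in $s$ is about $1/\sqrt{2(n-1)}$: roughly 2.9% for $n = 600$, 5.0% for $n = 200$, and 9.2% for $n = 60$.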

The routine I'm using to calculate the standard deviation comes from http://www.devx.com/vb2themax/Tip/19007. I've modified it slightly from that example:

Function ArrayStdDev(arr As Variant, Optional SampleStdDev As Boolean = True, _
    Optional IgnoreEmpty As Boolean = True) As Double
    ' IgnoreEmpty is retained from the original example but is not used below
    Dim sum As Double
    Dim sumSquare As Double
    Dim value As Double
    Dim count As Long
    Dim Index As Long

    ' accumulate the sum and the sum of squares
    ' if arr isn't an array, the following statement raises an error
    For Index = LBound(arr) To UBound(arr)
        value = arr(Index)
        ' skip entries equal to zero (e.g. slots not yet filled);
        ' note that a genuine reading of exactly 0 is skipped as well
        If value <> 0 Then
            ' add to the running totals
            count = count + 1
            sum = sum + value
            sumSquare = sumSquare + value * value
        End If
    Next

    ' evaluate the result
    If count < 2 Then
        ' fewer than two usable values: return a sentinel
        ArrayStdDev = -9.99999
    ElseIf SampleStdDev Then
        ' sample standard deviation divides by (count - 1)
        ArrayStdDev = Sqr((sumSquare - (sum * sum / count)) / (count - 1))
    Else
        ' population standard deviation divides by count
        ArrayStdDev = Sqr((sumSquare - (sum * sum / count)) / count)
    End If
End Function
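The routine above computes the variance from `sumSquare - (sum * sum / count)`, a difference of two nearly equal numbers when the readings cluster tightly around a non-zero mean. With doubles and readings of this size the rounding error is probably negligible, but in principle the argument of `Sqr` can come out slightly negative when all readings are nearly identical. As a hedged alternative, here is a Python sketch of Welford's one-pass method, which is a different algorithm from the VB routine; the function name `welford_stdev` and the `skip_zero` parameter are illustrative assumptions.

```python
import math


def welford_stdev(values, sample=True, skip_zero=True):
    """One-pass standard deviation using Welford's update.

    Returns None when fewer than two values are usable.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for value in values:
        if skip_zero and value == 0:
            continue  # mirrors the VB routine, which skips zero entries
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    if count < 2:
        return None
    divisor = count - 1 if sample else count
    return math.sqrt(m2 / divisor)
```

Because `m2` is built from non-negative terms, the value passed to the square root cannot go negative.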

I hope I've asked an answerable question and appreciate any insight offered.

delliottg
  • You want the standard deviation to be small to weed out transient effects, right? If so, it seems like you could sample until the previous minute's samples have a low sigma, as opposed to 10 minutes. If stability is achieved in the 5th minute, your algorithm will run for at least 15 minutes. – Matthew Leingang Aug 19 '15 at 18:14
  • "as the percent oxygen settles out...": this implies that you have some sort of model of the process in mind. The nature of this model is crucial, so you might like to tell us something about it. – TonyK Aug 19 '15 at 18:15
  • This is correct. We already have a process in place for doing oxygen calibration of our sensors, but we're working on a new process that's basically the exact opposite of what we're currently doing (old method: run O2 to a high percentage, then sparge N2 to the next gas point; new method: sparge N2 to a low percentage of O2, then sparge different percentages of O2 through the various gas points). The sparging of O2 to the next gas point is the calculation I'm trying to tweak. It works now, but going between points is taking more than an hour, and there are 30 points. Did that help any? – delliottg Aug 19 '15 at 18:23
  • @MatthewLeingang, can you explain the term "low sigma"? If you mean that the variances from the mean are small, then yes that's what we want. By the time we've run oxygen for a while, the chances of a transient are low unless a bubble gets into the measurement chamber (not likely, but not impossible). We're looking for a steady state for the gas percentage before we turn on the high accuracy sensor to take a reading (we have a low accuracy sensor running continuously, but take "low" with a grain of salt, it's still incredibly accurate, but the other is fantastically accurate). – delliottg Aug 19 '15 at 18:27
  • @delliottg I am stabbing in the dark here, but what I mean is this: Your algorithm seems to run until the $\sigma$ of the last 600 data points is $<0.002$. Would it be sufficient to proceed once the $\sigma$ of the last 60 data points is $<0.002$? Because if so, your algorithm is running 9 minutes longer than it needs to each time. – Matthew Leingang Aug 19 '15 at 18:38
  • Ah, I see, that makes sense. The σ is also calculated once per second, displayed in the UI, and used in the wait routine in near real time. So reducing the number of elements in the array from 600 to 60 should still give me valid data, just in less time? Did I understand that correctly? – delliottg Aug 19 '15 at 18:46
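
To illustrate the point made in the comments above about window length, here is a small Python sketch using a purely synthetic, noise-free exponential settling curve. The time constant, amplitude, and final level are made-up illustration values, not measured data, and the function name `first_stable_second` is hypothetical. It reports the first second at which the σ of a full trailing window drops below a threshold, for two window lengths.

```python
import math
import statistics


def first_stable_second(window, threshold, tau=120.0, amplitude=1.0,
                        final_level=8.0, max_seconds=5000):
    """First second at which the sample stdev of the last `window`
    readings falls below `threshold`, or None if it never does.

    Readings follow final_level + amplitude * exp(-t / tau), a synthetic
    settling curve assumed purely for illustration.
    """
    readings = []
    for t in range(1, max_seconds + 1):
        readings.append(final_level + amplitude * math.exp(-t / tau))
        # only evaluate once the window is completely filled
        if len(readings) >= window:
            if statistics.stdev(readings[-window:]) < threshold:
                return t
    return None


short = first_stable_second(window=60, threshold=0.002)
long_ = first_stable_second(window=600, threshold=0.002)
print(short, long_)
```

In this synthetic case the shorter window crosses the threshold earlier because older, pre-settling readings leave it sooner. Real sensor noise would set a floor on σ that this sketch ignores, and a window that is too short may trigger on a brief lull rather than on true steady state.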

1 Answer


Based on the suggestions in the comments above, I reduced the number of elements in the array from 600 to 200. Empirical testing showed that with anything below 200 the calculation spanned too short a sample, and the accurate sensor would be triggered too soon (before the σ of the measured oxygen concentration, in ml/L, had settled enough for an accurate measurement). This cascaded into modifying a number of other water bath parameters to accommodate the change. That wasn't unexpected; we are, after all, in the experimental stages of this new type of calibration. Thanks to the suggestions you provided, we're now in the tweaking part of the experiment instead of the "why isn't this working right" part.

So the answer to my question seems to be "200". I was hoping for 42...

delliottg