
In textbooks and scientific literature, analytical imprecision is often assumed to be normally distributed. That is, in theory, if the same chemical sample were analysed over and over again ad infinitum, the measurements would follow a normal distribution with a mean equal to the true analyte concentration and a standard deviation determined by the coefficient of variation (CV) of the analysis.

However, according to some papers such as this one and this presentation by the same author...

[...] the distribution of values attributed to the measurand is sometimes approximately lognormal and therefore asymmetric around the measurement value.

In another paper...

[i]t is argued that (a) there is no theoretical reason why such distributions should be log-normal and there is abundant evidence that they are not; (b) quasi-log-normal distributions can be produced as artifacts by data recording practices; and (c) inordinately large numbers of analytical results would be needed to distinguish a log-normal distribution from a normal distribution.

This may be important because if measurements are normally distributed, the true concentration in a sample is best estimated by making several measurements and calculating their mean. For log-normally distributed measurements, however, the true value is more accurately estimated by the geometric mean, whereas the arithmetic mean will yield a positive bias.
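As a quick illustration (the true value of 1.0 and the 50 % CV below are invented for this example), a simulation of log-normally distributed measurements shows the positive bias of the arithmetic mean:

```python
import math
import random

random.seed(42)

true_value = 1.0  # hypothetical true concentration, arbitrary units
cv = 0.5          # hypothetical 50 % coefficient of variation

# For a log-normal with median equal to the true value, the sigma of the
# underlying normal follows from the CV: CV^2 = exp(sigma^2) - 1
sigma = math.sqrt(math.log(1 + cv**2))
mu = math.log(true_value)

n = 100_000
measurements = [random.lognormvariate(mu, sigma) for _ in range(n)]

arith_mean = sum(measurements) / n
geo_mean = math.exp(sum(math.log(x) for x in measurements) / n)

print(f"arithmetic mean: {arith_mean:.3f}")  # biased high by ~exp(sigma^2 / 2)
print(f"geometric mean:  {geo_mean:.3f}")    # close to the true value
```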

When the CV is low, the normal and log-normal distributions are very similar in shape. As the CV increases, the difference becomes more pronounced. In addition to the difference in shape, the normal distribution allows measurements to take negative values, whereas the log-normal distribution always yields positive values.
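This is easy to quantify under a normal error model; a minimal sketch (the CVs and the true value of 1.0 are arbitrary choices for illustration):

```python
from statistics import NormalDist

true_value = 1.0  # hypothetical true concentration, arbitrary units

for cv in (0.20, 0.33, 0.50):
    sd = cv * true_value
    # Probability that a single reading falls below zero, X ~ N(true_value, sd)
    p_negative = NormalDist(mu=true_value, sigma=sd).cdf(0.0)
    print(f"CV = {cv:4.0%}: P(negative reading) = {p_negative:.2e}")
```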

Are there good reasons to believe that measurements are normally rather than log-normally distributed? If measurements can indeed end up having negative values, how does that happen analytically?

Addition: I am particularly interested in how the above question pertains to UHPLC-MS/MS within the field of analytical pharmacology.

Peder Holman
  • I'm not at all sure that a general answer is possible. The answer depends on what is being measured and how the measurement is done. E.g., a precise measurement of a long length will look normal even though it isn't (it can't be negative). Some measurements (like decay counts in radioactivity) are nothing like normal distributions. The experimentalist has to think carefully about the details to be sure which assumption applies. – matt_black Oct 07 '22 at 13:19
  • It may depend on the relative distance between the confidence interval and the limit of quantification. Closeness may effectively make the two sides asymmetrical. A test of distribution symmetry may offer the deciding factor. (If there are no theoretical reasons for other distributions.) – Poutnik Oct 07 '22 at 13:19
  • I have encountered log-normal distributions when instrument errors are low and sample variation is high but is bounded at one end by detection limits. The variation is only on the high side of the measurements. – jimchmst Oct 07 '22 at 14:58
  • As Lippmann is quoted in the book "The Calculus of Observations": "Everybody believes in the exponential law of errors: the experimenters, because they think it can be proved by mathematics; and the mathematicians, because they believe it has been established by observation." Source: https://mathworld.wolfram.com/NormalDistribution.html – AChem Oct 07 '22 at 15:57
  • I think you have already answered your own query: "(c) inordinately large numbers of analytical results would be needed to distinguish a log-normal distribution from a normal distribution." – AChem Oct 07 '22 at 16:21
  • The correct answer in most cases is likely neither normal nor log-normal but rather something more complicated, resulting from multiple effects including but not limited to nonlinearity of the method, limits of detection, absolute bounds, and decreased noise at higher measured values. – Andrew Oct 07 '22 at 16:46
  • While this is relevant to chemistry, this question applies more generally to all branches of physical science that employ instrumentation, so you may want to ask this at the stat SE site: https://stats.stackexchange.com/ – Buck Thorn Oct 07 '22 at 16:59
  • Sometimes the same measurement is reported in two different ways, e.g. transmission and absorption in UV-vis spectroscopy. Depending on which one you use, the distribution of the displayed value will be different. – Karsten Oct 07 '22 at 22:08
  • The comparison between absorbance and transmission can be difficult because the machine measures light intensity and calculates absorbance. Every FTIR I have used has difficulty accurately measuring 0 %T; almost all will record negative transmission for moderately thick samples, which slowly approaches 0 % for very thick samples or for complete beam blockage. I think this is because moderately thick samples re-emit absorbed radiation that is reabsorbed as the samples become larger. This is an additional error above the instrument noise. – jimchmst Oct 08 '22 at 22:29
  • Ran out of space. This means that for low-transmitting frequencies the zero is artificially high, giving a falsely high absorbance; the software unfortunately assigns a false maximum absorbance if the measured transmission is negative. The point is that one must understand how instrument errors affect variation in the sample measurements. This sounds trivial, but I have encountered several publications quoting these assigned absorbances as actual absorbance readings. – jimchmst Oct 08 '22 at 22:30
  • Thank you all for your contributions. It would seem that there is no simple answer to my question. Perhaps I can narrow down the problem by adding that I am particularly interested in UHPLC-MS/MS within the field of analytical pharmacology. – Peder Holman Oct 11 '22 at 07:47
  • There is surely an answer, but perhaps there is too much to chew on in this question. For instance, start with the concept of "true concentration". See e.g. https://pubs.acs.org/doi/pdf/10.1021/ac971793j – Buck Thorn Oct 11 '22 at 10:30
  • More recently: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6999182/ – Buck Thorn Oct 11 '22 at 10:34
  • The terminology in this debate is confusing. Things can be normally distributed or lognormally distributed; but also, error can be multiplicative or additive. These are different concepts. If $y_{measured} = (1+\epsilon_m)y_{true} + \epsilon_a$, where $\epsilon_m$ is multiplicative error, and $\epsilon_a$ is additive error, we still haven't said whether these $\epsilon$ values are both normally distributed, both lognormally distributed, or neither. I suspect that most of these debates are really about whether $\epsilon_m \ll \epsilon_a$ or vice-versa, not about log-normality. – Curt F. Nov 17 '22 at 02:57
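The additive-vs-multiplicative distinction in the last comment can be sketched numerically (the standard deviations below are invented, and both error terms are taken as normal for simplicity):

```python
import math
import random

random.seed(1)

def measure(y_true, sd_mult=0.02, sd_add=0.5):
    """One reading under y_measured = (1 + eps_m) * y_true + eps_a."""
    eps_m = random.gauss(0.0, sd_mult)  # multiplicative error
    eps_a = random.gauss(0.0, sd_add)   # additive error
    return (1.0 + eps_m) * y_true + eps_a

for y_true in (1.0, 10.0, 1000.0):
    readings = [measure(y_true) for _ in range(50_000)]
    mean = sum(readings) / len(readings)
    sd = math.sqrt(sum((r - mean) ** 2 for r in readings) / len(readings))
    # Additive error dominates at low y_true (roughly constant SD);
    # multiplicative error dominates at high y_true (roughly constant CV)
    print(f"y_true = {y_true:7.1f}: SD = {sd:6.2f}, CV = {sd / mean:.1%}")
```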

2 Answers


As the long series of comments illustrates, there is no universal answer; it is highly situation-specific. However, it seems that you became concerned after reading Ramsey et al.'s papers and presentations. His observations are correct, but look at his samples: lead in contaminated soils. Why should we expect that to be normally distributed? A contaminated piece of land will indeed show a skew, because the area near the contamination site will have more lead, especially if the contamination comes from very heterogeneous waste (a landfill). For example, if someone discards a piece of lead in the trash, the local area will have more lead than the surroundings. The whole argument for log-normality is about heterogeneous samples and long-range heterogeneity. If we were analyzing sodium in surface seawater, the chance of skew is nil and a normal distribution is expected.
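A toy simulation (all numbers invented) of subsampling such a heterogeneous soil shows how occasional hotspot hits alone produce a strongly right-skewed distribution of results:

```python
import random

random.seed(7)

background_ppm = 20.0     # invented background lead level
hotspot_ppm = 2000.0      # invented hotspot level
hotspot_fraction = 0.05   # invented chance a subsample hits a hotspot

def subsample():
    """Lead concentration of one subsample from a heterogeneous soil."""
    if random.random() < hotspot_fraction:
        return random.gauss(hotspot_ppm, 300.0)
    return random.gauss(background_ppm, 3.0)

results = sorted(subsample() for _ in range(10_000))
mean = sum(results) / len(results)
median = results[len(results) // 2]

# Right skew: the mean is pulled far above the median by hotspot hits
print(f"median = {median:.1f} ppm, mean = {mean:.1f} ppm")
```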

Now that the focus of the question is on pharmaceutical analysis and UHPLC-MS/MS type work: for all practical purposes, the normal distribution is used by everyone. This is because the analysis pertains to precisely controlled compositions and sample-preparation processes for active pharmaceutical ingredients (APIs). We should be worried if a certain XYZ company prepared tablets in a skewed way, such that tablets prepared in the morning always had a different composition than those prepared in the evening. Such a skew would be a red flag.

When examining solids by Raman spectroscopy, we might observe skewness in the analytical results for tablets. The laser spot size is a few microns, and the laser is rastered over the entire tablet. In any case, if the API solids are not homogeneously mixed or are intentionally segregated, the analytical results will be skewed. For liquid samples, we should not expect skewness.

Some other points:

Are there good reasons to believe that measurements are normally rather than log-normally distributed? If measurements can indeed end up having negative values, how does that happen analytically?

Yes, for all practical purposes in UHPLC-MS/MS. A negative concentration has no physical significance; such values have to be discarded or the analytical method re-checked! By definition, concentration must be positive. In other words, the limit of detection of any analyte should not be centered at zero; the "blank" readings may be distributed near zero. In short, the three selected papers represent a "skew" in your reading, because you have discarded hundreds of other papers that assume a normal distribution.
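One common analytical route to negative reported values is blank (background) subtraction; a toy sketch (all numbers invented) of why a true blank yields readings scattered around zero, about half of them negative:

```python
import random

random.seed(3)

noise_sd = 0.8     # invented detector noise, arbitrary units
sensitivity = 1.0  # invented response per concentration unit
true_conc = 0.0    # a true blank

def reported_conc():
    """Blank-subtracted reading: (gross signal - blank signal) / sensitivity."""
    gross = sensitivity * true_conc + random.gauss(0.0, noise_sd)
    blank = random.gauss(0.0, noise_sd)
    return (gross - blank) / sensitivity

readings = [reported_conc() for _ in range(10_000)]
frac_negative = sum(r < 0 for r in readings) / len(readings)
print(f"fraction of negative readings for a blank: {frac_negative:.1%}")
```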

As Lippmann is quoted in the book "The Calculus of Observations": "Everybody believes in the exponential law of errors: the experimenters, because they think it can be proved by mathematics; and the mathematicians, because they believe it has been established by observation." Source: mathworld.wolfram.com/NormalDistribution.html

BTW, there is an expert in SE who has authored an entire book on the Limit of Detection. I hope he can also add his valuable insight.

AChem

There is one overwhelming reason why errors can never be normally distributed, which is so obvious that I hesitate to mention it.

Suppose (to fix our ideas) that we analyse a sample and the analysis says that the sample contains 1 ppm of lead.

Then normally distributed error would imply that there is a non-zero probability that the real concentration of lead in the sample is less than 0 ppm, which is impossible.

  • You have to explore the concept of the limit of detection in analytical chemistry, which addresses your exact concern. There is a nice book, "Limits of Detection in Chemical Analysis" by Edward Voigtman: https://www.wiley.com/en-us/Limits+of+Detection+in+Chemical+Analysis-p-9781119188971 – AChem Oct 12 '22 at 13:06
  • Here, I think it's relevant to distinguish between imprecision and uncertainty. While uncertainty is definitely related to imprecision (and also bias), it is not synonymous with it. I completely agree that any distribution describing the uncertainty of the measurement should not contain negative values because the true value must be positive. However, it is conceivable that a measurement can be negative despite the true value being positive, for example if the random error/imprecision is normally distributed. – Peder Holman Oct 13 '22 at 16:20