I would like to compare two absorption spectra (or interferograms) and determine whether there are statistically significant differences between them at particular wavelength intervals. At the moment, I have data from two experiments that look like this:

    # A tibble: 6 x 5
      t     x1     y1     x2     y2
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 3999. 0.0124 0.0132 0.0122 0.0113
2 3998. 0.0125 0.0130 0.0122 0.0116
3 3997. 0.0122 0.0131 0.0122 0.0113
4 3996. 0.0121 0.0136 0.0122 0.0114
5 3995. 0.0124 0.0139 0.0122 0.0122
6 3994. 0.0125 0.0141 0.0122 0.0129

The first column represents the wavenumber, the x columns represent the absorbance of the sample before irradiation, and the y columns represent the absorbance of the irradiated sample.

I was wondering whether I could compare these data (x and y) as time series and, if so, which method could quantify the differences, if any, between the samples before and after irradiation. Maybe it has already been done and there is information somewhere on how to compare spectra if the wavenumber is interpreted as time ($x$ axis).

I ran a t-test in R and in both experiments the null hypothesis could not be rejected, although for the second experiment (x2, y2) the $p$ value was much lower than for the first. If I average the x and the y columns and then plot both, I see visible differences at certain wavelength intervals. But how can I check reliably whether the spectra differ?

Here is a project with similar experiments by Zezell et al. [1]. For statistical analysis they use ANOVA and Tukey's test, but how do I do it for the vectored data?

Reference

  1. Zezell, D. M.; Benetti, C.; Veloso, M. N.; Castro, P. A. A.; Ana, P. A. FTIR Spectroscopy Revealing the Effects of Laser and Ionizing Radiation on Biological Hard Tissues. J. Braz. Chem. Soc. 2015. DOI: 10.5935/0103-5053.20150246.
andselisk
  • Currently, your question reads like this: 1) the wavelengths are assumed to be correct (checked e.g. against a reference), 2) with sample #1, you recorded $x_1$ as a function of the wavelength, 3) irradiated the sample, 4) then recorded $y_1$. You then repeated 2 to 4 for sample #2. Do you have multiple recordings of $x_1$ and $y_1$ to estimate their individual (and wavelength-dependent) standard deviation? Because then you can test per wavelength if / where the recordings were significantly different with a significance level of x% for sample #1. – Buttonwood Aug 12 '20 at 10:00
  • Equally, especially if you applied a base line correction, check the variation of the transmission / absorbance data while processing the raw data (the interferogram you should keep) for one and the same recording, processed multiple times, too. If you do it visually by mouse click «from shoulder to shoulder» of the absorption bands displayed, variations may be large. – Buttonwood Aug 12 '20 at 10:12
  • @Buttonwood 1) To be precise, the first column stands for wavenumber. 2) At the moment I don't have multiple recordings of the same sample. But with the data as it is now, what can I possibly do to gauge the differences between the spectra and whether they are significant? Can I do it in R? – Gianni D'Adova Aug 12 '20 at 10:53
  • @Buttonwood I just don't know where to look at online, there must have been some studies where the sample is studied for irradiation. What could I do, please help! – Gianni D'Adova Aug 12 '20 at 11:17

1 Answer

The table below illustrates a possible screen; the computed values displayed are rounded to four decimals. The suggestion is to treat the observations prior to the intended irradiation ($x_1, x_2$) separately from the observations after the intended irradiation ($y_1, y_2$). Per wavelength:

  • compute for $x$ and $y$ the arithmetical mean value
  • determine the standard deviation of this sample (e.g., $x_1$ and $x_2$), the one pocket calculators sometimes label with $\sigma(n-1)$. The table doesn't include it, but you may compute the confidence interval with a reasonable $t$ value. Because there were only two realizations, the number of degrees of freedom ($f = n - 1$) equals 1, which leads to an understandably high $t_{1, 0.95} = 12.71$ to map the interval that should contain the true mean $\mu$ (shown here for $y$; the same applies to $x$):

$$\bar{y} - \frac{t \cdot \sigma}{\sqrt{n}} \le \mu \le \bar{y} + \frac{t \cdot \sigma}{\sqrt{n}}$$

  • subtract the arithmetical mean values from each other to determine the effect of the irradiation. Expect positive as well as negative values to occur.
  • compute the standard deviation for this effect. Because the effect is computed as a difference $(\bar{x} - \bar{y})$, the uncertainties of both mean values propagate into it; the table uses the sum of the corresponding standard deviations as a conservative estimate. Alternatively, add the corresponding halves of the confidence intervals about $x$ and $y$ instead.

I have no hands-on experience with R.
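
The steps listed above can, however, be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a definitive implementation: the six rows are copied from the question, while the function name `screen`, the dictionary keys, and the constant `T_CRIT` are names made up for this illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Rows copied from the question: (wavenumber, x1, y1, x2, y2)
rows = [
    (3999, 0.0124, 0.0132, 0.0122, 0.0113),
    (3998, 0.0125, 0.0130, 0.0122, 0.0116),
    (3997, 0.0122, 0.0131, 0.0122, 0.0113),
    (3996, 0.0121, 0.0136, 0.0122, 0.0114),
    (3995, 0.0124, 0.0139, 0.0122, 0.0122),
    (3994, 0.0125, 0.0141, 0.0122, 0.0129),
]

# Two-sided 95 % Student t value for one degree of freedom (n = 2).
T_CRIT = 12.71


def screen(rows, t_crit=T_CRIT):
    """Return one dict per wavenumber with the quantities of the screen."""
    result = []
    for nu, x1, y1, x2, y2 in rows:
        x, y = (x1, x2), (y1, y2)
        n = len(x)
        sx, sy = stdev(x), stdev(y)  # sample standard deviation, sigma(n-1)
        result.append({
            "wavenumber": nu,
            "mean_x": mean(x), "stdev_x": sx,
            "mean_y": mean(y), "stdev_y": sy,
            "effect": mean(x) - mean(y),        # mean_x - mean_y
            "stdev_sum": sx + sy,               # stdev_x + stdev_y
            "ci_half_x": t_crit * sx / sqrt(n),  # half width of the CI about x
            "ci_half_y": t_crit * sy / sqrt(n),  # half width of the CI about y
        })
    return result


for r in screen(rows):
    print(f"{r['wavenumber']}  effect={r['effect']:+.4f}  "
          f"stdev_sum={r['stdev_sum']:.4f}  ci_half_y={r['ci_half_y']:.4f}")
```

Values that fall exactly halfway between two four-decimal numbers may print one unit in the last digit away from a hand-rounded table, because of floating-point representation.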


| lambda |     x1 |     y1 |     x2 |     y2 | mean_x | stdev_x | mean_y | stdev_y | mean_x - mean_y | stdev_x + stdev_y |
|--------+--------+--------+--------+--------+--------+---------+--------+---------+-----------------+-------------------|
|   3999 | 0.0124 | 0.0132 | 0.0122 | 0.0113 | 0.0123 |  0.0001 | 0.0123 |  0.0013 |          0.0000 |            0.0015 |
|   3998 | 0.0125 | 0.0130 | 0.0122 | 0.0116 | 0.0123 |  0.0002 | 0.0123 |  0.0010 |          0.0001 |            0.0012 |
|   3997 | 0.0122 | 0.0131 | 0.0122 | 0.0113 | 0.0122 |  0.0000 | 0.0122 |  0.0013 |          0.0000 |            0.0013 |
|   3996 | 0.0121 | 0.0136 | 0.0122 | 0.0114 | 0.0122 |  0.0001 | 0.0125 |  0.0016 |         -0.0003 |            0.0016 |
|   3995 | 0.0124 | 0.0139 | 0.0122 | 0.0122 | 0.0123 |  0.0001 | 0.0130 |  0.0012 |         -0.0007 |            0.0013 |
|   3994 | 0.0125 | 0.0141 | 0.0122 | 0.0129 | 0.0123 |  0.0002 | 0.0135 |  0.0008 |         -0.0011 |            0.0011 |
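
As a worked instance of the confidence interval, take the two irradiated readings at 3994 ($y_1 = 0.0141$, $y_2 = 0.0129$) with $n = 2$ and $t_{1, 0.95} = 12.71$. The numbers follow from the data in the question; only the choice of this row as the example is arbitrary.

$$\bar{y} = 0.0135, \qquad \sigma = \frac{|0.0141 - 0.0129|}{\sqrt{2}} \approx 0.00085, \qquad \frac{t \cdot \sigma}{\sqrt{n}} = \frac{12.71 \cdot 0.00085}{\sqrt{2}} \approx 0.0076$$

The 95% interval about $\bar{y}$ thus spans roughly 0.0059 to 0.0211, which is much wider than the effect of about $-0.0011$ at this position. With only two realizations, the interval is dominated by the large $t$ value.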
Buttonwood
  • Hello again. Thank you for the input, but it turns out that the given x and y correspond to averages per each point before and after irradiation. In this case, I don't know which method would be usable. The relative error is 10%. I tried to plot confidence intervals (plotted averaged data and corresponding limits per each lambda) and there were some intervals where the CIs didn't overlap. But that's not the way to tell that the data are significantly different, is it? – Gianni D'Adova Aug 17 '20 at 18:04
  • Yes, the safe approach is to see where the two spectra differ by $3\sigma$ or more (adding the two individual standard deviations) as strong evidence for a significant effect at the significance level set. Assuming a uniform, wavelength-independent relative error of 10% of the readings obviously loses information compared to working with the non-averaged data. Without any statistics, it is the bare difference plot only, suggested by @MFarooq (https://chemistry.stackexchange.com/questions/138785/comparison-of-two-spectra-in-order-to-find-whether-the-irradiated-sample-has-sig/138858#138858) – Buttonwood Aug 17 '20 at 22:18
  • @Buttonwood If I manage to fit the region of the spectra where they seemingly differ with a Gaussian line, can I then compare these two lines as coming from different distributions, or am I confusing something? – user Aug 18 '20 at 03:34
  • @user I doubt that we are on the same page here because fitting (like in a regression) was not part of my line of thought. Even considering your question appears pretty close to the one here, I do not understand yet the purpose of the fitting you intend. – Buttonwood Aug 18 '20 at 10:34
  • @GianniD'Adova This now sounds more like IR mapping. Besides clever engineering, many spectra per spot are indeed recorded to gain a S/N ratio >> 3 (e.g., https://www.jstage.jst.go.jp/article/sccj/50/3/50_209/_article, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2732436/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3236568/) where the binning of the recorded sample matrix may be an issue. A complementary check of the response surfaces of derivative spectra (https://www.whoi.edu/cms/files/derivative_spectroscopy_59633940_175744.pdf as used for UV-Vis spectra) will be a helpful tool. – Buttonwood Aug 18 '20 at 12:06
  • @Buttonwood I read the pdf file. But there is no method I can find. There must be lots of people trying to measure the differences and the significance between differences of the spectra. If I would have the raw data then maybe ANOVA per each wavenumber could be done, but nobody would do that for each wave number, would they? – Gianni D'Adova Aug 18 '20 at 13:07
  • @GianniD'Adova It would require some programming (a faculty I assume R users have). With a $t$ test, once the zero hypothesis of spectra are different along $\bar{x}$ or $\sigma$ and the critical thresholds are defined, it would be up to the program to repeat this testing in a for-loop over all the wavelengths recorded. The output would be a categorical yes/no binary, either permitting or rejecting the zero hypothesis at a given wavelength, which may be color encoded. – Buttonwood Aug 18 '20 at 18:54
  • @Buttonwood Can you explain what do you mean by "zero hypothesis are different along $\bar{x}$ or $\sigma$"? – Gianni D'Adova Aug 18 '20 at 19:30
  • @GianniD'Adova Applied to the spectra problem, at a fixed wavelength, you may have five recordings prior to the sample irradiation, and five recordings after the sample irradiation. You may then test 1) if they differ significantly by the arithmetical mean value (e.g., %transmission), which I denoted by $\bar{x}$. Sometimes it is denoted by $\mu$. 2) You may check if the standard deviations (sometimes called dispersion, https://www.texasgateway.org/resource/101-two-population-means-unknown-standard-deviations) of these two populations differ significantly, or not; which I denoted by $\sigma$. – Buttonwood Aug 24 '20 at 14:30
  • @Buttonwood Ok, thank you. But if, for example, I have multiple recordings of the same sample, what can I do, to analyse those in order to then do comparisons and what not to other groups? Apart from calculating mean of the sample. I have then sample mean, the averaged spectra, which I assume would be population mean, and raw data of few recordings of the same sample. – Gianni D'Adova Aug 26 '20 at 09:25
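
The for-loop over all recorded positions described in the comments above might look roughly like the following Python sketch. Several things here are assumptions and not part of the thread: the replicate absorbances are invented purely for illustration (five recordings before and five after irradiation at two wavenumbers), the test is a pooled-variance two-sample $t$ test (which presumes similar variances in both groups), and the names `pooled_t`, `flags` and `T_CRIT` are hypothetical. The critical value 2.306 is the two-sided 5% Student $t$ value for $5 + 5 - 2 = 8$ degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical replicate absorbances per wavenumber (NOT measured data).
before = {
    3999: [0.0124, 0.0122, 0.0123, 0.0125, 0.0121],
    3994: [0.0125, 0.0122, 0.0124, 0.0123, 0.0126],
}
after = {
    3999: [0.0125, 0.0121, 0.0124, 0.0122, 0.0123],
    3994: [0.0141, 0.0129, 0.0137, 0.0133, 0.0135],
}

# Two-sided 5 % critical value of Student's t for 8 degrees of freedom.
T_CRIT = 2.306


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (assumes similar variances)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


# Repeat the same test at every wavenumber and store a yes/no flag.
flags = {}
for nu in sorted(before, reverse=True):
    t = pooled_t(before[nu], after[nu])
    flags[nu] = abs(t) > T_CRIT  # True: means differ at the 5 % level
    print(f"{nu}  t={t:+.2f}  significant={flags[nu]}")
```

The resulting yes/no flags per wavenumber can then be used to color the corresponding regions of a difference plot. Note that testing many wavenumbers at once raises the chance of false positives, so the threshold may need to be tightened when the loop runs over a full spectrum.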