# Solution 5.9 - Chemometrics: Data Analysis for the Laboratory and Chemical Plant

## Education Article

**Published:** Jan 1, 2000 **Channels:** Chemometrics & Informatics

1. Here is the new "x" block, together with the relevant concentrations.

2. It is useful to standardise the measurements so that each wavelength carries equal weight. For example, the sixth and seventh wavelengths are much more intense than the first and the last two, and would otherwise dominate the analysis.

The scores for the first three components are as follows.
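The standardisation and score calculation can be sketched in NumPy as below. This is a minimal illustration, not the book's own procedure: it assumes the population standard deviation (`ddof=0`) is used for standardising, computes PCA through the singular value decomposition rather than NIPALS, and runs on hypothetical random data. The helper names `standardise` and `pca_scores` are illustrative. Because the signs of principal components are arbitrary, a column of scores may come out with the opposite sign to a table produced by another algorithm.

```python
import numpy as np

def standardise(X):
    """Centre each column and divide by its population sd (ddof=0),
    so that every wavelength carries equal weight (assumed convention)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)

def pca_scores(Z, n_components):
    """PCA of a centred matrix Z via the SVD.
    Returns scores T and loadings P such that Z is approximately T @ P."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]  # scores = U times singular values
    P = Vt[:n_components]                        # loadings, one row per component
    return T, P

# Hypothetical data: 10 spectra x 10 wavelengths, the 6th and 7th much more intense
rng = np.random.default_rng(0)
intensity = np.array([0.1, 1, 1, 1, 1, 5, 5, 1, 0.1, 0.1])
X = rng.normal(size=(10, 10)) * intensity
Z = standardise(X)
T, P = pca_scores(Z, 3)
print(T.shape)  # (10, 3)
```

Without the call to `standardise`, the first component would mostly describe the two high-intensity columns; after it, every column has unit variance and contributes on an equal footing.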

3. The graphs are as follows.

The root mean square errors are 0.0131, 0.0036 and 0.0012 mM for an increasing number of components.
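The section does not say exactly how these errors were obtained. The sketch below assumes principal components regression (PCR) on standardised data, with the centred concentrations regressed on the first *a* PCA scores. The divisor of the residual sum of squares is left as a parameter: passing `I - a - 1` mirrors the convention stated in answer 8 for PLS, but whether it applies here is an assumption. The function names and data are hypothetical.

```python
import numpy as np

def pcr_fit(Z, c, n_components):
    """Principal components regression: regress the centred concentrations c
    on the first n_components PCA scores of the centred matrix Z and return
    the fitted (autopredicted) concentrations."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]           # PCA scores
    c_mean = c.mean()
    r, *_ = np.linalg.lstsq(T, c - c_mean, rcond=None)   # regression coefficients
    return T @ r + c_mean

def rms_error(c, c_hat, divisor):
    """Square root of the residual sum of squares divided by `divisor`."""
    return np.sqrt(np.sum((c - c_hat) ** 2) / divisor)

# Hypothetical data: 10 samples x 10 wavelengths, concentrations in mM
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 10))
c = 0.3 * X[:, 2] + 0.1 * X[:, 5] + rng.normal(scale=0.01, size=10)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
I = len(c)
for a in (1, 2, 3):
    c_hat = pcr_fit(Z, c, a)
    print(a, rms_error(c, c_hat, I - a - 1))  # divisor I - a - 1 is an assumption
```

Because each extra component adds a column to the regression, the residual sum of squares can never increase as components are added; whether the reported RMS error falls as well depends on the divisor chosen.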

4. The following graph is obtained, suggesting that three components are optimal.

5. Because this matrix is rather large, it is not reproduced in the answers.

6. The problem is that some variables will now consist entirely of noise and so degrade the data analysis.

7. The standard deviations of the first 100 variables range from 0.03021 to 0.005499 AU, using the population sd. Note that the aim is simply to sort the variables in order, so using the variance or the sample sd instead is perfectly legitimate in this case.
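Why the choice of spread measure does not matter can be checked directly. For *I* samples the sample sd equals the population sd multiplied by the constant √(*I*/(*I* − 1)), and the variance is the square of a non-negative quantity, so all three give the same ordering. The sketch below uses a hypothetical 10 × 200 matrix of random values and an illustrative helper `top_variables_by_sd`; it is not the data from the exercise.

```python
import numpy as np

def top_variables_by_sd(X, n_keep):
    """Indices of the n_keep columns of X with the largest population sd,
    listed in decreasing order of sd."""
    sd = np.asarray(X, dtype=float).std(axis=0, ddof=0)
    return np.argsort(sd)[::-1][:n_keep]

# Hypothetical 10 x 200 matrix whose columns have different spreads (AU)
rng = np.random.default_rng(0)
X = rng.normal(scale=rng.uniform(0.001, 0.03, size=200), size=(10, 200))
keep = top_variables_by_sd(X, 100)

# Rank by population sd and by sample variance (ddof=1) and compare the orders
order_sd = np.argsort(X.std(axis=0, ddof=0))[::-1]
order_var = np.argsort(X.var(axis=0, ddof=1))[::-1]
print(np.array_equal(order_sd, order_var))
```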

8. The autopredictive errors are 0.01778, 0.01829 and 0.00204 mM for one, two and three PLS components, respectively. Notice that the error actually increases when two components are calculated. This is because the residual sum of squares, which hardly changes when the second PLS component is computed, is divided by 7 (= 10 - 2 - 1) rather than 8 (= 10 - 1 - 1) before the square root is taken.
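The argument can be checked with a little arithmetic on the reported values. If the error for *a* components is *e* = √(SS / (*I* − *a* − 1)) with *I* = 10, then the residual sum of squares is SS = (*I* − *a* − 1) *e*². The sketch below inverts the reported errors; it uses only the numbers given in this answer.

```python
# Autopredictive RMS errors (mM) reported in answer 8 for a = 1, 2, 3 PLS components
reported = {1: 0.01778, 2: 0.01829, 3: 0.00204}
I = 10  # number of samples, from the divisor 7 = 10 - 2 - 1

residual_ss = {}
for a, e in reported.items():
    divisor = I - a - 1
    residual_ss[a] = divisor * e ** 2  # invert e = sqrt(SS / (I - a - 1))
    print(f"a={a}: divisor={divisor}, residual SS={residual_ss[a]:.3e} mM^2")
```

This prints residual sums of squares of about 2.529e-03, 2.342e-03 and 2.497e-05 mM². Going from one to two components the sum of squares falls by under 10%, while the divisor falls from 8 to 7, so the RMS error rises slightly; only the third component reduces the residuals substantially.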

9. There are many possible approaches, for example further variable selection or weighted regression.