# Solution 4.1 - Chemometrics: Data Analysis for the Laboratory and Chemical Plant

## Education Article

**Published:** Jan 1, 2000; **Channels:** Chemometrics & Informatics

1. The standardised data are given below

Notice that the population rather than the sample standard deviation is normally used when standardising data matrices, because the procedure relates to data preprocessing rather than statistical sampling. Standardisation is important here because the raw variables are measured on completely different scales.
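The effect of the choice of standard deviation can be checked numerically. The sketch below uses a hypothetical 27 × 5 matrix of random numbers in place of the real table, and `standardise` is an illustrative helper. It shows that with the population standard deviation (NumPy's default `ddof=0`), each column's sum of squares equals the number of objects. With the sample standard deviation (`ddof=1`) the sum is one less.

```python
import numpy as np

def standardise(X, ddof=0):
    """Column-standardise X; ddof=0 uses the population standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=ddof)

# Hypothetical 27 x 5 matrix whose columns are on very different scales
rng = np.random.default_rng(0)
X = rng.normal(size=(27, 5)) * np.array([1.0, 10.0, 100.0, 0.5, 3.0])

Z_pop = standardise(X)            # population SD: divide by N
Z_samp = standardise(X, ddof=1)   # sample SD: divide by N - 1, for comparison

# Column sums of squares: N = 27 for Z_pop, N - 1 = 26 for Z_samp
ss_pop = (Z_pop ** 2).sum(axis=0)
ss_samp = (Z_samp ** 2).sum(axis=0)
print(ss_pop, ss_samp)
```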

2. The scores, loadings and eigenvalues are as follows.

Note in this and other examples that the signs of a PC (both its scores and its loadings) are sometimes reversed, depending on the method of calculation. This cannot be avoided and is a consequence of the sign of a square root being indeterminate.
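One way to obtain scores, loadings and eigenvalues is through the singular value decomposition. This is a minimal sketch on hypothetical random data, not necessarily the algorithm used to produce the tables. Here each eigenvalue is taken as the sum of squares of that PC's scores, and loadings are unit-length rows. The last lines show why the sign ambiguity is harmless: reversing a PC in both scores and loadings leaves the model unchanged.

```python
import numpy as np

def pca(Z, n_components):
    """PCA of a preprocessed matrix Z via the SVD.

    Returns scores T (N x A), loadings P (A x J) with unit-length rows,
    and eigenvalues g (A,), each equal to the sum of squares of its scores.
    """
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores
    P = Vt[:n_components]                        # loadings
    g = s[:n_components] ** 2                    # eigenvalues
    return T, P, g

rng = np.random.default_rng(1)
X = rng.normal(size=(27, 5))                   # hypothetical stand-in data
Z = (X - X.mean(axis=0)) / X.std(axis=0)       # population-standardised

T, P, g = pca(Z, 2)

# Reverse the sign of PC1 in both scores and loadings
flip = np.array([-1.0, 1.0])
T2, P2 = T * flip, P * flip[:, None]
print(np.allclose(T @ P, T2 @ P2))  # True: same reconstructed data
```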

The sum of the first two eigenvalues is 117.083. Provided the data have been standardised as above (using the population standard deviation), the overall sum of squares of the data prior to PCA equals the total number of measurements, 27 × 5 = 135. Therefore the first two PCs represent 100 × 117.083 / 135 = 86.73 % of the variability. Notice that the sum of squares of each column equals the number of objects (27), a consequence of standardisation.
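The percentage calculation above can be written as a small helper. `percent_variance` is a hypothetical name, and it assumes population-standardised data so that the total sum of squares is N × J. The second half checks on random stand-in data that all the eigenvalues together account for exactly 27 × 5 = 135.

```python
import numpy as np

def percent_variance(eigenvalues, n_objects, n_variables):
    """Percentage of the total sum of squares (N x J) explained by each eigenvalue."""
    return 100 * np.asarray(eigenvalues, dtype=float) / (n_objects * n_variables)

# Figures quoted in the text: 27 objects, 5 variables, first two eigenvalues sum to 117.083
print(round(float(percent_variance(117.083, 27, 5)), 2))  # 86.73

# Hypothetical data: all eigenvalues together sum to N x J
rng = np.random.default_rng(0)
X = rng.normal(size=(27, 5))
Z = (X - X.mean(axis=0)) / X.std(axis=0)              # population SD
eigenvalues = np.linalg.svd(Z, compute_uv=False) ** 2
total = eigenvalues.sum()                              # 135, up to rounding
```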

3. The following is the scores plot for PC2 versus PC1.

Groups 3 and 4 are especially well separated and group 1 is quite distinct. Group 3 has very high scores along PC2, and group 4 low scores. The other groups show some separation, primarily along the first PC, but the groupings would not be obvious on first inspection.

4. The loadings are as follows.

Melting point and boiling point cluster very closely together. Electronegativity has a high loading on PC2 and appears to behave very differently from the other properties.

5. The correlation matrix is presented below.

The higher the correlation, the closer the variables are in the loadings plot. Notice that melting point and boiling point are very highly correlated, whereas electronegativity has low correlations with all the other physical properties. Notice also that density has the highest and oxidation number the lowest correlation with melting point, which is reflected in their distances from melting point in the loadings plot. The close correspondence between correlation coefficients and distances is a consequence of there being only two significant components.
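The link between correlations and the PCA can be made explicit. For population-standardised data, the correlation matrix is simply ZᵀZ / N. It can also be rebuilt from the eigenvalues and loadings as r_jk = Σ_a g_a p_aj p_ak / N. Using all PCs gives the exact matrix, and using only the first two gives an approximation that is good when two components dominate. The data are random stand-ins, with two columns deliberately made to correlate strongly.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(27, 5))                    # hypothetical 27 x 5 data
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=27)   # make columns 0 and 1 strongly correlated

N = X.shape[0]
Z = (X - X.mean(axis=0)) / X.std(axis=0)        # population-standardised
R = Z.T @ Z / N                                 # correlation matrix

# Cross-check against NumPy's own correlation routine
print(np.allclose(R, np.corrcoef(X, rowvar=False)))  # True

# Rebuild R from PCA eigenvalues (s**2) and loadings (rows of Vt)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
R_all = (Vt.T * s**2) @ Vt / N                  # all 5 PCs: exact
R_two = (Vt[:2].T * s[:2]**2) @ Vt[:2] / N      # first 2 PCs: approximation
```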

6. The scores plot is as follows.

Because electronegativity mainly influences the position of the halides (group 3), it is now not possible to distinguish these from the noble gases (group 4). However, the other groups are now much more clearly distinguished.
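The wording suggests that this scores plot comes from a PCA with electronegativity left out. Assuming that, the sketch below drops one column and repeats the standardisation and PCA on random stand-in data. The column index `EN_COL` is hypothetical. Note that with four variables the total sum of squares becomes 27 × 4 = 108, so percentages of variability must use that denominator.

```python
import numpy as np

def pca(X, n_components=2):
    """Population-standardise X, then return PCA scores and all eigenvalues."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    return scores, s ** 2

rng = np.random.default_rng(3)
X = rng.normal(size=(27, 5))   # hypothetical stand-in for the 27 x 5 data
EN_COL = 4                     # hypothetical index of the electronegativity column

X_reduced = np.delete(X, EN_COL, axis=1)   # keep the remaining 4 variables
T_reduced, g_reduced = pca(X_reduced)

# Percent variability of the first two PCs, relative to 27 x 4 = 108
pct = 100 * g_reduced[:2] / (27 * 4)
```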