# Solution 4.11 - Chemometrics: Data Analysis for the Laboratory and Chemical Plant

## Education Article

**Published:** Jan 1, 2000 **Channels:** Chemometrics & Informatics

1. The scores and loadings are presented below.

2. The first five eigenvalues sum to 60547.17, whereas the sum of squares of the raw data equals 60547.94.
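The near-equality in step 2 can be checked numerically. The following is a minimal sketch, assuming PCA is performed on the raw (uncentred) data via a singular value decomposition; the matrix `X` is synthetic stand-in data, not the dataset from the problem, and all variable names are illustrative.

```python
import numpy as np

# Hypothetical data: 20 samples by 8 variables (NOT the dataset from the problem)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))

# Uncentred PCA via SVD: X = U S V^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U * s        # one column of scores per PC
loadings = Vt         # one row of loadings per PC
eigenvalues = s ** 2  # each eigenvalue is the sum of squares of one score vector

total_ss = np.sum(X ** 2)           # sum of squares of the raw data
first_five = eigenvalues[:5].sum()  # sum of squares captured by the first five PCs

print(f"all eigenvalues: {eigenvalues.sum():.4f}")
print(f"first five:      {first_five:.4f}")
print(f"raw data:        {total_ss:.4f}")
```

Summed over all components, the eigenvalues reproduce the total sum of squares exactly; a truncated sum falls short by the amount left in the discarded components, which is why the two figures quoted in step 2 differ only slightly.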

The scores plot of the overall dataset is shown below, with each class represented by a different symbol.

3. PCA is performed separately on each class; the first five PCs are listed below.

The overall sum of squares is 29135.4 for class A and 31412.6 for class B. The first two eigenvalues for class A account for 94.1% (= 100 × (17833.7 + 9573.7) / 29135.4) of the sum of squares, whereas for class B they account for 90.1% (= 100 × (17441.4 + 10858.8) / 31412.6). Adding a third component to the class B model increases this to 97.1%. Of course, there are other approaches to determining the number of significant components, which may provide slightly different answers.
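The percentages above can be reproduced directly from the quoted eigenvalues and sums of squares. This is only an arithmetic check; the variable names are illustrative.

```python
# Eigenvalues and sums of squares as quoted in the text
eig_a = [17833.7, 9573.7]    # first two eigenvalues, class A
ss_a = 29135.4               # overall sum of squares, class A
eig_b = [17441.4, 10858.8]   # first two eigenvalues, class B
ss_b = 31412.6               # overall sum of squares, class B

pct_a = 100 * sum(eig_a) / ss_a
pct_b = 100 * sum(eig_b) / ss_b

print(f"class A, two PCs: {pct_a:.1f}%")
print(f"class B, two PCs: {pct_b:.1f}%")
```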

4. The reader should understand why this operation works. The sum of squares of the scores as fitted to each class model should equal the observed sum of squares if the class is modelled well.
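The reasoning in step 4 can be illustrated with a short sketch. It assumes uncentred PCA, that scores for a sample are predicted by multiplying the sample by the transposed loadings of a class model, and that the quantity of interest is the ratio of the sum of squares of those predicted scores to the sample's observed sum of squares. The two synthetic classes and the helper names `fit_loadings` and `modelled_fraction` are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_loadings(X, n_components):
    """Uncentred PCA via SVD; returns orthonormal loadings, one row per PC."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_components]

def modelled_fraction(X, loadings):
    """Sum of squares of the predicted scores divided by the observed
    sum of squares, computed for each sample (row) of X."""
    scores = X @ loadings.T
    return np.sum(scores ** 2, axis=1) / np.sum(X ** 2, axis=1)

# Hypothetical classes: each lies close to its own 2-D subspace of 6 variables
rng = np.random.default_rng(1)
basis_a = rng.normal(size=(2, 6))
basis_b = rng.normal(size=(2, 6))
class_a = rng.normal(size=(10, 2)) @ basis_a + 0.01 * rng.normal(size=(10, 6))
class_b = rng.normal(size=(10, 2)) @ basis_b + 0.01 * rng.normal(size=(10, 6))

load_a = fit_loadings(class_a, 2)  # two-component model of class A
load_b = fit_loadings(class_b, 2)  # two-component model of class B

own = modelled_fraction(class_a, load_a)    # class A samples, class A model
other = modelled_fraction(class_a, load_b)  # class A samples, class B model

print("mean ratio, own model:  ", own.mean())
print("mean ratio, other model:", other.mean())
```

Because the loadings are orthonormal, the predicted scores can never carry more sum of squares than the sample itself, so each ratio lies between 0 and 1. A value near 1 indicates that the sample lies almost entirely within the space spanned by that class model.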

The scores and sums of squares using the two models are given below.

5. The ratios are as follows.

6. On the whole, class B seems much better modelled, as the ratios of its samples to the class A model are low (apart from two samples). This is less true of class A: several samples in class A are also well described by the class B model. This could relate to the class structure. The fifth sample from class B could well be misassigned or an outlier, as it has a low ratio for the class B model. The first sample in class A is also possibly suspect.