# Solution 2.12 - Chemometrics: Data Analysis for the Laboratory and Chemical Plant

## Education Article

**Published:** Jan 1, 2000 **Channels:** Chemometrics & Informatics

In the answer to this problem all matrices have been transposed for convenience.

1. The standardised data matrix is given below.

It is important to standardise the data because the chromatographic tests are on completely different scales. If the data are not standardised, the N and N(df) tests will dominate the analysis.
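As an illustration of the standardisation step, the following is a minimal sketch rather than the calculation used for the solution. The numbers are hypothetical, and dividing by N (the population standard deviation) rather than N-1 is an assumed convention.

```python
from statistics import mean, pstdev

def standardise(values):
    """Return values scaled to zero mean and unit (population) standard deviation."""
    m = mean(values)
    s = pstdev(values)  # dividing by N rather than N-1 is an assumed convention
    return [(v - m) / s for v in values]

# Hypothetical readings for two tests on very different scales
raw = {
    "N": [10000.0, 12000.0, 8000.0, 14000.0],  # plate-number-like magnitudes
    "As": [1.1, 1.3, 0.9, 1.5],                # asymmetry-like magnitudes
}
standardised = {name: standardise(vals) for name, vals in raw.items()}

for name, vals in standardised.items():
    print(name, [round(v, 3) for v in vals])
```

After standardisation both variables have zero mean and unit standard deviation, so neither can dominate a subsequent PCA purely because of its units.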

2. The data are already standardised, and so mean centred. Hence only mean centred PCA has relevance. Other common methods for data preprocessing such as scaling the rows to a constant total cannot be applied to data that has already been standardised. The loadings for the first three principal components are tabulated below.

Note that the sign of the loadings may differ according to implementation of PCA and has no physical significance.
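The sign indeterminacy can be seen in a small sketch. The source does not say which PCA algorithm was used; a NIPALS-style iteration for the first component is assumed here, and the matrix `X` is hypothetical, mean-centred data. Starting the iteration from a sign-flipped vector flips both scores and loadings, but their product is unchanged.

```python
import math

def first_pc(X, sign=1.0, iters=500, tol=1e-12):
    """NIPALS-style iteration for the first PC of a mean-centred matrix X (list of rows)."""
    n_rows, n_cols = len(X), len(X[0])
    # Start from the first column, optionally sign-flipped
    t = [sign * row[0] for row in X]
    for _ in range(iters):
        tt = sum(v * v for v in t)
        # Loadings: project the columns of X onto the current scores, then normalise
        p = [sum(X[i][j] * t[i] for i in range(n_rows)) / tt for j in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: project the rows of X onto the loadings
        t_new = [sum(X[i][j] * p[j] for j in range(n_cols)) for i in range(n_rows)]
        converged = sum((a - b) ** 2 for a, b in zip(t, t_new)) < tol
        t = t_new
        if converged:
            break
    return t, p

# Hypothetical mean-centred data (each column sums to zero)
X = [
    [1.0, 0.5, -0.2],
    [-1.0, -0.4, 0.1],
    [0.5, 0.6, 0.3],
    [-0.5, -0.7, -0.2],
]

t_a, p_a = first_pc(X, sign=1.0)
t_b, p_b = first_pc(X, sign=-1.0)

# The rank-one model t p' is the same in both cases
model_a = [[ti * pj for pj in p_a] for ti in t_a]
model_b = [[ti * pj for pj in p_b] for ti in t_b]
print(p_a)
print(p_b)
```

The two loadings vectors differ only in sign, which is why tabulated loadings from different software may appear mirrored.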

3. The maximum and minimum values of the loadings are as follows:

| | PC1 | PC2 | PC3 |
|---|---|---|---|
| max | 0.215 | 0.268 | 0.413 |
| min | -0.252 | -0.309 | -0.319 |

making the scaled loadings matrix as follows
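The rescaling can be sketched in a few lines. This assumes that each PC's loadings are mapped linearly so that the minimum becomes -1 and the maximum becomes +1; the exact scaling is defined in the question and is an assumption here. Only the two extreme PC1 values come from the solution; the intermediate values are hypothetical.

```python
def scale_to_pm1(loadings):
    """Linearly rescale a list of loadings so that min -> -1 and max -> +1 (assumed scaling)."""
    lo, hi = min(loadings), max(loadings)
    return [-1.0 + 2.0 * (v - lo) / (hi - lo) for v in loadings]

# PC1 extremes (0.215 and -0.252) are taken from the solution;
# the two middle values are hypothetical placeholders.
pc1 = [0.215, 0.050, -0.100, -0.252]
scaled = scale_to_pm1(pc1)
print([round(v, 3) for v in scaled])
```

Applying the same function to each of the three PCs in turn would place every test inside a cube with corners at ±1, which is the space in which the design points are defined under this assumption.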


4. The Euclidean distance matrix is given below. The columns correspond to the 9 design points of the question.
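A distance matrix of this kind might be computed as in the sketch below. The test names, scaled loadings and design points are all hypothetical; the actual design is specified in the question. `math.dist` requires Python 3.8 or later.

```python
import math

def distance_matrix(points, design):
    """Rows: tests (scaled loadings on three PCs). Columns: design points."""
    return [[math.dist(p, d) for d in design] for p in points]

# Hypothetical scaled loadings for two tests
tests = {
    "T1": (0.9, 0.8, 1.0),
    "T2": (-0.1, 0.1, 0.0),
}
# Two hypothetical design points: one corner of the cube and its centre
design = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]

D = distance_matrix(list(tests.values()), design)
for name, row in zip(tests, D):
    print(name, [round(v, 3) for v in row])
```

Each entry is the straight-line distance in the three-dimensional space of scaled loadings, so a small value means the test lies near that design point.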

5. The tests closest to the nine design points are as follows.

| Design point | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Closest test | AN(df) | PN(df) | Pk | QAs | AN | CAs | PAs | AAs | BAs |
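Selecting the closest test for each design point amounts to finding the smallest entry in each column of the distance matrix. The sketch below uses hypothetical test names and distances, not the values from the solution.

```python
def closest_tests(names, dist):
    """dist[i][j] is the distance from test i to design point j.

    Returns, for each design point, the name of the nearest test.
    """
    closest = []
    for j in range(len(dist[0])):
        i_best = min(range(len(names)), key=lambda i: dist[i][j])
        closest.append(names[i_best])
    return closest

# Hypothetical tests and distances to two design points
names = ["T1", "T2", "T3"]
dist = [
    [0.22, 1.57],
    [1.65, 0.14],
    [0.90, 0.35],
]
picked = closest_tests(names, dist)
print(picked)
```

Comparing the smallest and second-smallest entries in a column also indicates whether a selected test has a close substitute or is uniquely placed.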

This would allow the original 32 tests to be reduced to 9. It is possible to check that these are representative by performing PCA on the reduced dataset of 9 tests and seeing how closely the scores plot resembles that obtained using all 32 tests. Further improvements could be made by observing that compounds P and A are each represented three times, whereas Q, C and B appear only once, meaning that the original 8 compounds have been reduced to 5. The CN parameter is also fairly close to the centre, so it could be used instead of BAs, reducing the number of test compounds to 4. QAs is uniquely close to design point 4, so it probably cannot be replaced, as it provides unique information about the separation.
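The compound tally can be checked with a short sketch. It assumes that the first letter of each test name identifies the compound and the remainder identifies the parameter, a naming scheme inferred from the test labels rather than stated in this section.

```python
from collections import Counter

# The nine tests selected in step 5
selected = ["AN(df)", "PN(df)", "Pk", "QAs", "AN", "CAs", "PAs", "AAs", "BAs"]

def compound_counts(tests):
    """Count tests per compound, assuming the first letter names the compound."""
    return Counter(name[0] for name in tests)

counts = compound_counts(selected)
print(dict(counts))          # {'A': 3, 'P': 3, 'Q': 1, 'C': 1, 'B': 1}

# Replacing BAs by CN, as suggested for the centre point, removes compound B
swapped = ["CN" if t == "BAs" else t for t in selected]
counts_swapped = compound_counts(swapped)
print(len(counts), len(counts_swapped))  # 5 4
```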