# Solution 2.7 - Chemometrics: Data Analysis for the Laboratory and Chemical Plant

## Education Article

• Published: Jan 1, 2000
• Channels: Chemometrics & Informatics

1. The design matrix is given below

2. 10 degrees of freedom are required for the model. 5 degrees of freedom are available for replication, so there are 5 degrees of freedom (=20-10-5) for testing the lack-of-fit.
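The degrees-of-freedom bookkeeping can be written out as a short check. This is a minimal sketch; the variable names are illustrative, and the count of six replicates is taken from the six replicate readings listed in question 5.

```python
# Degrees-of-freedom bookkeeping for the lack-of-fit test.
n_experiments = 20   # N, total number of experiments
n_parameters = 10    # P, terms in the full quadratic model
n_replicates = 6     # replicate readings listed in question 5

df_residual = n_experiments - n_parameters      # left over after fitting the model
df_replicate = n_replicates - 1                 # one lost to the replicate mean
df_lack_of_fit = df_residual - df_replicate     # remainder tests lack-of-fit

print(df_residual, df_replicate, df_lack_of_fit)
```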

3. The coefficients for the model are as follows.

| b0 | b1 | b2 | b3 | b11 | b22 | b33 | b12 | b13 | b23 |
|---|---|---|---|---|---|---|---|---|---|
| 5013.51 | 143.39 | -28.11 | 42.40 | -71.73 | -129.51 | -63.08 | -71.75 | 14.25 | -22.00 |

4. The calculation is presented below. Note that there are two possible answers for the root mean square error. The statistically correct approach is to divide the residual sum of squares by 10 (=N-P), but in other circumstances it is divided by 20 (=N). For any formal statistical test the former should be used, although the latter measure is sometimes employed. In other areas of chemometrics it is not always so easy to determine how many degrees of freedom have been lost due to modelling and data preprocessing.

It is better to use the standard deviation because the range is small relative to the mean. Note that the sample rather than population standard deviation is employed in this calculation, and that the number of degrees of freedom for the error is 10.
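The two versions of the root mean square error differ only in the divisor. As a rough sketch, using the residual sum of squares of 109741.00 quoted in question 5 (the variable names are illustrative, and the percentage scaling is left out because the response standard deviation is not reproduced here):

```python
import math

rss = 109741.00   # residual sum of squares for the full model
N, P = 20, 10     # experiments and model parameters

rmse_dof = math.sqrt(rss / (N - P))   # divide by N-P: use for formal tests
rmse_n = math.sqrt(rss / N)           # divide by N: descriptive measure only

print(f"RMSE (N-P): {rmse_dof:.2f}")
print(f"RMSE (N):   {rmse_n:.2f}")
```

Dividing by N always gives the smaller figure, which is why the N-P version is the one to use in a formal test.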

5. The replicate error sum of squares is calculated as follows.

| True reading | Average of replicates | Average - true |
|---|---|---|
| 5063.00 | 5013.83 | -49.17 |
| 4968.00 | 5013.83 | 45.83 |
| 5035.00 | 5013.83 | -21.17 |
| 5122.00 | 5013.83 | -108.17 |
| 4970.00 | 5013.83 | 43.83 |
| 4925.00 | 5013.83 | 88.83 |
| Sum of squares | | 26478.83 |

The sum of squares accounted for by the lack-of-fit is given by 109741.00-26478.83 = 83262.16.
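The replicate and lack-of-fit sums of squares can be reproduced from the six replicate readings. This is a minimal sketch with illustrative variable names; the last digit of the lack-of-fit figure can differ from the one quoted above because of rounding.

```python
# Replicate readings from the table in question 5.
replicates = [5063.0, 4968.0, 5035.0, 5122.0, 4970.0, 4925.0]

mean_rep = sum(replicates) / len(replicates)
# Sum of squared deviations of each reading from the replicate mean.
ss_replicate = sum((mean_rep - y) ** 2 for y in replicates)

ss_residual = 109741.00                       # from the full model
ss_lack_of_fit = ss_residual - ss_replicate   # what replication does not explain

print(f"mean = {mean_rep:.2f}")
print(f"SS replicate = {ss_replicate:.2f}")
print(f"SS lack-of-fit = {ss_lack_of_fit:.2f}")
```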

An ANOVA table is given below. Note that it is possible also to include the total error sum of squares for the entire dataset as well as the error sum of squares for the residuals.

| Source of variation | Sum of squares | Degrees of freedom | Mean sum of squares | Variance ratio |
|---|---|---|---|---|
| Residual | 109741.00 | 10 | 10974.1 | |
| Replicate | 26478.83 | 5 | 5295.767 | |
| Lack-of-fit | 83262.16 | 5 | 16652.43 | 3.14448 |

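Although the lack-of-fit mean square is higher than the replicate mean square, the difference is not particularly significant: the variance ratio of 3.14 is below the tabulated 5% critical value of about 5.05 for an F-distribution with 5 and 5 degrees of freedom. Despite the relatively high percent error calculated in question 4, the spread of the replicates is quite large, so there is no real evidence that the lack-of-fit is particularly large compared to the experimental error.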
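The mean squares and the variance ratio follow directly from the sums of squares and degrees of freedom in the ANOVA table. In the sketch below, the critical value of 5.05 is the standard tabulated upper 5% point of F with (5, 5) degrees of freedom. It is hard-coded here as an assumption rather than computed, and the 5% level is itself an illustrative choice.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss = {"residual": 109741.00, "replicate": 26478.83, "lack_of_fit": 83262.16}
df = {"residual": 10, "replicate": 5, "lack_of_fit": 5}

# Mean square = sum of squares / degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

# Variance ratio: lack-of-fit mean square over replicate mean square.
f_ratio = ms["lack_of_fit"] / ms["replicate"]

F_CRIT_5PCT = 5.05  # tabulated F(0.05; 5, 5); assumed significance level
significant = f_ratio > F_CRIT_5PCT

print(f"F = {f_ratio:.5f}, significant at 5%: {significant}")
```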

6. The matrix $(D'D)^{-1}$ is given below.

Multiplying the diagonal elements by 10974.1 (the residual mean sum of squares from the ANOVA table) gives the following variances.

| b0 | b1 | b2 | b3 | b11 | b22 | b33 | b12 | b13 | b23 |
|---|---|---|---|---|---|---|---|---|---|
| 1827.57 | 796.38 | 796.38 | 796.38 | 737.30 | 737.30 | 737.30 | 1371.76 | 1371.76 | 1371.76 |

7. The t-statistics are obtained by dividing each coefficient in question 3 by the square root of its variance, giving the following.

| b0 | b1 | b2 | b3 | b11 | b22 | b33 | b12 | b13 | b23 |
|---|---|---|---|---|---|---|---|---|---|
| 117.27 | 5.08 | -1.00 | 1.50 | -2.64 | -4.77 | -2.32 | -1.94 | 0.38 | -0.59 |

The most significant terms are b0, b1, b11, b22, b33 and b12, although others could be included. Often a cut-off value of the t-statistic is used. Note that a number of other criteria and methods could be employed to reduce the number of terms retained in the model.
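The t-statistics and a cut-off selection can be sketched as follows, using the coefficients from question 3 and the variances from question 6. The cut-off of 1.9 is a hypothetical value chosen only because it reproduces the six terms listed above; the solution does not state which cut-off, if any, was applied.

```python
import math

names = ["b0", "b1", "b2", "b3", "b11", "b22", "b33", "b12", "b13", "b23"]
coefs = [5013.51, 143.39, -28.11, 42.40, -71.73,
         -129.51, -63.08, -71.75, 14.25, -22.00]
variances = [1827.57, 796.38, 796.38, 796.38, 737.30,
             737.30, 737.30, 1371.76, 1371.76, 1371.76]

# t = coefficient / standard error, where standard error = sqrt(variance).
t_stats = {n: b / math.sqrt(v) for n, b, v in zip(names, coefs, variances)}

T_CUTOFF = 1.9  # hypothetical cut-off, for illustration only
selected = [n for n in names if abs(t_stats[n]) > T_CUTOFF]

for n in names:
    print(f"{n:>3}: t = {t_stats[n]:7.2f}")
print("retained:", selected)
```

A different cut-off would change the retained set; for example, b3 (t of about 1.50) would enter if the threshold were lowered below 1.5.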

8. A new model with the most significant terms is of the form

$$\hat{y} = b_0 + b_1 x_1 + b_{11} x_1^2 + b_{22} x_2^2 + b_{33} x_3^2 + b_{12} x_1 x_2$$

Recalculating the model using only the six terms above gives

$$\hat{y} = 5013.51 + 143.389\,x_1 - 71.73\,x_1^2 - 129.51\,x_2^2 - 63.08\,x_3^2 - 71.75\,x_1 x_2$$

The residual sum of squares increases to 150898.44, an increase of about 38% over the full model (109741.00). This is probably not very significant, since the residual sum of squares is still very small relative to the overall sum of squares; it corresponds to an increase in the root mean square error, using 10 degrees of freedom, from 50.94% to 59.73%.
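The size of the increase can be checked directly from the two residual sums of squares. This is a minimal sketch assuming, as the text does, that the same 10 degrees of freedom are used for both models, so that the percent root mean square error scales with the square root of the ratio of the sums of squares.

```python
import math

rss_full = 109741.00      # full ten-term model
rss_reduced = 150898.44   # six-term model

ratio = rss_reduced / rss_full
pct_increase = 100.0 * (ratio - 1.0)

# With the same divisor in both cases, the RMSE scales as sqrt(ratio).
rmse_pct_full = 50.94
rmse_pct_reduced = rmse_pct_full * math.sqrt(ratio)

print(f"RSS increase: {pct_increase:.1f}%")
print(f"RMSE: {rmse_pct_full:.2f}% -> {rmse_pct_reduced:.2f}%")
```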

9. Equations can be set up as follows.

From the third equation

From the second equation

Hence, substituting into the first equation
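The equations themselves are not reproduced in this copy. A plausible reconstruction, assuming the optimum is located by setting the partial derivatives of the six-term model in question 8 to zero (an assumption, though it does reproduce the coded values tabulated below), is:

$$
\begin{aligned}
\frac{\partial \hat{y}}{\partial x_1} &= 143.389 - 2(71.73)\,x_1 - 71.75\,x_2 = 0 \\
\frac{\partial \hat{y}}{\partial x_2} &= -2(129.51)\,x_2 - 71.75\,x_1 = 0 \\
\frac{\partial \hat{y}}{\partial x_3} &= -2(63.08)\,x_3 = 0
\end{aligned}
$$

The third equation gives $x_3 = 0$. The second gives $x_2 = -\dfrac{71.75}{259.02}\,x_1 \approx -0.277\,x_1$. Substituting into the first,

$$
143.389 - 143.46\,x_1 + 71.75\,(0.277)\,x_1 = 0
\quad\Rightarrow\quad
x_1 \approx \frac{143.389}{123.585} \approx 1.16,
\qquad
x_2 \approx -0.277 \times 1.16 \approx -0.32 .
$$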

The coded and real values of the three factors at the optimum are as follows.

| Factor | Coded value | Real value |
|---|---|---|
| Enzyme (mg protein) | 1.16 | 14.6 |
| Arginine (pmoles) | -0.32 | 1135.7 |
| pH | 0 | 7.5 |
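The coded values in the table can be checked numerically. The sketch below assumes the optimum is the stationary point of the six-term model from question 8, found by setting its gradient to zero and solving the resulting 2x2 linear system in x1 and x2 by Cramer's rule. It does not attempt the conversion to real values, since the coding of the factors is not restated here.

```python
# Coefficients of the reduced model from question 8.
b1, b11, b22, b33, b12 = 143.389, -71.73, -129.51, -63.08, -71.75

# Gradient = 0 gives:
#   2*b11*x1 +   b12*x2 = -b1
#     b12*x1 + 2*b22*x2 = 0
#   2*b33*x3            = 0
a, b = 2 * b11, b12
c, d = b12, 2 * b22
rhs1, rhs2 = -b1, 0.0

det = a * d - b * c
x1 = (rhs1 * d - b * rhs2) / det
x2 = (a * rhs2 - c * rhs1) / det
x3 = 0.0  # b33 is non-zero, so the third equation forces x3 = 0

# For a maximum in (x1, x2) the 2x2 Hessian must be negative definite:
# leading element negative and determinant positive. b33 < 0 covers x3.
is_maximum = a < 0 and det > 0 and b33 < 0

print(f"x1 = {x1:.2f}, x2 = {x2:.2f}, x3 = {x3:.2f}, maximum: {is_maximum}")
```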
