Solution 2.2 - Chemometrics: Data Analysis for the Laboratory and Chemical Plant
Education Article
- Published: Jan 1, 2000
- Channels: Chemometrics & Informatics
1. These values are given by
| x_{0} | x_{1} | x_{2} | x_{3} | x_{4} | x_{5} | x_{1}x_{2} | x_{1}x_{3} | x_{1}x_{4} | x_{1}x_{5} | x_{2}x_{3} | x_{2}x_{4} | x_{2}x_{5} | x_{3}x_{4} | x_{3}x_{5} | x_{4}x_{5} |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | -1 |
| 1 | 1 | -1 | -1 | 1 | -1 | -1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 |
| 1 | -1 | 1 | -1 | 1 | -1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | -1 | 1 | -1 |
| 1 | 1 | 1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | -1 | -1 | 1 | 1 | -1 | -1 |
| 1 | -1 | -1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 |
| 1 | 1 | -1 | 1 | -1 | -1 | -1 | 1 | -1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 |
| 1 | -1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
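The interaction columns in this answer are simply element-by-element products of the main-effect columns. As a minimal sketch (the names `design`, `expand` and `table` are illustrative and not from the printed text), the sixteen columns can be regenerated from the five coded factor levels of each experiment:

```python
from itertools import combinations

# Coded levels of x1..x5 for the eight experiments, copied from the answer to question 1.
design = [
    (-1, -1, -1, -1,  1),
    ( 1, -1, -1,  1, -1),
    (-1,  1, -1,  1, -1),
    ( 1,  1, -1, -1,  1),
    (-1, -1,  1,  1,  1),
    ( 1, -1,  1, -1, -1),
    (-1,  1,  1, -1, -1),
    ( 1,  1,  1,  1,  1),
]

def expand(row):
    """Return x0, the five main effects, then the ten two-factor products."""
    # combinations() yields the pairs in the order x1x2, x1x3, ..., x4x5.
    pairs = [row[i] * row[j] for i, j in combinations(range(5), 2)]
    return [1, *row, *pairs]

table = [expand(row) for row in design]
for line in table:
    print(" ".join(f"{value:2d}" for value in line))
```

Each printed line corresponds to one experiment, with the columns in the same order as the headings above.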
2. The eight columns x_{0}, x_{1}, x_{2}, x_{3}, x_{4}, x_{5}, x_{1}x_{3} and x_{1}x_{4} are all different.
Confounding is as follows
1 = 25, 2 = 15, 3 = 45, 4 = 35, 5 = 12 = 34, 13 = 24 and 14 = 23
using the notation of Table 2.24 in the printed text. This can easily be seen by checking the columns in the answer to question 1: for example, the column for x_{1} is identical to that for x_{2}x_{5}, hence the relationship 1 = 25, and so on.
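The column-by-column check described above can also be done mechanically. The following is a sketch only, assuming the coded levels from the answer to question 1; the names `columns`, `groups` and `aliases` are hypothetical. It groups together every main effect and two-factor interaction whose columns are identical:

```python
from itertools import combinations

# Coded levels of x1..x5 for the eight experiments (answer to question 1).
design = [
    (-1, -1, -1, -1,  1),
    ( 1, -1, -1,  1, -1),
    (-1,  1, -1,  1, -1),
    ( 1,  1, -1, -1,  1),
    (-1, -1,  1,  1,  1),
    ( 1, -1,  1, -1, -1),
    (-1,  1,  1, -1, -1),
    ( 1,  1,  1,  1,  1),
]

# Build each column keyed by its shorthand label: "1" for x1, "25" for x2*x5.
columns = {}
for i in range(5):
    columns[str(i + 1)] = tuple(row[i] for row in design)
for i, j in combinations(range(5), 2):
    columns[f"{i + 1}{j + 1}"] = tuple(row[i] * row[j] for row in design)

# Terms whose columns are identical cannot be separated by this design.
groups = {}
for label, column in columns.items():
    groups.setdefault(column, []).append(label)

aliases = [" = ".join(group) for group in groups.values()]
print(aliases)
```

Seven groups are found, which together with the intercept column x_{0} gives the eight distinct columns mentioned above.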
3. The design matrix, D, is given by
| x_{0} | x_{1} | x_{2} | x_{3} | x_{4} | x_{5} |
| --- | --- | --- | --- | --- | --- |
| 1 | -1 | -1 | -1 | -1 | 1 |
| 1 | 1 | -1 | -1 | 1 | -1 |
| 1 | -1 | 1 | -1 | 1 | -1 |
| 1 | 1 | 1 | -1 | -1 | 1 |
| 1 | -1 | -1 | 1 | 1 | 1 |
| 1 | 1 | -1 | 1 | -1 | -1 |
| 1 | -1 | 1 | 1 | -1 | -1 |
| 1 | 1 | 1 | 1 | 1 | 1 |
4. Hence
b = (D'D)^{-1} D'y
giving the equation
y = 90.5 + 18.75x_{1} + 27.75x_{2} + 5.0x_{3} - 26.0x_{4} + 31.0x_{5}
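The coefficients can be reproduced with a few lines of code. This is a sketch rather than a general least-squares routine: it assumes the design matrix from question 3 and the eight true responses listed in the answer to question 5, and it relies on the columns of D being mutually orthogonal (checked in the code), so that D'D is 8 times the identity and b reduces to D'y/8. The variable names are illustrative.

```python
# Design matrix D (columns x0..x5) from the answer to question 3.
D = [
    (1, -1, -1, -1, -1,  1),
    (1,  1, -1, -1,  1, -1),
    (1, -1,  1, -1,  1, -1),
    (1,  1,  1, -1, -1,  1),
    (1, -1, -1,  1,  1,  1),
    (1,  1, -1,  1, -1, -1),
    (1, -1,  1,  1, -1, -1),
    (1,  1,  1,  1,  1,  1),
]
# True responses, taken from the table in the answer to question 5.
y = [109, 26, 31, 176, 41, 75, 106, 160]

n_runs, n_terms = len(D), len(D[0])

# D'D: for an orthogonal two-level design this is n_runs times the identity.
DtD = [[sum(row[i] * row[j] for row in D) for j in range(n_terms)]
       for i in range(n_terms)]
orthogonal = all(DtD[i][j] == (n_runs if i == j else 0)
                 for i in range(n_terms) for j in range(n_terms))

# With D'D = 8I the inverse is I/8, so b = (D'D)^-1 D'y simplifies to D'y / 8.
Dty = [sum(row[i] * yi for row, yi in zip(D, y)) for i in range(n_terms)]
b = [value / n_runs for value in Dty]

print(orthogonal, b)
```

For a design that is not orthogonal this shortcut does not apply, and the full matrix inverse (or a least-squares solver) would be needed.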
The significance can be assessed simply by the size of the coefficients, since all the factors vary over identical ranges in the coded data. Note that the overall values of NO vary between 26 and 176 mg MJ^{-1}, a range of 150 mg MJ^{-1}. Hence, for example, the first factor (the load) on average accounts for 2 × 18.75/150 (= 2 × coefficient/range) of the variability between the highest and lowest levels, or 25% of the variability (the factor of 2 arises because the difference between the coded levels is 2 and not 1).
It would appear that NH_{3} has little practical effect and the air / fuel ratio some influence, while the other three factors are all of approximately similar and fairly high significance.
A t-test could also be used, but the errors are unlikely to be normally distributed, and the main aim is to give the experimenter guidance as to which factors are important to control, and whether their influence is positive or negative.
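The range-based comparison used in this answer (2 × coefficient/range) can be applied to every factor at once. The sketch below assumes the coefficients from the fitted equation and the observed NO range of 26 to 176 mg MJ^{-1}; the name `share` is hypothetical.

```python
# Coefficients of the coded factors from the fitted equation.
coefficients = {"x1": 18.75, "x2": 27.75, "x3": 5.0, "x4": -26.0, "x5": 31.0}

# Observed range of the NO response in mg MJ^-1.
response_range = 176 - 26

# The coded levels differ by 2 (from -1 to +1), hence the factor of 2.
share = {name: 2 * abs(b) / response_range for name, b in coefficients.items()}

# List the factors from largest to smallest influence.
for name, fraction in sorted(share.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {100 * fraction:.1f}% of the observed range")
```

The sign of each coefficient is dropped here because only the size of the effect is being ranked; the direction of the influence still has to be read from the equation itself.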
5. The calculation is presented below
| True response | Predicted response | Residual |
| --- | --- | --- |
| 109 | 96 | 13 |
| 26 | 19.5 | 6.5 |
| 31 | 37.5 | -6.5 |
| 176 | 189 | -13 |
| 41 | 54 | -13 |
| 75 | 81.5 | -6.5 |
| 106 | 99.5 | 6.5 |
| 160 | 147 | 13 |

| Quantity | Value |
| --- | --- |
| Sum of squares of residuals | 845 |
| Root mean square residual error | 20.55 |
| Average of raw data | 90.5 |
| Percentage root mean square error | 22.71 |

Note that there are only 2 degrees of freedom (8 experiments minus 6 fitted coefficients) for determining the root mean square residual error, hence the value of 20.55 rather than the 10.28 that would be obtained by dividing by 8 instead of 2. The percentage error is 22.71%. Although it is possible to interpret this in a more detailed statistical manner, the nature of the data probably precludes this. It is likely to be sufficient to inform the experimenter that the predictions are accurate to within roughly 20%. A more detailed model would probably require a different experimental strategy.
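The error calculation can be checked with a short script. This is a sketch under the assumptions stated in the answer: the true and predicted responses are those tabulated above, and the model has 6 fitted coefficients. The variable names are illustrative.

```python
from math import sqrt

# True and predicted responses from the table in this answer.
observed = [109, 26, 31, 176, 41, 75, 106, 160]
predicted = [96, 19.5, 37.5, 189, 54, 81.5, 99.5, 147]

residuals = [obs - pred for obs, pred in zip(observed, predicted)]
sum_of_squares = sum(r * r for r in residuals)

# 8 experiments minus 6 fitted coefficients leaves 2 degrees of freedom.
degrees_of_freedom = len(observed) - 6
rms_error = sqrt(sum_of_squares / degrees_of_freedom)

# Dividing by the number of experiments instead gives the smaller figure.
rms_error_over_n = sqrt(sum_of_squares / len(observed))

mean_response = sum(observed) / len(observed)
percent_error = 100 * rms_error / mean_response

print(sum_of_squares, rms_error, rms_error_over_n, percent_error)
```

Dividing by the degrees of freedom rather than the number of experiments doubles the error estimate here, because 8/2 = 4 and the square root of 4 is 2.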