A paper mill monitors the quality of newsprint by applying ink to one side of the paper. By measuring the reflectance of light on the reverse side of the paper, a reliable, practical measure of how visible the ink is on the opposite side is obtained. This property, PRINTHRU, is an important quality parameter. The paper is also analyzed with regard to several other production parameters and raw material characteristics. The paper mill wants to make a model which can be used for quality control and production management. For example, it may be possible to rationalize the quality control process by reducing the number of parameters measured. Quality should also be predicted for new paper compositions.
The raw data set consists of 118 paper samples collected from the production line over a considerable time interval, in the hope that the measurements would span the important variations in production. For each sample, 15 X-variables were measured (such as raw material composition, oil permeability, light reflectance, density, and surface parameters), along with one Y-variable, PRINTHRU.
Fig 2. Explained Y-variance as a function of the number of Principal Components
A model Y = f(X) was fitted by PLS regression. A single principal component describes 85% of the variation in PRINTHRU, and three principal components describe 89% (i.e. the problem is reduced from 15 variable dimensions to 3).
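As an illustration of how an explained Y-variance curve of this kind can be computed, here is a minimal sketch of single-response PLS (PLS1) using the NIPALS algorithm in plain Python. It is not the software used for the mill's model. The function name `pls1_explained_y_variance` is hypothetical, and the data are synthetic: 118 samples and 15 variables driven by one latent factor plus noise, standing in for the real measurements, so the percentages it prints will not match the figures quoted above.

```python
import random


def pls1_explained_y_variance(X, y, n_components):
    """Cumulative fraction of Y-variance explained after each PLS1 component."""
    n, m = len(X), len(X[0])
    # Centre each X column and y (working on copies of the inputs).
    col_means = [sum(row[j] for row in X) / n for j in range(m)]
    E = [[row[j] - col_means[j] for j in range(m)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]
    ss_total = sum(v * v for v in f)

    explained = []
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        # Scores, then X-loadings p and y-loading q by least squares.
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        explained.append(1.0 - sum(v * v for v in f) / ss_total)
    return explained


# Synthetic stand-in data (NOT the mill's measurements).
rng = random.Random(0)
n_samples, n_vars = 118, 15
latent = [rng.gauss(0, 1) for _ in range(n_samples)]
coefs = [rng.uniform(-1, 1) for _ in range(n_vars)]
X = [[coefs[j] * latent[i] + 0.5 * rng.gauss(0, 1) for j in range(n_vars)]
     for i in range(n_samples)]
y = [-2.0 * latent[i] + 0.5 * rng.gauss(0, 1) for i in range(n_samples)]

explained = pls1_explained_y_variance(X, y, 3)
for k, frac in enumerate(explained, start=1):
    print(f"{k} component(s): {frac:.1%} of Y-variance explained")
```

The curve typically flattens after the first few components, which is what justifies keeping only a small number of them. In practice the number of components would normally be chosen by validation rather than from the calibration fit alone.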
Fig 3. The loading plot shows variable correlations in the first two principal components.
The loading plot shows the variable relationships. Along the first principal component (the X-axis in the plot, which describes 85% of Y), WEIGHT/sq.m, light SCATTER, OPACity, and the amount of FILLER covary most strongly, and negatively, with PRINTHRU. This makes sense: opacity is by definition the opposite of PRINTHRU, filler is added to counteract PRINTHRU, light scatter is reflected light, and a high weight/sq.m makes the paper thicker and thus less transparent. PERMeability, the amount of GROUND wood pulp, and to a lesser extent the amount of INK and the BRIGHTness show a weak positive covariation with PRINTHRU along PC1.
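The sign reading of the loading plot can be sketched numerically. For a single Y-variable, the first PLS loading weight of each X-variable is proportional to its covariance with Y, so a negative weight means the variable moves opposite to PRINTHRU. The snippet below assumes autoscaled (mean-centred, unit-variance) variables, which the text does not state. The helper `first_component_weights`, the variable ROUGHNESS, and all numbers are made up for illustration.

```python
from statistics import mean, pstdev


def first_component_weights(columns, y):
    """Unit-length PLS1 loading weights for component 1 (autoscaled data)."""
    def autoscale(values):
        mu, sd = mean(values), pstdev(values)
        return [(v - mu) / sd for v in values]

    y_s = autoscale(y)
    # Each raw weight is the inner product of a scaled X column with scaled y.
    raw = {name: sum(a * b for a, b in zip(autoscale(col), y_s))
           for name, col in columns.items()}
    norm = sum(v * v for v in raw.values()) ** 0.5
    return {name: v / norm for name, v in raw.items()}


# Hypothetical values for five samples (not taken from the mill's data set).
columns = {
    "OPACITY": [90.0, 91.5, 92.0, 93.5, 95.0],
    "PERMEABILITY": [12.0, 11.0, 10.5, 9.0, 8.0],
    "ROUGHNESS": [3.0, 3.2, 2.9, 3.1, 3.0],
}
printhru = [5.0, 4.4, 4.1, 3.3, 2.5]

weights = first_component_weights(columns, printhru)
for name, w in weights.items():
    print(f"{name:13s} {w:+.2f}")
```

In this toy data OPACITY gets a negative weight, PERMEABILITY a positive one, and ROUGHNESS a weight near zero, which corresponds to a point near the origin along PC1 in a loading plot.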
Fig 4. Predicted versus measured Y-values for the 118 samples
The remaining variables contribute very little to the first principal component (they lie close to the origin in this direction), so it is tempting to drop them and thereby rationalize the measurement program. A new model based on only the nine most important variables is in fact only slightly worse. Plotting predicted versus measured Y-values shows good correlation.
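One way to put numbers on "only slightly worse" and "good correlation" is to compare the correlation and the root-mean-square error between measured and predicted values for the full and the reduced model. The sketch below uses invented measured and predicted values purely to show the calculation. The names `fit_summary`, `pred_full`, and `pred_reduced` are hypothetical, and none of the numbers come from the mill's models.

```python
from math import sqrt


def fit_summary(measured, predicted):
    """Return (Pearson correlation, root-mean-square error) for a fit."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_m) * (b - mean_p) for a, b in zip(measured, predicted))
    ss_m = sum((a - mean_m) ** 2 for a in measured)
    ss_p = sum((b - mean_p) ** 2 for b in predicted)
    r = cov / sqrt(ss_m * ss_p)
    rmse = sqrt(sum((a - b) ** 2 for a, b in zip(measured, predicted)) / n)
    return r, rmse


# Invented values for illustration only.
measured = [5.0, 4.4, 4.1, 3.3, 2.5]
pred_full = [4.9, 4.5, 4.0, 3.4, 2.6]      # hypothetical 15-variable model
pred_reduced = [4.8, 4.6, 3.9, 3.5, 2.7]   # hypothetical 9-variable model

r_full, rmse_full = fit_summary(measured, pred_full)
r_reduced, rmse_reduced = fit_summary(measured, pred_reduced)
print(f"full model:    r = {r_full:.3f}, RMSE = {rmse_full:.3f}")
print(f"reduced model: r = {r_reduced:.3f}, RMSE = {rmse_reduced:.3f}")
```

For a fair comparison the predictions would ideally come from validation samples rather than from the samples used to fit the models.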