Knowledge Center provides beginners and interested professionals with basic, essential information on statistics, chemometrics, spectroscopy and multivariate analysis, to enhance their knowledge and make these subjects easier to understand.
A sensory panel may be described as a group of testers who have exceptional sensory faculties and can describe products on the basis of taste, smell or feel. More on Sensory Panels
Multivariate Curve Resolution is defined as a group of techniques that help resolve mixtures by determining the number of constituents, their profiles and their estimated concentrations. More on MCR
Literally (and loosely) translated, the word "chemometrics" means performing calculations on measurements of chemical data. This can be anything from calculating pH from a measurement of hydrogen ion activity to computing a Fourier transform interpolation of a spectrum. More on Chemometrics
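As a minimal illustration of the simplest calculation mentioned above, pH is the negative base-10 logarithm of the hydrogen ion activity. The function name `ph_from_activity` is a hypothetical helper introduced only for this sketch.

```python
import math


def ph_from_activity(a_h: float) -> float:
    """Return pH = -log10(a_H+) for a hydrogen ion activity a_h > 0."""
    if a_h <= 0:
        raise ValueError("hydrogen ion activity must be positive")
    return -math.log10(a_h)


# An activity of 1e-7 corresponds to a pH of about 7 (neutral water at 25 °C).
print(round(ph_from_activity(1e-7), 3))
```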
Contrary to regression, which predicts the values of one or several quantitative variables, classification is useful when the response is a category variable that can be interpreted in terms of several classes to which a sample may belong.
The main goal of classification is to reliably assign new samples to existing classes (in a given population).
Note that classification is not the same as clustering.
You can also use classification results as a diagnostic tool:
Examples of such situations are:
The SIMCA classification is based on making a PCA model for each class in the training set. Unknown samples are then compared to the class models and assigned to classes, according to their analogy to the training samples.
Solving a classification problem requires two steps:
The modeling stage implies that you have identified enough samples as members of each class to be able to build a reliable model. It also requires enough variables to describe the samples accurately.
The actual classification stage uses significance tests, where the decisions are based on statistical tests performed on the object-to-model distances.
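The two stages described above can be sketched roughly as follows, using NumPy. This is a simplified illustration, not the SIMCA implementation of any particular software: the function names are hypothetical, and the formal significance test on the object-to-model distance is replaced here by a plain cutoff (`cutoff` times the root-mean-square training distance), which is an assumption made to keep the example short.

```python
import numpy as np


def fit_class_model(X, n_components):
    """Modeling stage: fit a PCA model to the training samples of one class."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions (loadings) of the class.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    # Residuals of the training samples after projection onto the model.
    E = Xc - Xc @ P @ P.T
    # Root-mean-square training distance (no degrees-of-freedom correction).
    s0 = np.sqrt(np.mean((E ** 2).sum(axis=1)))
    return {"mean": mean, "P": P, "s0": s0}


def distance_to_model(model, x):
    """Object-to-model distance: norm of the part of x the model cannot explain."""
    xc = x - model["mean"]
    e = xc - xc @ model["P"] @ model["P"].T
    return float(np.sqrt((e ** 2).sum()))


def classify(models, x, cutoff=3.0):
    """Classification stage: list every class whose model x is close to.

    The cutoff rule is an assumed stand-in for a proper significance test.
    """
    return [name for name, m in models.items()
            if distance_to_model(m, x) <= cutoff * m["s0"]]


# Small made-up training sets, one per class.
class_a = np.array([[0, 0, 0.1], [1, 1, -0.1], [2, 2, 0.1], [3, 3, -0.1], [4, 4, 0.0]])
class_b = np.array([[10, 0, 5.1], [10, 1, 4.9], [10, 2, 5.1], [10, 3, 4.9], [10, 4, 5.0]])
models = {"A": fit_class_model(class_a, 1), "B": fit_class_model(class_b, 1)}

print(classify(models, np.array([2.5, 2.5, 0.05])))
print(classify(models, np.array([50.0, -50.0, 50.0])))
```

Note that a sample may be assigned to one class, to several classes, or to none, since each class model is tested independently.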
Experimental design is a strategy to gather empirical knowledge, i.e. knowledge based on the analysis of experimental data and not on theoretical models. It can be applied whenever you intend to investigate a phenomenon in order to gain understanding or improve performance. ... [More on Design of Experiment]
Model Validation means checking how well the model will perform on new data.
A regression model is usually built to make predictions in the future. Validating the model estimates the uncertainty of such future predictions. If the uncertainty is reasonably low, the model can be considered valid.
The same argument applies to a descriptive multivariate analysis such as PCA: If you want to extrapolate the correlations observed in your data table to future, similar data, you should check whether they still apply for new data.
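One common way to estimate how a regression model will perform on new data is cross-validation. The sketch below uses leave-one-out cross-validation on an ordinary least-squares model, chosen here only for illustration (the text does not prescribe a validation method); the function names are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit of y = b0 + X @ b; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def rmse_loo(X, y):
    """Root-mean-square error of leave-one-out cross-validation."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i              # leave sample i out
        coef = fit_linear(X[keep], y[keep])        # refit without it
        errors.append(y[i] - predict_linear(coef, X[i:i + 1])[0])
    return float(np.sqrt(np.mean(np.square(errors))))


# Made-up data: a straight line plus random noise.
rng = np.random.default_rng(0)
X = np.arange(10.0).reshape(-1, 1)
y = 2.0 + 3.0 * X[:, 0] + rng.normal(scale=0.5, size=10)
print("RMSE (leave-one-out):", rmse_loo(X, y))
```

A low cross-validated error relative to the spread of the response suggests that the uncertainty of future predictions is acceptable.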
Prediction (computation of unknown response values using a regression model) is the purpose of most regression applications.
Prediction consists of feeding observed X-values for new samples into a regression model so as to obtain computed (predicted) Y-values.
The main results of prediction include Predicted Y-values and Deviations. They can be displayed as plots. In addition, warnings are computed and help you detect outlying samples or individual values of some variables.
The Predicted with Deviation plot shows the predicted Y-values for all samples, together with a deviation, which expresses how similar the prediction sample is to the calibration samples used when building the model: the more similar the sample, the smaller the deviation. Predicted Y-values for samples with high deviations cannot be trusted.
For each sample, the deviation (which is a kind of 95% confidence interval around the predicted Y-value) is computed as a function of the sample’s leverage and its X-residual variance.
The Predicted vs. Reference plot is a 2-D scatter plot of Predicted Y-values vs. Reference Y-values. It has the same features as a Predicted vs. Measured plot.
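The text does not give the exact formula for the deviation, so the sketch below only illustrates one of its ingredients, the leverage. It uses the standard leverage definition for a centered least-squares model, which is an assumption here (projection methods typically compute leverage from scores instead); the function name is hypothetical.

```python
import numpy as np


def leverage(X_cal, X_new):
    """Leverage of new samples relative to a centered calibration set.

    h = 1/n + (x - mean)^T (Xc^T Xc)^-1 (x - mean)
    """
    n = X_cal.shape[0]
    mean = X_cal.mean(axis=0)
    Xc = X_cal - mean
    G_inv = np.linalg.pinv(Xc.T @ Xc)
    D = X_new - mean
    # Quadratic form d^T G_inv d evaluated row by row.
    return 1.0 / n + np.einsum("ij,jk,ik->i", D, G_inv, D)


# Made-up calibration samples and two new samples:
# one at the calibration center, one far away from it.
X_cal = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
X_new = np.array([[3.0, 3.0], [30.0, -10.0]])
print(leverage(X_cal, X_new))
```

A new sample located at the center of the calibration set gets the minimum leverage of 1/n, while a sample far from the calibration samples gets a much larger value, which is why its prediction deserves less trust.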
Here are a few examples:
Introducing changes in the values of your variables, e.g. so as to make them better suited for an analysis, is called pre-processing. One may also talk about applying a pre-treatment or a transformation.
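As a rough sketch, two widely used transformations of this kind are mean centering and autoscaling (centering followed by division by the standard deviation). They are chosen here for illustration and are not necessarily the ones the original page listed; the function names are hypothetical.

```python
import numpy as np


def mean_center(X):
    """Subtract the column mean from every variable."""
    return X - X.mean(axis=0)


def autoscale(X):
    """Center each variable and scale it to unit standard deviation.

    Assumes no column is constant (a zero standard deviation would
    cause a division by zero).
    """
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


# Made-up table: two variables measured on very different scales.
X = np.array([[1.0, 1000.0], [2.0, 1500.0], [3.0, 1200.0], [4.0, 1800.0]])
print(mean_center(X))
print(autoscale(X))
```

Autoscaling gives every variable the same weight in a subsequent analysis, which matters when the variables are expressed in different units.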
Here are a few examples:
Regression is a generic term for all methods attempting to fit a model to observed data in order to quantify the relationship between two groups of variables. The fitted model may then be used either to merely describe the relationship between the two groups of variables, or to predict new values. More on Statistical Regression Analysis
PLS Regression is a recent technique that generalizes and combines features from Principal Component Analysis and Multiple Regression. It is particularly useful when we need to predict a set of dependent variables from a (very) large set of independent variables (i.e., predictors).
Partial Least Squares (PLS) can be a powerful method of analysis because of the minimal demands on measurement scales, sample size, and residual distributions. More on PLS Regression
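To make the idea concrete, here is a compact sketch of PLS regression for a single response (often called PLS1), written with NumPy in the NIPALS style. It is an illustrative implementation under simplifying assumptions (centered data, no scaling, no validation to choose the number of components), and the function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model; returns intercept b0 and coefficient vector b."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centered X and y residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                        # weights: direction of max covariance
        w = w / np.linalg.norm(w)
        t = E @ w                          # scores
        tt = t @ t
        p = E.T @ t / tt                   # X-loadings
        c = f @ t / tt                     # y-loading
        E = E - np.outer(t, p)             # deflate X
        f = f - c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)    # regression coefficients
    b0 = y_mean - x_mean @ b
    return b0, b


def pls1_predict(b0, b, X):
    return b0 + X @ b


# Made-up, noise-free data with known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5])
b0, b = pls1_fit(X, y, n_components=3)
print(b0, b)
```

With as many components as there are independent X-variables, the PLS1 solution coincides with ordinary least squares; using fewer components is what gives PLS its stabilizing effect when predictors are numerous or collinear.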
Large data tables usually contain a large amount of information, which is partly hidden because the data are too complex to be easily interpreted. Principal Component Analysis (PCA) is a projection method that helps you visualize all the information contained in a data table. More on Principal Component Analysis
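A minimal sketch of the projection idea, assuming mean-centered data and using the singular value decomposition from NumPy; the function name `pca` and its return values are choices made for this example.

```python
import numpy as np


def pca(X, n_components):
    """Return scores, loadings and explained-variance fractions."""
    mean = X.mean(axis=0)
    Xc = X - mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # variable contributions
    explained = (s ** 2) / np.sum(s ** 2)             # fraction of variance
    return scores, loadings, explained[:n_components]


# Made-up table in which the second variable is exactly twice the first,
# so a single component should carry essentially all the variation.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
scores, loadings, explained = pca(X, n_components=1)
print(explained)
```

Plotting the first two columns of the scores gives the familiar score plot, a low-dimensional map of the samples.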
Spectroscopy is a technique that uses the interaction of energy with a sample to perform an analysis. [More on Spectroscopy]
If you have any further questions or comments, please send an email to: firstname.lastname@example.org