Knowledge Center provides beginners and interested professionals with basic, essential information on statistics, chemometrics, spectroscopy and multivariate analysis, to enhance their knowledge and make the subjects easier to understand.

- What is a Sensory Panel?
- What is Multivariate Curve Resolution (MCR)?
- What is Chemometrics?
- What is Classification? What are its uses?
- What is Design of Experiments (DoE)?
- What is Model Validation?
- What is Prediction?
- What is Re-formatting?
- What is Pre-processing?
- What is Regression Analysis?
- What is PLS Regression?
- What is Principal Component Analysis?
- What is Spectroscopy?

A sensory panel may be described as a group of testers who have exceptional sensory faculties and can describe products on the basis of taste, smell or feel. More on Sensory Panels

Multivariate Curve Resolution is defined as a group of techniques that help resolve mixtures by determining the number of constituents, their profiles and their estimated concentrations. More on MCR

Literally (and loosely) translated, the word "chemometrics" means performing calculations on measurements of chemical data. This can be anything from calculating pH from a measurement of hydrogen ion activity to computing a Fourier transform interpolation of a spectrum. More on Chemometrics
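As a minimal instance of "performing calculations on chemical measurements", the pH example mentioned above can be sketched in a couple of lines of Python:

```python
import math

# pH is defined as the negative base-10 logarithm of the hydrogen-ion activity
activity = 1.0e-7           # hydrogen-ion activity of a neutral aqueous solution
pH = -math.log10(activity)  # approximately 7.0
```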

Contrary to regression, which predicts the values of one or several quantitative variables, **classification** is useful when the response is a category variable that can be interpreted in terms of several classes to which a sample may belong.

The main goal of classification is to reliably assign new samples to existing classes (in a given population).

Note that **classification** is not the same as **clustering**.

You can also use classification results as a diagnostic tool:

- To distinguish among the most important variables to keep in a model (variables that “characterize” the population);
- Or to find outliers (samples that are not typical of the population).

Examples of such situations are:

- Predicting whether a product meets **quality requirements**, where the result is simply "Yes" or "No" (i.e. a binary response).
- Modeling several closely related **species of plants or animals** according to their easily observable characteristics, so as to be able to decide whether new individuals belong to one of the modeled species.
- Modeling various **diseases** according to a set of easily observable symptoms, clinical signs or biological parameters, so as to help future diagnosis of those diseases.

The SIMCA classification is based on making a PCA model for each class in the training set. Unknown samples are then compared to the class models and assigned to classes, according to their analogy to the training samples.

Solving a classification problem requires two steps:

- Modeling: Build one separate model for each class;
- Classifying new samples: Fit each sample to each model and decide whether the sample belongs to the corresponding class.

The modeling stage implies that you have identified enough samples as members of each class to be able to build a reliable model. It also requires enough variables to describe the samples accurately.

The actual classification stage uses significance tests, where the decisions are based on statistical tests performed on the object-to-model distances.
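The two-step procedure above can be sketched in Python with NumPy. This is a simplified illustration rather than full SIMCA: real SIMCA sets class membership limits with significance tests on the residual variances, while this sketch simply assigns each new sample to the class whose PCA model leaves the smallest residual. The function names and the toy data are invented for the example.

```python
import numpy as np

def fit_class_model(X, n_components=1):
    """Step 1 (modeling): fit a separate PCA model (mean + loadings) per class."""
    mean = X.mean(axis=0)
    # SVD of the mean-centered data gives the principal-component loadings
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]            # loadings: n_components x n_vars

def residual_distance(x, model):
    """Step 2 (classifying): distance from a sample to a class model,
    measured as the norm of the part of x the PCA model cannot explain."""
    mean, loadings = model
    centered = x - mean
    explained = loadings.T @ (loadings @ centered)
    return np.linalg.norm(centered - explained)

# Two toy classes that differ clearly in their mean "spectra"
rng = np.random.default_rng(0)
class_a = rng.normal(loc=0.0, scale=0.1, size=(20, 5))
class_b = rng.normal(loc=5.0, scale=0.1, size=(20, 5))
models = {"A": fit_class_model(class_a), "B": fit_class_model(class_b)}

new_sample = np.full(5, 4.9)                  # clearly resembles class B
assigned = min(models, key=lambda c: residual_distance(new_sample, models[c]))
```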

Experimental design is a strategy to gather empirical knowledge, i.e. knowledge based on the analysis of experimental data and not on theoretical models. It can be applied whenever you intend to investigate a phenomenon in order to gain understanding or improve performance. [More on Design of Experiments]

Model Validation means checking how well the model will perform on new data.

A regression model is usually made to do predictions in the future. The validation of the model estimates the uncertainty of such future predictions. If the uncertainty is reasonably low, the model can be considered valid.

The same argument applies to a descriptive multivariate analysis such as PCA: If you want to extrapolate the correlations observed in your data table to future, similar data, you should check whether they still apply for new data.
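The text does not prescribe a particular validation procedure; one common way to estimate the uncertainty of future predictions is cross-validation, sketched here for an ordinary least-squares model on toy data (the helper name is invented for the example):

```python
import numpy as np

def loo_cv_rmse(X, y):
    """Leave-one-out cross-validation: refit the model with each sample held
    out, predict the held-out sample, and summarize the errors as an RMSE."""
    errors = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        coef, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        errors.append(y[i] - X[i] @ coef)
    return float(np.sqrt(np.mean(np.square(errors))))

# Toy data: y depends linearly on x, plus a little noise
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
X = np.column_stack([np.ones_like(x), x])    # design matrix: intercept + slope
y = 2.0 + 3.0 * x + rng.normal(scale=0.05, size=x.size)

rmse = loo_cv_rmse(X, y)                     # low RMSE -> model generalizes well
```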

Prediction (computation of unknown response values using a regression model) is the purpose of most regression applications.

Prediction consists in feeding observed X-values for new samples into a regression model so as to obtain computed (predicted) Y-values.

The main results of prediction include Predicted Y-values and Deviations. They can be displayed as plots. In addition, warnings are computed and help you detect outlying samples or individual values of some variables.

The Predicted with Deviation plot shows the predicted Y-values for all samples, together with a deviation, which expresses how similar the prediction sample is to the calibration samples used when building the model: the more similar the sample, the smaller the deviation. Predicted Y-values for samples with high deviations cannot be trusted.

For each sample, the deviation (which is a kind of 95% confidence interval around the predicted Y-value) is computed as a function of the sample's leverage and its X-residual variance. The plot itself is a 2-D scatter plot of Predicted Y-values vs. Reference Y-values, with the same features as a Predicted vs. Measured plot.
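The exact deviation formula is software-specific, but the leverage ingredient mentioned above can be illustrated directly. For an MLR-style model, a common definition of a new sample's leverage relative to the calibration data X is h = x' (X'X)^-1 x; the toy data and names here are assumptions for the sketch:

```python
import numpy as np

def leverage(X_train, x_new):
    """Leverage of a new sample relative to the calibration X-data.
    Large values mean the sample lies far from the calibration region,
    so its predicted Y-value should be trusted less."""
    XtX_inv = np.linalg.inv(X_train.T @ X_train)
    return float(x_new @ XtX_inv @ x_new)

# Calibration design: intercept plus one x-variable sampled in [0, 1]
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.uniform(0, 1, 50)])

h_inside = leverage(X, np.array([1.0, 0.5]))   # within the calibration range
h_outside = leverage(X, np.array([1.0, 5.0]))  # far outside it -> high leverage
```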

**Here are a few examples**:

- Get a better overview of the contents of your data table by **sorting variables** or samples.
- Change point of view: by **transposing** a data table, samples become variables and vice versa.
- Apply a 2-D analysis method to 3-D data: by **unfolding** a three-way data array, you enable the use of e.g. PCA on your data.
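The transposing and unfolding operations in the list above map directly onto NumPy array operations (the dimensions here are toy values chosen for the example):

```python
import numpy as np

# A toy three-way array: 4 samples x 5 wavelengths x 3 time points
cube = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)

# Transposing a 2-D table swaps samples and variables
table = cube[:, :, 0]             # a 4 x 5 two-way slice
transposed = table.T              # now 5 x 4: variables become rows

# Unfolding keeps the sample mode and concatenates the other two modes,
# turning the cube into an ordinary 2-D table that e.g. PCA can handle
unfolded = cube.reshape(4, 5 * 3)
```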

Introducing changes in the values of your variables, e.g. so as to make them better suited for an analysis, is called **pre-processing**. One may also talk about applying a **pre-treatment** or a **transformation**.

**Here are a few examples:**

- Normalize the distribution of a skewed variable by taking its logarithm.
- Remove some noise in your spectra by smoothing the curves.
- Improve the precision in your sensory assessments by taking the average of the sensory ratings over all panelists.
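The three examples above can each be sketched in a line or two of NumPy; the toy numbers are invented for illustration:

```python
import numpy as np

# 1. Normalize a skewed variable by taking its logarithm
skewed = np.array([1.0, 2.0, 4.0, 8.0, 1000.0])
logged = np.log10(skewed)                  # compresses the long right tail

# 2. Remove noise from a spectrum by smoothing, here with a moving average
def moving_average(y, window=3):
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="valid")

noisy = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 1.0])
smoothed = moving_average(noisy)

# 3. Average sensory ratings over panelists:
#    each row is one panelist's ratings of the same two samples
ratings = np.array([[6.0, 7.0],
                    [5.0, 8.0],
                    [7.0, 6.0]])
consensus = ratings.mean(axis=0)           # one averaged rating per sample
```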

Regression is a generic term for all methods attempting to fit a model to observed data in order to *quantify the relationship* between two groups of variables. The fitted model may then be used either to merely *describe* the relationship between the two groups of variables, or to *predict* new values. More on Statistical Regression Analysis
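The simplest member of that family, ordinary least squares with one predictor, shows both uses, describing the fitted relationship and predicting a new value (toy data invented for the sketch):

```python
import numpy as np

# Toy data following y = 1 + 2x with a touch of noise
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 40)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)

# Fit the model by ordinary least squares
X = np.column_stack([np.ones_like(x), x])          # intercept + slope columns
(intercept, slope), *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted coefficients *describe* the relationship (close to 1 and 2),
# and the model can *predict* the response for a new x-value
y_new = intercept + slope * 12.0
```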

PLS Regression is a technique that generalizes and combines features from Principal Component Analysis and Multiple Regression. It is particularly useful when we need to predict a set of dependent variables from a (very) large set of independent variables (i.e., predictors).

Partial Least Squares (PLS) can be a powerful method of analysis because of the minimal demands on measurement scales, sample size, and residual distributions. More on PLS Regression
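A minimal sketch of the idea, assuming the classical NIPALS algorithm for a single response (PLS1): components are extracted to capture the covariance between X and y, then the response is regressed on those components. This is an illustration on invented toy data, not a production implementation.

```python
import numpy as np

def pls1_nipals(X, y, n_components=2):
    """Minimal PLS1 via NIPALS: extract components that maximize covariance
    with y, then express the fit as regression coefficients in X-space."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)            # weight vector
        t = X @ w                         # component scores
        p = X.T @ t / (t @ t)             # X-loadings
        c = (y @ t) / (t @ t)             # y-loading
        X = X - np.outer(t, p)            # deflate X ...
        y = y - c * t                     # ... and y, then repeat
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.inv(P.T @ W) @ q  # coefficients in original X-space

# Toy data: y depends on the first three of six predictors
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 6))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.01, size=50)

B = pls1_nipals(X, y, n_components=4)
pred = (X - X.mean(axis=0)) @ B + y.mean()
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```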

Large data tables usually contain a large amount of information, which is partly hidden because the data are too complex to be easily interpreted. **Principal Component Analysis** (PCA) is a projection method that helps you visualize all the information contained in a data table. More on Principal Component Analysis
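A compact way to see the projection idea is PCA via the singular value decomposition of the mean-centered data; the sketch below, on invented toy data, shows how most of the "hidden" information in a correlated table collapses onto a single component:

```python
import numpy as np

def pca(X, n_components=2):
    """PCA by SVD of the mean-centered data: returns scores (sample
    coordinates on the components), loadings (component directions),
    and the fraction of total variance each component explains."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]

# Toy table: three strongly correlated variables, so almost all the
# variation lies along one underlying direction
rng = np.random.default_rng(5)
t = rng.normal(size=40)
X = np.column_stack([t, 2 * t, -t]) + rng.normal(scale=0.05, size=(40, 3))

scores, loadings, explained = pca(X, n_components=1)
```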

Spectroscopy is a technique that uses the interaction of energy with a sample to perform an analysis. [More on Spectroscopy]

If you have any further questions or comments, please send a mail to: support@camo.com