# Understand chemical data

### What is chemometrics?

Chemometrics means performing calculations on measurements of chemical data. This can be anything from calculating pH from a measurement of hydrogen ion activity to computing a Fourier transform interpolation of a spectrum.
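As a minimal sketch of the simplest case mentioned above, the pH calculation follows from the definition pH = -log10(a), where a is the hydrogen ion activity. The function name `ph_from_activity` is made up for this illustration.

```python
import math


def ph_from_activity(a_h: float) -> float:
    """Return pH = -log10(a_H+) for a dimensionless hydrogen ion activity."""
    if a_h <= 0:
        raise ValueError("hydrogen ion activity must be positive")
    return -math.log10(a_h)


# An activity of 1e-7 corresponds to a pH of approximately 7.
print(ph_from_activity(1e-7))
```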

More recently, the word has come to refer to the use of linear algebra methods to make quantitative or qualitative determinations from chemical data, primarily spectra. Nearly all trained spectroscopists have the basic understanding of the concepts needed to apply these methods. Unfortunately, like all specialty areas of science, chemometrics has a language of its own that makes it difficult for beginners to understand.

The science of chemometrics gives spectroscopists many different ways to solve the calibration problem for analysis of spectral data. Some are very simple to understand, while others require a strong background in linear algebra. However, they all have one thing in common: they each solve an individual problem but do not address all possible problems.

Some methods have the advantage of being simple to understand but may not be very robust for all possible samples. Others are very complex to understand and implement, but give solutions that are very stable and can handle a large variety of “unknowns.”

The key to chemometrics is not necessarily understanding the mathematics of all the different methods; it is knowing which model to use for a given analytical problem and applying it properly.

## Application of Chemometrics

### Exploratory Analysis

• To explore possible outliers and indicate whether there are patterns or trends in the data
• PCA is an important part of chemometrics and provides the most compact representation of all the variation in a data table
• Exploratory algorithms such as principal component analysis (PCA) are designed to reduce large, complex data sets to a small number of optimized, interpretable components
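To make the idea of data reduction concrete, here is a minimal sketch that extracts only the first principal component of a small data table. It uses power iteration on the covariance matrix, chosen here because it needs nothing beyond the standard library; production software typically relies on a singular value decomposition instead. The function name and the toy data are assumptions for illustration, not measured values.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (loading, scores, explained_fraction) for the first PC.

    Assumes the starting vector is not orthogonal to the leading
    eigenvector and that the data are not all identical.
    """
    n, m = len(rows), len(rows[0])
    # Mean-centre each column (variable).
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the variables.
    C = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(m)]
         for i in range(m)]
    # Power iteration: repeatedly multiply by C and renormalise.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(n_iter):
        w = [sum(C[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Variance captured by the component, relative to the total variance.
    eigenvalue = sum(v[i] * sum(C[i][j] * v[j] for j in range(m))
                     for i in range(m))
    total_variance = sum(C[i][i] for i in range(m))
    # Scores: projection of each centred sample onto the loading vector.
    scores = [sum(x[j] * v[j] for j in range(m)) for x in X]
    return v, scores, eigenvalue / total_variance


# Toy table: four samples, three strongly correlated variables.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 4.1, 6.0],
    [3.0, 5.9, 9.1],
    [4.0, 8.0, 12.0],
]
loading, scores, explained = first_principal_component(data)
print(loading, scores, explained)
```

Because the three columns of the toy table vary almost in lockstep, a single component captures nearly all of the variation, which is the sense in which PCA gives a compact representation. A sample whose score lies far from the others would be a candidate outlier.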

### Regression

• To predict a property of interest from related measurements that are easier to make
• The goal of chemometric regression analysis is to develop a model which correlates the information in the set of known measurements to the desired property. Chemometric algorithms for performing regression include partial least squares (PLS) and principal component regression (PCR)
• Chemometric regression is used extensively for product quality decisions in on-line monitoring and process control, where fast and inexpensive test systems are needed
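The calibration idea above can be sketched with a deliberately reduced version of PLS: a single latent component and a single predicted property (often called PLS1). Real calibrations usually use several components chosen by validation, and an established library rather than hand-written code. The function names and the noise-free synthetic "spectra" below are assumptions for illustration only.

```python
import math


def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model; returns (x_mean, y_mean, coef)."""
    n, m = len(X), len(X[0])
    # Mean-centre the measurements and the property.
    x_mean = [sum(r[j] for r in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[r[j] - x_mean[j] for j in range(m)] for r in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in X with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    # Scores: projection of each sample onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    # Regress the centred property on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    # With one component the regression vector is simply q * w.
    coef = [q * wj for wj in w]
    return x_mean, y_mean, coef


def predict(model, x):
    """Predict the property for one new measurement vector x."""
    x_mean, y_mean, coef = model
    return y_mean + sum((xj - mj) * cj
                        for xj, mj, cj in zip(x, x_mean, coef))


# Synthetic calibration set: signal proportional to concentration.
pure_spectrum = [0.1, 0.5, 0.2]          # hypothetical response per unit
concentrations = [1.0, 2.0, 3.0, 4.0]    # known reference values
spectra = [[c * s for s in pure_spectrum] for c in concentrations]

model = fit_pls1_one_component(spectra, concentrations)
unknown = [3.5 * s for s in pure_spectrum]
print(predict(model, unknown))  # close to 3.5 for this noise-free data
```

Principal component regression (PCR) follows the same pattern, except that the projection direction is chosen from the variance in the measurements alone rather than from their covariance with the property.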

### Classification

• To assign samples to predefined categories, predicting which of several distinct groups an unknown sample belongs to
• A classification model predicts a sample’s class from its closest examples. K-nearest neighbour (k-NN) is a commonly used method in chemometrics
• Chemometrics helps in standardizing data. The classification models are more reliable and include the ability to reveal unusual samples in the data
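As a minimal sketch of classification by closest examples, the following k-NN routine assigns an unknown sample to the majority class among its k nearest training samples, using Euclidean distance. The function name, the two-variable measurements, and the class labels "A" and "B" are invented for this illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_labels, x, k=3):
    """Return the majority class among the k training samples nearest to x."""
    # Pair each training sample's distance to x with its class label.
    distances = sorted(
        (math.dist(row, x), label)
        for row, label in zip(train_X, train_labels)
    )
    # Count the labels of the k closest samples and take the most frequent.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: two measured variables per sample.
train_X = [
    [0.10, 0.90], [0.15, 0.85], [0.20, 0.80],   # class "A"
    [0.90, 0.10], [0.85, 0.20], [0.80, 0.15],   # class "B"
]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_labels, [0.12, 0.88]))  # prints "A"
print(knn_predict(train_X, train_labels, [0.82, 0.18]))  # prints "B"
```

The distance to the nearest neighbours is also a simple flag for unusual samples: an unknown that is far from every training sample does not clearly belong to any of the predefined groups.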

### Chemometrics is typically used for:

• Spectroscopic calibrations
• Process modeling for optimisation
• Process models for monitoring and fault detection
• Dynamic model identification for process control
• Multivariate statistical process control
• Process analytical instrument standardisation
• Analytical instrument design and development