Outliers are a common problem in industrial data sets; their presence is more the norm than the exception. These unusual, often “erroneous” observations heavily affect the classical estimates of the mean, variance, and covariance of the data. They also strongly influence regression and other machine learning models. Without proper treatment, the resulting models do not accurately represent the bulk of the data. On the other hand, outlying samples are sometimes the most interesting samples in a data set, revealing unique properties or trends. If these samples are not identified, opportunities for discovery can be missed.
Robust Methods deal with the problem of outliers by determining which samples represent the “consensus” in the data and basing the models on those samples, while ignoring the outliers. The course starts with methods for robust estimation of the mean and variance/covariance and goes on to methods for robust Principal Components and Partial Least Squares regression. Hands-on exercises will be done using MATLAB and PLS_Toolbox/Solo.
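As a small illustration of the contrast described above, the following sketch compares classical estimates (mean, standard deviation) with two simple robust counterparts (median, scaled median absolute deviation) on a data set with and without a single gross outlier. It is written in Python rather than the MATLAB used in the course exercises, the data values are made up for illustration, and the helper names `mad_scale` and `summarize` are hypothetical. The median and MAD are only the most basic robust estimators and are not necessarily the ones taught in the course.

```python
import statistics


def mad_scale(values):
    """Median absolute deviation, scaled by 1.4826 so that it estimates
    the standard deviation when the bulk of the data is normal."""
    center = statistics.median(values)
    abs_dev = [abs(v - center) for v in values]
    return 1.4826 * statistics.median(abs_dev)


def summarize(values):
    """Return classical and robust estimates of location and scale."""
    return {
        "mean": statistics.mean(values),      # classical location
        "std": statistics.stdev(values),      # classical scale
        "median": statistics.median(values),  # robust location
        "mad": mad_scale(values),             # robust scale
    }


# Illustrative (made-up) measurements clustered around 10.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
# The same measurements plus one gross outlier.
contaminated = clean + [100.0]

clean_stats = summarize(clean)
contaminated_stats = summarize(contaminated)

for name in ("mean", "std", "median", "mad"):
    print(f"{name:>6}: clean={clean_stats[name]:.3f}  "
          f"contaminated={contaminated_stats[name]:.3f}")
```

With one outlier among nine samples, the mean and standard deviation move far away from the bulk of the data, while the median and MAD change little. This is the sense in which robust estimates follow the “consensus” samples.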