Classification of Unknown Lubricant Samples

The classification schemes in this database were developed using rigorous, validated statistical protocols to support objective interpretation of the data. Unsupervised and supervised statistical techniques were used to evaluate the data collected from the sampled lubricants and to identify unique classes based on chemical composition rather than on how the products are commercially marketed. These schemes should assist examiners in their visual assessment of lubricants or in questioned vs. known comparisons in sexual assault investigations.

Method of Classifying Database Samples

The method used to develop the classification schema presented herein is as follows:

Unsupervised hierarchical cluster analysis (HCA) was initially used to group similar samples based on their chemical profiles.
Principal component analysis (PCA) was then used to reduce the dimensionality of the data by generating latent, orthogonal pseudo-variables known as principal components. A small number of these principal components were then used to model and reconstruct the dataset. The magnitudes of the resulting scores were used to assess the similarity within and between samples, while the loadings were used to determine correlations between the original variables and the samples.
A supervised classification technique, linear discriminant analysis (LDA), was used to evaluate the PCA models, which were then used to predict the classification of independent test samples. The efficacy of the models was estimated by comparing the predicted and actual classifications of the samples. All of the classification models developed herein had a correct classification rate greater than 95%.
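
The PCA and LDA steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the protocol used to build the database: the data are synthetic, the number of retained components (two) and the hold-out split are arbitrary choices, NumPy is assumed to be available, and all function names are hypothetical. The HCA step is omitted.

```python
import numpy as np


def pca_fit(X, n_components):
    """Mean-center X and return (mean, loadings) from an SVD-based PCA."""
    mean = X.mean(axis=0)
    # Rows of vt are the orthogonal principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def pca_scores(X, mean, loadings):
    """Project samples onto the retained principal components."""
    return (X - mean) @ loadings


def lda_fit(scores, labels):
    """Estimate class means, priors, and the inverse pooled covariance."""
    classes = np.unique(labels)
    means = np.array([scores[labels == c].mean(axis=0) for c in classes])
    priors = np.array([(labels == c).mean() for c in classes])
    centered = scores - means[np.searchsorted(classes, labels)]
    pooled = centered.T @ centered / (len(scores) - len(classes))
    return classes, means, priors, np.linalg.inv(pooled)


def lda_predict(scores, model):
    """Assign each sample to the class with the largest linear discriminant."""
    classes, means, priors, inv_cov = model
    weights = means @ inv_cov
    const = -0.5 * np.sum(weights * means, axis=1) + np.log(priors)
    return classes[np.argmax(scores @ weights.T + const, axis=1)]


def correct_classification_rate(predicted, actual):
    """Fraction of samples whose predicted class equals the actual class."""
    return float(np.mean(np.asarray(predicted) == np.asarray(actual)))


# Synthetic stand-in for spectra: 3 classes, 30 samples each, 20 variables.
rng = np.random.default_rng(0)
labels = np.repeat(np.array([0, 1, 2]), 30)
centers = np.zeros((3, 20))
centers[1, :5] = 5.0
centers[2, 5:10] = 5.0
X = centers[labels] + rng.normal(scale=0.5, size=(labels.size, 20))

# Hold out every third sample as an independent test set.
test = np.arange(labels.size) % 3 == 0
mean, loadings = pca_fit(X[~test], n_components=2)
model = lda_fit(pca_scores(X[~test], mean, loadings), labels[~test])
predicted = lda_predict(pca_scores(X[test], mean, loadings), model)
rate = correct_classification_rate(predicted, labels[test])
print(f"correct classification rate: {rate:.2%}")
```

The PCA model is fitted on the training samples only, so the held-out samples play the role of the independent test samples described above.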

A fused dataset was initially attempted in order to provide a single classification scheme for all of the lubricant samples, regardless of the instrument used for analysis. However, despite normalization of the data, the classes and subclasses observed were entirely dependent upon the DART-HRMS dataset, primarily because the DART-HRMS spectra contain far more data points than the FTIR and GC-MS spectra combined. Therefore, it was decided to provide independent classification schemes based on the extraction process (i.e., neat, hexane, methanol) and analytical instrumentation.
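
A rough way to see why normalization alone may not balance a fused dataset: if every variable is scaled to unit variance (an assumption; the normalization actually applied is not specified here), each instrument's share of the total variance is simply its share of the variables. The variable counts below are hypothetical and chosen only for illustration.

```python
def block_variance_share(n_variables_by_block):
    """Share of total variance per block when every variable has unit variance."""
    total = sum(n_variables_by_block.values())
    return {name: n / total for name, n in n_variables_by_block.items()}


# Hypothetical variable counts per instrument, for illustration only.
counts = {"FTIR": 400, "GC-MS": 600, "DART-HRMS": 9000}
shares = block_variance_share(counts)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of total variance")
```

Under these assumed counts, the largest block accounts for most of the variance that PCA models, which is consistent with one instrument driving the classes in a fused analysis.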

Explanation of Database Sample Classification

Primary sample classes are based on the FTIR data, which indicate the primary lubricant base in the sample. The subclasses are based on the DART-HRMS data, which provided the most distinctive and informative classes. This was especially true since several of the lubricant samples did not generate a GC-MS spectrum.

A summary of the results from the datasets is provided on the combined classification page. This classification scheme gives the FTIR class and the DART-HRMS or GC-MS subclass. Within the Scientific Working Group of Sexual Lubricants (SWGLube), we are still developing accurate descriptions for the classes. The combined classification scheme is provided for neat sample analysis as well as for the analysis of the hexane and methanol extracts.

Use of the Classification Scheme

The individual instrument classification scheme pages provide the classes or subclasses observed for each sample preparation method. They also list which samples in the Sexual Lubricant Database fall within each class or subclass, and the peaks that are unique to that class. Therefore, when analyzing samples in your laboratory, you can compare not only the analytical spectra for similarity but also the unique peaks that should be present for a sample to fall within that class.
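
The peak comparison described above can be sketched as a simple tolerance match. This is a hedged illustration only: the peak values, the tolerance, and the function names are hypothetical, and an appropriate tolerance depends on the instrument and the laboratory's own validation.

```python
def matched_peaks(sample_peaks, class_peaks, tolerance):
    """Return the class peaks that have a sample peak within +/- tolerance."""
    return [
        ref for ref in class_peaks
        if any(abs(obs - ref) <= tolerance for obs in sample_peaks)
    ]


def has_all_class_peaks(sample_peaks, class_peaks, tolerance):
    """True when every class-specific peak is found in the sample."""
    return len(matched_peaks(sample_peaks, class_peaks, tolerance)) == len(class_peaks)


# Hypothetical peak positions; not taken from the database.
class_peaks = [149.02, 283.26, 371.10]
full_sample = [101.06, 149.021, 283.262, 371.099, 429.09]
partial_sample = [149.021, 283.262]

print(has_all_class_peaks(full_sample, class_peaks, tolerance=0.005))
print(matched_peaks(partial_sample, class_peaks, tolerance=0.005))
```

A check like this complements, rather than replaces, the visual comparison of the full spectra.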

If you have any questions, please email the Sexual Assault Database Admin at cbridge@knights.ucf.edu.