Date of Award
Spring 2010
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Mathematics, Statistics and Computer Science
First Advisor
Corliss, George
Second Advisor
Struble, Craig
Third Advisor
Merrill, Stephen
Abstract
This thesis addresses the detection of outliers in time series. The data set used in this work is provided by the GasDay Project at Marquette University, which produces mathematical models to predict the consumption of natural gas for Local Distribution Companies (LDCs). Flow data free of outliers is required to develop and train accurate models. GasDay uses statistical approaches motivated by normally distributed samples, such as the 3-sigma rule and the 5-sigma rule, to aid experts in detecting outliers in the residuals from the models. However, the Jarque-Bera statistical test shows that the residuals from the GasDay models are not normally distributed.
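As a rough illustration of the two statistical tools named above, the following Python sketch applies a k-sigma rule and computes the Jarque-Bera statistic on a series of residuals. The residuals here are synthetic and hypothetical, not GasDay data, and the function names are chosen for this example only. The p-value uses the asymptotic chi-square distribution with 2 degrees of freedom, which may be inaccurate for small samples.

```python
import math
import random


def sigma_rule_outliers(residuals, k=3.0):
    """Return indices of residuals more than k sample standard deviations from the mean."""
    n = len(residuals)
    mean = sum(residuals) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    return [i for i, r in enumerate(residuals) if abs(r - mean) > k * std]


def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value) under the null hypothesis of normality."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments (biased, divided by n), as used in the JB statistic.
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of chi-square with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Hypothetical residuals: mostly Gaussian noise plus two injected extreme values.
rng = random.Random(0)
residuals = [rng.gauss(0.0, 1.0) for _ in range(500)] + [12.0, -15.0]

flagged = sigma_rule_outliers(residuals, k=3.0)
jb, p_value = jarque_bera(residuals)
print("3-sigma flags:", flagged)
print("Jarque-Bera statistic:", jb, "p-value:", p_value)
```

A small p-value rejects normality, which is the situation the abstract describes for the GasDay residuals and the reason a sigma rule may be a poor fit for them.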
We present an explanation of Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and how it is used to detect time series outliers. We introduce a new application for the DBSCAN algorithm by adapting it to detect outliers in natural gas flow. The performance of DBSCAN is compared with GasDay's existing technique. Five data sets from temperature-sensitive operating areas with identified outliers and 1000 data sets with synthetic outliers are used in the evaluation process. The 1000 synthetic data sets are prepared using the same empirical distribution as one of the identified data sets. The results indicate that DBSCAN offers some improvement over GasDay's existing technique in detecting outliers and merits further exploration.
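The general idea of using DBSCAN for outlier detection can be sketched as follows: points that are neither core points nor within reach of a core point are labeled as noise, and those noise points are treated as outlier candidates. This is a minimal textbook-style sketch, not the adaptation developed in the thesis. The feature space (temperature, flow), the data values, and the parameters `eps` and `min_pts` are all assumptions made for illustration.

```python
import math

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Hypothetical (temperature, flow) pairs: flow falls as temperature rises,
# plus one injected point far from the trend.
points = [(float(t), 100.0 - 2.0 * t) for t in range(30)]
points.append((15.0, 150.0))

labels = dbscan(points, eps=3.0, min_pts=3)
outliers = [i for i, label in enumerate(labels) if label == NOISE]
print("outlier indices:", outliers)
```

In practice the features would likely need scaling so that one distance threshold is meaningful across dimensions, and `eps` and `min_pts` would need tuning for each data set.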