Date of Award
Summer 2015
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computing
First Advisor
Kaczmarek, Thomas
Second Advisor
Corliss, George F.
Third Advisor
Ahamed, Sheikh
Abstract
Most organizations process flat files regularly. There are several options for processing these files, including SQL Server Integration Services (SSIS), BizTalk, SQL import jobs, and other Extract, Transform, and Load (ETL) processes. All of these options have very strict requirements for file formats. If the format of a file changes, each of these options fails with a catastrophic error, and implementing a fix to handle the new format is difficult. With each method, the new format must be configured in the development environment, and the data flow must be modified to process all of the changes. Because of this inflexibility in processing flat files, Dr. Corliss requested an alternative solution. The team of Ivan Paez, Niharika Jain, and Brandon Krugman created that solution, called FileParser. While it was originally built to meet the needs of Dr. Corliss and the GasDay team at Marquette University, the end result was a file parser that allows additional flexibility in processing a variety of flat file formats. This thesis provides an alternative way to parse data, transform a flat file, and consume the data into a generic format; this process is called Provider Processing. Provider File Processing consists of two steps. First, the FileParser command line executable handles the file parsing and data transformation and generates a provider output file. Then a health insurance domain-specific command line executable called DelegatedProviderProcessing performs data cleansing and address normalization and imports the provider output file into an internal database. The difference between the strict-format options and Provider Processing is that if the format of the input files changes, Provider Processing can adapt to the change with minimal work.
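The idea of adapting to a format change with minimal work can be illustrated with a small sketch. This is not FileParser's implementation, which the abstract does not describe; it only assumes that a file layout can be captured in a small format description that maps input columns to generic output fields. The names `FORMAT_V1`, `FORMAT_V2`, and `parse_flat_file`, and the sample columns, are hypothetical.

```python
import csv
import io

# Hypothetical format descriptions: generic output field -> input column header.
# These names and layouts are illustrative assumptions, not taken from FileParser.
FORMAT_V1 = {
    "delimiter": ",",
    "columns": {"provider_id": "ID", "name": "Name", "zip": "Zip"},
}
FORMAT_V2 = {
    "delimiter": "|",
    "columns": {"provider_id": "ProviderNo", "name": "FullName", "zip": "PostalCode"},
}


def parse_flat_file(text, fmt):
    """Parse delimited text into generic records using a format description."""
    reader = csv.DictReader(io.StringIO(text), delimiter=fmt["delimiter"])
    records = []
    for row in reader:
        # Look up each generic field by its input column and trim whitespace.
        records.append(
            {field: (row.get(column) or "").strip()
             for field, column in fmt["columns"].items()}
        )
    return records


old_file = "ID,Name,Zip\n17,Ann Lee,53233\n"
# The new layout renames and reorders columns and changes the delimiter.
new_file = "PostalCode|ProviderNo|FullName\n53233|17| Ann Lee \n"

# Only the format description differs between the two calls.
print(parse_flat_file(old_file, FORMAT_V1))
print(parse_flat_file(new_file, FORMAT_V2))
```

Under these assumptions, a renamed column, a reordered layout, or a new delimiter is handled by editing the format description rather than the parsing code, which is the kind of flexibility the abstract contrasts with strict-format tools.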