Date of Award

Fall 2011

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Bioinformatics

First Advisor

Struble, Craig A.

Second Advisor

Mitchell, Aoy

Third Advisor

Madiraju, Praveen

Abstract

Human genetic variation occurs more commonly than was recognized after the completion of the Human Genome Project in 2003. Analysis of human DNA at submicroscopic resolution has revealed copy number variation (CNV): the deletion or duplication of a genomic region, potentially affecting gene dosage. Advanced genetic research now includes the study of CNVs in diseased subject groups compared with in-house controls or published online datasets of control CNV data. Research labs choose among different bioinformatic algorithms to make the copy number calls. Processing the copy number data further into quantifiable form requires collaboration with data analysts and includes the use of relational databases.

The aim of this thesis work was to develop a relational database solution for human copy number variation in subjects with cardiac malformations. The multipurpose database served as a central repository for the cohort demographic data as well as the entire experimental set of copy number variant data. Quantification and frequency analyses of the CNVs were executed via SQL queries. SQL queries also generated the raw data for essential visualization tools, including a detailed subject profile and a CNV spectrum across one hundred genes.
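
As a rough illustration of the kind of frequency query described above, the following minimal sketch uses Python's built-in sqlite3 module. The table names (subject, cnv_call), column names, gene labels, and sample rows are all hypothetical placeholders chosen for the example; they are not the schema or data of the thesis database, and the thesis does not state which database system was used.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subject (
            subject_id INTEGER PRIMARY KEY,
            diagnosis  TEXT NOT NULL
        );
        CREATE TABLE cnv_call (
            call_id    INTEGER PRIMARY KEY,
            subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
            gene       TEXT NOT NULL,
            cnv_type   TEXT NOT NULL
                       CHECK (cnv_type IN ('deletion', 'duplication'))
        );
        """
    )
    # Made-up rows for illustration only; not data from the study.
    conn.executemany(
        "INSERT INTO subject VALUES (?, ?)",
        [(1, "CHD"), (2, "CHD"), (3, "CHD"), (4, "CHD")],
    )
    conn.executemany(
        "INSERT INTO cnv_call (subject_id, gene, cnv_type) VALUES (?, ?, ?)",
        [
            (1, "GENE_A", "deletion"),
            (2, "GENE_A", "deletion"),
            (2, "GENE_B", "duplication"),
            (3, "GENE_A", "duplication"),
        ],
    )
    return conn


def cnv_frequency(conn):
    """Return (gene, cnv_type, n_subjects, fraction_of_cohort) per gene and type."""
    total = conn.execute("SELECT COUNT(*) FROM subject").fetchone()[0]
    if total == 0:
        return []
    rows = conn.execute(
        """
        SELECT gene, cnv_type, COUNT(DISTINCT subject_id) AS n_subjects
        FROM cnv_call
        GROUP BY gene, cnv_type
        ORDER BY gene, cnv_type
        """
    ).fetchall()
    return [(gene, cnv_type, n, n / total) for gene, cnv_type, n in rows]


if __name__ == "__main__":
    for row in cnv_frequency(build_demo_db()):
        print(row)
```

Counting distinct subjects rather than raw calls keeps a subject with several calls in the same gene from inflating the frequency; whether the thesis counted subjects or calls is not stated here, so this choice is an assumption.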

The stated purpose of the study was to develop a descriptive analysis of genomic copy number associations in a well-phenotyped congenital heart disease (CHD) population across one hundred disease-associated genes. The relational database created to advance the research proved valuable for data storage and retrieval. Query results for the CHD cohort were consistent with the published literature, which supports their accuracy.
