Creation of a Computational Pipeline to Extract Genes from Quantitative Trait Loci for Diabetes and Obesity
Date of Award
Master of Science (MS)
Solberg Woods, Leah
Type 2 diabetes is a disease of relative insulin deficiency resulting from a combination of insulin resistance and decreased beta-cell function. In recent years, human genome-wide association studies (GWAS) have identified more than 60 genes associated with type 2 diabetes. Understanding the genetics of type 2 diabetes is important for improving treatment and clarifying the underlying molecular mechanisms. Heterogeneous stock (HS) rats are derived from 8 inbred founder strains and are powerful tools for genetic studies because they allow high-resolution mapping of quantitative trait loci (QTL) in a relatively short time. By measuring diabetic traits in 1090 male HS rats and genotyping 10K single nucleotide polymorphisms (SNPs) in these rats, Dr. Solberg Woods' lab conducted a genetic analysis that identified 85 QTL for diabetes and adiposity traits. To identify candidate genes within these QTL, we propose a bioinformatics pipeline that combines general gene information, information from the Rat Genome Database (including its disease portals and Variant Visualizer), and the Attie Diabetes Expression Database. My project involved writing code that pulls data from these databases to determine which genes within each QTL are potential candidate genes. The code can analyze genes within a single QTL or within multiple QTL simultaneously. The output is a single Excel file for each QTL that lists all genes found in the disease portals, all genes that carry a highly conserved non-synonymous variant, and all genes that are differentially expressed in the Attie database. The program also highlights genes that fall into all three categories. After creating the pipeline, I ran the program on the 85 QTL identified in my laboratory. The program identified 63 high-priority candidate genes for future follow-up.
This work has helped my laboratory rapidly identify candidate genes for type 2 diabetes and obesity. In the future, the code can be modified to identify candidate genes within QTL for any complex trait.
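The prioritization logic described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the class names, function names, and toy gene records below are hypothetical, and the three evidence sets are assumed to have already been retrieved from the disease portals, Variant Visualizer, and the Attie database as plain sets of gene symbols. The sketch returns rows in memory rather than writing an Excel file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class QTL:
    name: str
    chrom: str
    start: int
    end: int


def genes_in_qtl(qtl, genes):
    """Return genes on the QTL's chromosome that overlap its interval."""
    return [
        g for g in genes
        if g.chrom == qtl.chrom and g.start <= qtl.end and g.end >= qtl.start
    ]


def prioritize(qtl, genes, portal_genes, variant_genes, expression_genes):
    """Flag each gene in the QTL by evidence category (hypothetical layout)."""
    rows = []
    for g in genes_in_qtl(qtl, genes):
        flags = {
            "disease_portal": g.symbol in portal_genes,
            "conserved_nonsyn_variant": g.symbol in variant_genes,
            "differential_expression": g.symbol in expression_genes,
        }
        if not any(flags.values()):
            continue  # gene has no supporting evidence; leave it out
        # A gene supported by all three categories is highlighted.
        rows.append({"qtl": qtl.name, "gene": g.symbol, **flags,
                     "high_priority": all(flags.values())})
    return rows


def run_pipeline(qtls, genes, portal_genes, variant_genes, expression_genes):
    """Process one or many QTL; returns one list of rows per QTL name."""
    return {
        q.name: prioritize(q, genes, portal_genes, variant_genes,
                           expression_genes)
        for q in qtls
    }


# Toy placeholder data (not real genes, coordinates, or results).
genes = [
    Gene("GeneA", "1", 100, 200),
    Gene("GeneB", "1", 150, 400),
    Gene("GeneC", "1", 900, 1000),
    Gene("GeneD", "2", 100, 200),
    Gene("GeneE", "1", 300, 350),
]
qtl = QTL("qtl_demo", "1", 120, 500)
portal = {"GeneA", "GeneB", "GeneC"}
variant = {"GeneA", "GeneD"}
expression = {"GeneA", "GeneB"}

results = run_pipeline([qtl], genes, portal, variant, expression)
high_priority = [r["gene"] for r in results["qtl_demo"] if r["high_priority"]]
print(high_priority)
```

Because the evidence sets are passed in as arguments, the same functions could in principle be pointed at gene lists for a different complex trait, which is the kind of reuse the abstract anticipates.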