Gene Set Enrichment and Projection: A Computational Tool for Knowledge Discovery in Transcriptomes
Date of Award
Doctor of Philosophy (PhD)
Mathematics, Statistics and Computer Science
Krenz, Gary S.
Explaining the mechanism behind a genetic disease involves two phases, collecting and analyzing data associated to the disease, then interpreting those data in the context of biological systems. The objective of this dissertation was to develop a method of integrating complementary datasets surrounding any single biological process, with the goal of presenting the response to a signal in terms of a set of downstream biological effects. This dissertation specifically tests the hypothesis that computational projection methods overlaid with domain expertise can direct research towards relevant systems-level signals underlying complex genetic disease. To this end, I developed a software algorithm named Geneset Enrichment and Projection Displays (GSEPD) that can visualize multidimensional genetic expression to identify the biologically relevant gene sets that are altered in response to a biological process. This dissertation highlights a problem of data interpretation facing the medical research community, and shows how computational sciences can help. By bringing annotation and expression datasets together, a new analytical and software method was produced that helps unravel complicated experimental and biological data. The dissertation shows four coauthored studies where the experts in their field have desired to annotate functional significance to a gene-centric experiment. Using GSEPD to show inherently high dimensional data as a simple colored graph, a subspace vector projection directly calculated how each sample behaves like test conditions. The end-user medical researcher understands their data as a series of somewhat-independent subsystems, and GSEPD provides a dimensionality reduction for high throughput experiments of limited sample size. Gene Ontology analyses are accessible on a sample-to-sample level, and this work highlights not just the expected biological systems, but many annotated results available in vast online databases.