Date of Award

Fall 2003

Degree Type

Thesis - Restricted

Degree Name

Master of Science (MS)

Department

Mathematics, Statistics and Computer Science

First Advisor

Struble, Craig A.

Second Advisor

Merrill, Stephen J.

Third Advisor

Chen, Chin-Fu

Abstract

The Human Genome Project has reached completion, and many genes and expressed sequence tags have been identified. However, the function, expression, and regulation of more than 80% of the genes have yet to be explored. Such exploration is best done systematically. The genome, representing the complete blueprint of the organism, is the natural bounded system in which to conduct it. DNA microarrays provide a natural means of exploring the genome in a way that is both systematic and comprehensive. The function of a gene can be explored by determining its pattern of expression. The set of genes expressed in a cell determines what the cell is made of and which biochemical and regulatory systems are operative. As we learn to infer the biological consequences of specific features of gene expression patterns, we can use microarrays to see a comprehensive, dynamic molecular picture of the living cell. Underlying microarray experiments is the notion that analyzing the response of a system to a given perturbation can shed light on the mechanism of signaling or biological response to the perturbation, or both, at the gene expression level. The complexity of microarray data poses new challenges for data mining in identifying and validating patterns that are biologically relevant. Singular value decomposition (SVD) is one approach for analyzing gene expression data. We use an integrative approach and investigate the claim that SVD elucidates patterns representing biological processes by annotating these patterns with biological process terms contained in the Gene Ontology (GO) database, which is a dynamic, controlled vocabulary that can be applied to all eukaryotes. We present a procedure that uses statistical measures to classify genes involved in distinct regulatory biological processes that are statistically significant and biologically interpretable from a systems perspective. Our approach paves the way toward understanding regulatory and other complex biological processes from the molecular level to the systems level.

Keywords: Singular Value Decomposition (SVD), Gene Ontology (GO), Statistics, Biological Process, Microarray Gene Expression, Data Mining