Date of Award
Thesis - Restricted
Master of Science (MS)
Mathematics, Statistics and Computer Science
Struble, Craig A.
Twigger, Simon N.
Ahamed, Sheikh I.
Curators maintain and update the information in genome databases, ensuring that it is current and authentic. To do so, they consult research articles, an important source of the scientific knowledge stored in these databases, and must select from the literature the papers relevant to the database they maintain. The rapid growth of the literature makes it difficult to find crucial and relevant information, and curators fall behind the latest publications. Identifying papers relevant to a particular subject is an instance of text categorization. In this research we focus on creating a web-based software tool that uses a support vector machine (SVM) as its classifier. The SVM classifies papers as relevant or irrelevant based on the text of their abstracts. Software tools that implement text categorization algorithms can help curators select highly relevant papers from the large volume of literature, making the curation of biomedical literature more effective. When classifying papers relevant to the needs of the Rat Genome Database (RGD), the tool achieves an average accuracy of 94.45%, with a precision of 96.34% and a recall of 94.74%.
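The accuracy, precision, and recall figures reported above are conventionally computed from counts of correct and incorrect relevant/irrelevant decisions. The following is a minimal sketch of those standard definitions, assuming "relevant" is treated as the positive class; the function name `binary_metrics` and the eight example labels are hypothetical illustrations and are not taken from the thesis or its data.

```python
def binary_metrics(y_true, y_pred, positive="relevant"):
    """Return (accuracy, precision, recall) for a two-class labelling.

    Assumption: `positive` names the class of interest (here, papers a
    curator would keep). This is an illustrative helper, not thesis code.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: relevant papers that were classified as relevant.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: irrelevant papers that were classified as relevant.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: relevant papers that were classified as irrelevant.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # True negatives: everything else.
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical curator labels and classifier output for eight abstracts.
y_true = ["relevant"] * 4 + ["irrelevant"] * 4
y_pred = ["relevant"] * 3 + ["irrelevant"] * 5

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

In this toy case one relevant paper is missed and no irrelevant paper is accepted, so recall falls below precision; for a curation tool, recall reflects how many relevant papers reach the curator and precision reflects how little irrelevant material they must read.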
Marur, Vasant R., "An SVM Based Tool for the Curation of Biomedical Literature" (2005). Master's Theses (1922-2009) Access restricted to Marquette Campus. 2157.