Date of Award
Summer 2005
Document Type
Thesis - Restricted
Degree Name
Master of Science (MS)
Department
Mathematics, Statistics and Computer Science
First Advisor
Struble, Craig A.
Second Advisor
Twigger, Simon N.
Third Advisor
Ahamed, Sheikh I.
Abstract
Curators maintain and update the information in genome databases to keep it current and authentic. Research articles are an important source of this information, so curators must select, from the literature, the papers relevant to the database they maintain. The rapid growth of the literature makes it hard to find crucial and relevant information, and curators fall behind the latest publications. Identifying papers relevant to a particular subject is an instance of text categorization. This research focuses on creating a web-based software tool that uses a support vector machine (SVM) as its classifier. The SVM labels papers as relevant or irrelevant by categorizing the text of their abstracts. Software tools that implement text categorization algorithms can help curators select highly relevant papers from the large volume of literature, making the curation of biomedical literature more effective. When classifying papers relevant to the needs of the Rat Genome Database (RGD), the tool achieves an average accuracy of 94.45%, with precision of 96.34% and recall of 94.74%.
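The classification step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation, which is not shown in this record. It assumes bag-of-words term counts as features and a linear SVM trained by subgradient descent on the L2-regularized hinge loss. The function names, the toy abstracts, and the hyperparameters are all hypothetical. The `evaluate` function shows how accuracy, precision, and recall are computed for the relevant class.

```python
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; a real tool would likely do more.
    return text.lower().split()


def build_vocab(docs):
    # Map each distinct training token to a feature index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(doc, vocab):
    # Sparse term-count vector; tokens outside the vocabulary are ignored.
    counts = Counter(t for t in tokenize(doc) if t in vocab)
    return {vocab[t]: float(c) for t, c in counts.items()}


def score(x, w, b):
    # Signed distance-like score of a sparse vector under the linear model.
    return sum(w[i] * v for i, v in x.items()) + b


def train_linear_svm(X, y, dim, lr=0.1, lam=0.01, epochs=50):
    # Subgradient descent on lam/2 * ||w||^2 + hinge loss.
    # Labels are +1 (relevant) and -1 (irrelevant).
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            violated = label * score(x, w, b) < 1.0
            # Regularization step: shrink all weights slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if violated:
                # Hinge-loss step: move toward the misclassified or
                # low-margin example.
                for i, v in x.items():
                    w[i] += lr * label * v
                b += lr * label
    return w, b


def predict(doc, vocab, w, b):
    return 1 if score(vectorize(doc, vocab), w, b) >= 0 else -1


def evaluate(y_true, y_pred):
    # Accuracy, precision, and recall with +1 as the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical toy abstracts, invented for illustration only.
TRAIN_DOCS = [
    "rat gene qtl mapping",
    "rat genome strain gene",
    "human clinical trial",
    "clinical survey patients",
]
TRAIN_LABELS = [1, 1, -1, -1]

if __name__ == "__main__":
    vocab = build_vocab(TRAIN_DOCS)
    X = [vectorize(d, vocab) for d in TRAIN_DOCS]
    w, b = train_linear_svm(X, TRAIN_LABELS, len(vocab))
    preds = [predict(d, vocab, w, b) for d in TRAIN_DOCS]
    print(evaluate(TRAIN_LABELS, preds))
```

Precision is the fraction of papers labeled relevant that truly are relevant, and recall is the fraction of truly relevant papers that the classifier finds. Both matter for curation: low precision wastes curator time, while low recall misses papers.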
Recommended Citation
Marur, Vasant R., "An SVM Based Tool for the Curation of Biomedical Literature" (2005). Master's Theses (1922-2009) Access restricted to Marquette Campus. 2157.
https://epublications.marquette.edu/theses/2157