CNIO Centro Nacional de Investigaciones Oncológicas
Proceedings of the Second Biocreative Challenge Evaluation Workshop
In this paper, we propose the use of character n-gram language models and multiple conditional random field (CRF) models for BioCreAtIvE 2 Task 1, gene/protein name recognition. We investigated different state transition weighting schemes for CRFs and found that the resulting models produced independent, nonoverlapping mentions. To improve recall, the results of multiple models are combined. To improve precision, character n-gram models classify whether a sentence contains a gene/protein mention. Our best approach achieved a precision of 84.35%, a recall of 81.39%, and an F-measure of 82.85%.
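The abstract does not say exactly how mentions from several CRF models are merged or how the n-gram filter is applied. The Python sketch below shows one plausible reading, under stated assumptions: each CRF model's output is already available as a list of (start, end) character spans with exclusive ends, mentions are combined by union (keeping the longer span on overlap), and a sentence is kept only if an add-one-smoothed character trigram model trained on mention-containing sentences scores it higher than one trained on mention-free sentences. The names `CharNgramLM`, `combine_mentions`, `contains_mention`, and `recognize` are hypothetical, and CRF training itself is not shown. The last helper checks the reported figures: 2PR/(P+R) with P = 84.35 and R = 81.39 gives about 82.84, consistent with the reported 82.85% once rounding of P and R is taken into account.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Character n-grams of one sentence, padded with start (^) and end ($) markers."""
    padded = "^" * (n - 1) + text + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramLM:
    """Hypothetical add-one-smoothed character n-gram language model."""

    def __init__(self, sentences, n=3):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = {"$"}  # characters that can be predicted, incl. the end marker
        for s in sentences:
            self.vocab.update(s)
            for g in char_ngrams(s, n):
                self.ngram_counts[g] += 1
                self.context_counts[g[:-1]] += 1

    def log_prob(self, sentence):
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen characters
        total = 0.0
        for g in char_ngrams(sentence, self.n):
            num = self.ngram_counts[g] + 1
            den = self.context_counts[g[:-1]] + v
            total += math.log(num / den)
        return total


def contains_mention(sentence, pos_lm, neg_lm):
    """Precision filter: does the 'has a mention' model explain the sentence better?"""
    return pos_lm.log_prob(sentence) > neg_lm.log_prob(sentence)


def combine_mentions(model_outputs):
    """Recall step: union of (start, end) spans from several models.
    Ends are exclusive; when spans overlap, the longer one is kept (an assumption)."""
    spans = sorted({s for out in model_outputs for s in out},
                   key=lambda s: (s[0], -(s[1] - s[0])))
    merged = []
    for start, end in spans:
        if merged and start < merged[-1][1]:  # overlaps the last kept span
            if end - start > merged[-1][1] - merged[-1][0]:
                merged[-1] = (start, end)
            continue
        merged.append((start, end))
    return merged


def recognize(sentence, model_outputs, pos_lm, neg_lm):
    """Combine CRF outputs, then drop them if the sentence looks mention-free."""
    if not contains_mention(sentence, pos_lm, neg_lm):
        return []
    return combine_mentions(model_outputs)


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy training data; a real system would train on BioCreAtIvE 2 sentences.
    pos_lm = CharNgramLM(["BRCA1 gene expression", "p53 protein binds DNA"])
    neg_lm = CharNgramLM(["patients were enrolled", "results are shown below"])
    sentence = "TP53 gene expression was measured"
    # Hypothetical spans from two differently weighted CRF models.
    crf_outputs = [[(0, 4)], [(0, 9)]]
    print(recognize(sentence, crf_outputs, pos_lm, neg_lm))
    print(round(f_measure(84.35, 81.39), 2))
```

In this sketch the union step can only add mentions (raising recall) and the sentence filter can only remove them (raising precision), mirroring the division of labor the abstract describes.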
Struble, Craig; Povinelli, Richard J.; Johnson, Michael T.; Berchanskiy, Dina; Tao, Jidong; and Trawicki, Marek B., "Combined Conditional Random Fields and n-Gram Language Models for Gene Mention Recognition" (2007). Electrical and Computer Engineering Faculty Research and Publications. 147.