Paper: A Rich Feature Vector for Protein-Protein Interaction Extraction from Multiple Corpora

ACL ID D09-1013
Title A Rich Feature Vector for Protein-Protein Interaction Extraction from Multiple Corpora
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009
Authors

Because of the importance of protein-protein interaction (PPI) extraction from text, many corpora have been proposed with slightly differing definitions of proteins and PPI. Since no single corpus is large enough to saturate a machine learning system, it is necessary to learn from multiple different corpora. In this paper, we propose a solution to this challenge. We designed a rich feature vector, and we applied a support vector machine modified for corpus weighting (SVM-CW) to complete the task of multiple corpora PPI extraction. The rich feature vector, made from multiple useful kernels, is used to express the important information for PPI extraction, and the system with our feature vector was shown to be both faster and more accurate than the original kernel-based system, even...