Paper: Extracting The Names Of Genes And Gene Products With A Hidden Markov Model

ACL ID C00-1030
Title Extracting The Names Of Genes And Gene Products With A Hidden Markov Model
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2000
Authors

We report the results of a study into the use of a linear interpolating hidden Markov model (HMM) for the task of extracting technical terminology from MEDLINE abstracts and texts in the molecular-biology domain. This is the first stage in a system that will extract event information for automatically updating biology databases. We trained the HMM entirely with bigrams based on lexical and character features in a relatively small corpus of 100 MEDLINE abstracts that were marked up by domain experts with term classes such as proteins and DNA. Using cross-validation methods we achieved an F-score of 0.73, and we examine the contribution made by each part of the interpolation model to overcoming data sparseness.
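
To illustrate how linear interpolation can soften data sparseness in a bigram model, here is a minimal Python sketch. It is not the authors' model: the abstract does not list the interpolation terms or weights, so the function names (`train_counts`, `interpolated_prob`), the weight `lam`, and the toy label sequences are all assumptions made for illustration. The sketch mixes a maximum-likelihood bigram estimate over term-class labels with a unigram estimate, so that a class transition never seen in training still receives non-zero probability.

```python
from collections import Counter


def train_counts(label_sequences):
    """Collect unigram, bigram, and bigram-context counts over class labels."""
    unigrams = Counter()   # how often each label occurs
    bigrams = Counter()    # how often (prev, curr) occurs
    contexts = Counter()   # how often prev occurs as the left side of a bigram
    for labels in label_sequences:
        unigrams.update(labels)
        for prev, curr in zip(labels, labels[1:]):
            bigrams[(prev, curr)] += 1
            contexts[prev] += 1
    return unigrams, bigrams, contexts


def interpolated_prob(curr, prev, counts, lam=0.7):
    """Return lam * P_ML(curr | prev) + (1 - lam) * P_ML(curr).

    lam is a hypothetical, hand-picked weight; in practice it would be tuned.
    """
    unigrams, bigrams, contexts = counts
    total = sum(unigrams.values())
    p_uni = unigrams[curr] / total if total else 0.0
    p_bi = bigrams[(prev, curr)] / contexts[prev] if contexts[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni


# Toy training data (invented): two short label sequences.
counts = train_counts([["O", "PROTEIN", "O"], ["O", "DNA", "O"]])

seen = interpolated_prob("PROTEIN", "O", counts)      # bigram observed in training
unseen = interpolated_prob("DNA", "PROTEIN", counts)  # bigram never observed
print(f"seen={seen:.2f} unseen={unseen:.2f}")
```

The unseen transition falls back on the unigram term rather than collapsing to zero, which is the general effect that interpolation is meant to achieve when training data are as limited as a 100-abstract corpus.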