Source Paper Year Line Sentence
P06-1011 2006 175
Much of the work involving comparable corpora has focused on extracting word translations (Fung and Yee, 1998; Rapp, 1999; Diab and Finch, 2000; Koehn and Knight, 2000; Gaussier et al, 2004; Shao and Ng, 2004; Shinyama and Sekine, 2004)
N09-1048 2009 48
Shao and Ng (2004) presented a method to mine new translations from Chinese and English news documents of the same period from different news agencies, combining both transliteration and context information. Kuo et al (2006) used active learning and unsupervised learning for mining transliteration lexicon from the Web pages, in which an EM process was used for estimating the phonetic similarities between English syllables and Chinese characters
N09-1048 2009 13
The other is multilingual parallel and comparable corpora (e.g., Wikipedia), wherein features such as cooccurrence frequency and context are popularly employed (Cheng et al, 2004; Shao and Ng, 2004; Cao et al, 2007; Lin et al, 2008). In this paper, we focus on a special type of comparable corpus, parenthetical translations
C10-1070 2010 79
A somehow generalized version of this heuristic has been described in (Shao and Ng, 2004) [Footnote 3: For instance, O21 stands for the number of windows containing S but not s.]
D10-1042 2010 55
To alleviate such scarcity, (Fung and Yee, 1998; Shao and Ng, 2004) explore a more vast resource of comparable corpora, which share no parallel document- or sentence-alignments as in parallel corpora but describe similar contents in two languages, e.g., news articles on the same event
C10-2164 2010 18
At present, the methods for OOV term translation have changed from the basic pattern based on bilingual dictionary, transliteration or parallel corpus to the intermediate pattern based on comparable corpus (Lee et al, 2006; Shao and Ng, 2004; Virga and Khudanpur, 2003), and then become a new pattern based on Web mining (Fang et al, 2006; Sproat et al, 2006)
W11-1215 2011 37
Some recent research used comparable corpora to re-score name transliterations (Sproat et al, 2006; Klementiev and Roth, 2006) or mine new word translations (Fung and Yee, 1998; Rapp, 1999; Shao and Ng, 2004; Tao and Zhai, 2005; Hassan et al, 2007; Udupa et al, 2009; Ji, 2009)
W11-2206 2011 243
Some recent research used comparable corpora to re-score name transliterations (Sproat et al, 2006; Klementiev and Roth, 2006) or mine new word translations (Udupa et al, 2009; Ji, 2009; Fung and Yee, 1998; Rapp, 1999; Shao and Ng, 2004; Hassan et al, 2007)
D12-1003 2012 36
Various correlation measures have been used: log-likelihood ratio (Rapp, 1999; Chiao and Zweigenbaum, 2002), tf-idf (Fung and Yee, 1998), pointwise mutual information (PMI) (Andrade et al, 2010), context heterogeneity (Fung, 1995), etc. Shao and Ng (2004) represented contexts using language models
D12-1003 2012 65
Various clues have been considered when computing the similarities: concept class information obtained from a multilingual thesaurus (Déjean et al, 2002), co-occurrence models generated from aligned documents (Prochasson and Fung, 2011), and transliteration information (Shao and Ng, 2004)
P13-1059 2013 207
Some recent research used comparable corpora to mine name translation pairs (Feng et al, 2004; Kutsumi et al, 2004; Udupa et al, 2009; Ji, 2009; Fung and Yee, 1998; Rapp, 1999; Shao and Ng, 2004; Lu and Zhao, 2006; Hassan et al, 2007)
P13-1062 2013 191
We processed news articles for an entire year in 2008 by Xinhua news who publishes news in both English and Chinese, which were also used by Kim et al (2011) and Shao and Ng (2004)
P13-1107 2013 240
Other similar research lines are the TAC-KBP Entity Linking (EL) (Ji et al, 2010; Ji et al, 2011), which links a named entity in news and web documents to an appropriate knowledge base (KB) entry, the task of mining name translation pairs from comparable corpora (Udupa et al, 2009; Ji, 2009; Fung and Yee, 1998; Rapp, 1999; Shao and Ng, 2004; Hassan et al, 2007) and the link prediction problem (Adamic and Adar, 2001; Liben-Nowell and Kleinberg, 2003; Sun et al, 2011b; Hasan et al, 2006; Wang et al, 2007; Sun et al, 2011a)
P13-2036 2013 9
Recently, holistic approaches combining such similarities have been studied (Shao and Ng, 2004; You et al, 2010; Kim et al, 2011)
P13-2036 2013 10
(Shao and Ng, 2004) rank translation candidates using PH and CX independently and return results with the highest average rank
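
The rank combination described in the last sentence can be illustrated with a minimal sketch. It assumes that PH and CX are two per-candidate similarity scores (the table does not define them), that a higher score is better, and that "highest average rank" means the smallest mean rank position. The function names and the alphabetical tie-break are hypothetical choices for illustration, not taken from Shao and Ng (2004).

```python
def rank_positions(scores):
    """Map each candidate to its 1-based rank; the highest score gets rank 1."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {cand: pos for pos, cand in enumerate(ordered, start=1)}


def best_by_average_rank(ph_scores, cx_scores):
    """Rank candidates under each score independently, then return the
    candidate whose mean rank position is smallest.

    Assumption: only candidates scored by both measures are considered,
    and ties are broken alphabetically so the result is deterministic.
    """
    ph_rank = rank_positions(ph_scores)
    cx_rank = rank_positions(cx_scores)
    shared = ph_rank.keys() & cx_rank.keys()
    avg = {c: (ph_rank[c] + cx_rank[c]) / 2 for c in shared}
    return min(avg, key=lambda c: (avg[c], c))


# Hypothetical scores for four translation candidates.
ph = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
cx = {"b": 0.9, "c": 0.6, "d": 0.5, "a": 0.2}
print(best_by_average_rank(ph, cx))
```

Averaging rank positions rather than raw scores avoids having to put the two measures on a common scale, at the cost of discarding how large the score gaps are.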