Source Paper  Year  Line  Sentence
P08-1099 2008 142
We use the CLASSIFIEDS data provided by Grenager et al. (2005) and compare with results reported by HK06 (Haghighi and Klein, 2006) and CRR07 (Chang et al., 2007)
P08-1099 2008 40
Another recent method that has been proposed for training sequence models with constraints is Chang et al. (2007)
P08-1099 2008 24
We achieve competitive performance in comparison to alternate model families, in particular generative models such as MRFs trained with EM (Haghighi and Klein, 2006) and HMMs trained with soft constraints (Chang et al., 2007)
W09-2201 2009 51
Chang et al. (2007) present a framework for learning that optimizes the data likelihood plus constraint-based penalty terms that capture prior knowledge, and demonstrate it with semi-supervised learning of segmentation models
D09-1134 2009 206
We use the CLASSIFIEDS data provided by Grenager et al. (2005) and compare with results reported by CRR07 (Chang et al., 2007) and MM08 (Mann and McCallum, 2008) for both supervised and semi-supervised learning
D09-1134 2009 31
Constraint-driven learning (Chang et al., 2007) expresses several kinds of constraints in a unified form
D09-1134 2009 23
We compare our CRF model integrated with VE with two state-of-the-art models, i.e., constraint-driven learning (Chang et al., 2007) and generalized expectation criteria (Mann and McCallum, 2008)
D09-1014 2009 298
Chang et al. (2007) use beam search for decoding unlabeled text with soft and hard constraints, and train a model with top-K decoded label sequences
D09-1009 2009 258
Chang et al. (2007) only obtain better results than 88.2% on cora when using 300 labeled examples (two hours of estimated annotation time), 5000 additional unlabeled examples, and extra test-time inference constraints
D09-1009 2009 135
Chang et al. (2007) present an algorithm for learning with constraints, but this method requires users to set weights by hand
N09-1034 2009 43
This has been shown both in supervised settings (Roth and Yih, 2004; Riedel and Clarke, 2006) and unsupervised settings (Haghighi and Klein, 2006; Chang et al., 2007) in which constraints are used to bootstrap the model (self citation)
N09-1034 2009 44
(Chang et al., 2007) describes an unsupervised training of a Constrained Conditional Model (CCM), a general framework for combining statistical models with declarative constraints (self citation)
D10-1120 2010 24
Learning with Linguistic Constraints Our work is situated within a broader class of unsupervised approaches that employ declarative knowledge to improve learning of linguistic structure (Haghighi and Klein, 2006; Chang et al., 2007; Graça et al., 2007; Cohen and Smith, 2009b; Druck et al., 2009; Liang et al., 2009a)
N10-1009 2010 233
These include methods that express domain knowledge as constraints on features, which have been shown to provide high accuracy on natural language datasets (Chang et al., 2007; Chang et al., 2008; Mann and McCallum, 2008; Bellare et al., 2009; Singh et al., 2010)
N10-1009 2010 58
Alternating Projections Recent work in semi-supervised learning uses constraints as external supervision (Chang et al., 2007; Mann and McCallum, 2008; Bellare et al., 2009; Singh et al., 2010)
N10-1009 2010 25
Furthermore, recent work on constraint-based semi-supervised learning allows domain experts to easily provide additional light supervision, enabling the learning algorithm to learn using the prior domain knowledge, labeled and unlabeled data (Chang et al., 2007; Mann and McCallum, 2008; Bellare et al., 2009; Singh et al., 2010). Prior domain knowledge, if it can be easily expressed and incorporated into the learning algorithm, can often be a high-quality and cheap substitute for labeled data
N10-1111 2010 62
Chang et al. propose constraint-driven learning (CODL, Chang et al., 2007), which can be interpreted as a variation of self-training: Instances are selected for supervision based not only on the model's prediction, but also on their consistency with a set of user-defined constraints
N10-1111 2010 76
We use the same token label constraints as Chang et al. (2007). We use the objective functions defined in Section 3, specifically self-training (Self:Fs), direct constraints (Cons:Fc), the combination of the two (Self+Cons:Fsc), and combination of the model score and the constraints (Model+Cons:Fmc)
N10-1111 2010 79
We also report supervised results from (Chang et al., 2007) and SampleRank
N10-1111 2010 46
Constraint-driven semi-supervised learning uses constraints to incorporate external domain knowledge when labels are missing (Chang et al., 2007; Mann and McCallum, 2008; Bellare et al., 2009)
N10-1111 2010 13
Most semi-supervised learning algorithms rely on marginals (GE, Mann and McCallum, 2008) or MAP assignments (CODL, Chang et al., 2007)
N10-1111 2010 61
corresponds to constraint satisfaction weights ρ used in (Chang et al., 2007)
C10-2137 2010 226
This is termed constraint-driven learning in (Chang et al., 2007), coupled learning in (Carlson et al., 2010) and counter-training in (Yangarber, 2003)
D11-1017 2011 215
Our method can be considered an instance of weakly or distantly supervised structured prediction (Chang et al., 2007; Chang et al., 2010; Clarke et al., 2010; Ganchev et al., 2010)
D11-1117 2011 155
Some authors (Nigam and Ghani, 2000; Ng and Cardie, 2003; Smith and Eisner, 2005a, §5.2, 7; §2; §6) draw a hard line between bootstrapping algorithms, such as self- and co-training, and probabilistic modeling using EM; others (Dasgupta et al., 2001; Chang et al., 2007, §1; §5) tend to lump them together
D11-1006 2011 28
We use the augmented-loss learning procedure (Hall et al., 2011) which is closely related to constraint driven learning (Chang et al., 2007; Chang et al., 2010)
D11-1006 2011 169
The learning algorithm in Figure 2 is an instance of augmented-loss training (Hall et al., 2011) which is closely related to the constraint driven learning algorithms of Chang et al. (2007)
D11-1138 2011 243
The work that is most similar to ours is that of Chang et al. (2007), who introduced the Constraint Driven Learning algorithm (CODL)
D11-1138 2011 21
This includes work on generalized expectation (Mann and McCallum, 2010), posterior regularization (Ganchev et al., 2010) and constraint driven learning (Chang et al., 2007; Chang et al., 2010)
D11-1138 2011 22
The work of Chang et al. (2007) on constraint-driven learning is perhaps the closest to our framework and we draw connections to it in Section 5
P11-1054 2011 47
Chang et al. (2007) propose an objective function for semi-supervised extraction that balances likelihood of labeled instances and constraint violation on unlabeled instances
P11-1149 2011 267
Our unsupervised approach follows a self-training protocol (Yarowsky, 1995; McClosky et al., 2006; Reichart and Rappoport, 2007b) enhanced with constraints restricting the output space (Chang et al., 2007; Chang et al., 2009) (self citation)
P11-5005 2011 49
Constraint driven learning (CoDL) was first introduced in Chang et al. [2007], and has also been used in Chang et al. [2008]
D12-1099 2012 49
(Chang et al., 2007) incorporates domain-specific constraints in semi-supervised learning
D12-1102 2012 51
Thus, our work is applicable not only in cases where inference is done after a separate learning phase, as in (Roth and Yih, 2004; Clarke and Lapata, 2006; Roth and Yih, 2007) and others, but also when inference is done during the training phase, for algorithms like the structured perceptron of (Collins, 2002), structured SVM (Tsochantaridis et al., 2005) or the constraints driven learning approach of (Chang et al., 2007) (self citation)
D12-1120 2012 129
Several machine learning paradigms have been developed recently for incorporating biases and constraints into parameter estimation (Liang et al., 2009; Chang et al., 2007; Mann and McCallum, 2007)
N12-1008 2012 43
Global Constraints Previous work demonstrated the benefits of applying declarative constraints in information extraction (Finkel et al., 2005; Roth and Yih, 2004; Chang et al., 2007; Druck and McCallum, 2010)
N12-1008 2012 46
Likewise, Chang et al. (2007) use constraints at multiple levels, such as sentence-level constraints to specify field boundaries and global constraints to ensure relation-level consistency
N12-1053 2012 251
Chang et al. (2007) use a set of domain-specific rules as automatic implicit feedback for training an information extraction system
N12-1087 2012 2
UEM is parameterized by a single parameter and covers existing algorithms like standard EM and hard EM, constrained versions of EM such as Constraint-Driven Learning (Chang et al., 2007) and Posterior Regularization (Ganchev et al., 2010), along with a range of new EM algorithms (self citation)
N12-1087 2012 7
Many successful applications of unsupervised and semi-supervised learning in NLP use EM, including text classification (McCallum et al., 1998; Nigam et al., 2000), machine translation (Brown et al., 1993), and parsing (Klein and Manning, 2004). Recently, EM algorithms which incorporate constraints on structured output spaces have been proposed (Chang et al., 2007; Ganchev et al., 2010) (self citation)
N12-1087 2012 10
The same issue continues in the presence of constraints, where Posterior Regularization (PR) (Ganchev et al., 2010) corresponds to EM while Constraint-Driven Learning (CoDL) (Chang et al., 2007) corresponds to hard EM (self citation)
N12-1087 2012 25
To be more precise, (Chang et al., 2007) mentioned using hard constraints as well as soft constraints in EM (self citation)
N12-1087 2012 96
Standard EM Deterministic Annealing EM Unconstrained CoDL (Chang et al., 2007) (NEW) EM with Lin (self citation)
N12-1087 2012 60
In the context of EM, constraints can be imposed on the posterior probabilities, q, to guide the learning procedure (Chang et al., 2007; Ganchev et al., 2010) (self citation)
N12-1087 2012 64
, bm], we write down the set of all feasible structures as {h | h ∈ H(x), Uh ≤ b}. Constraint-Driven Learning (CoDL) (Chang et al., 2007) augments the E-step of hard EM (4) by imposing these constraints on the outputs. Constraints on structures can be relaxed to expectation constraints by requiring the distribution q to satisfy them only in expectation (self citation)
P12-1068 2012 125
Space restrictions prevent us from discussing the close relation between this penalty formulation and the existing work on injecting prior and side information in learning objectives in the form of constraints (McCallum et al., 2007; Ganchev et al., 2010; Chang et al., 2007). In order to support efficient and parallelizable inference, we simplify the above penalty by considering only disjoint pairs of predicates, instead of summing over all pairs p(1) and p(2)
P12-1088 2012 180
Note that the objective function in Equation 5, if written in the additive form, leads to a cost function reminiscent of the one used in the constraint-driven learning algorithm (CoDL) (Chang et al., 2007) (and similarly, posterior regularization (Ganchev et al., 2010), which we will discuss later in Section 6) (self citation)
P12-1088 2012 291
Constraint-driven learning (CoDL) (Chang et al., 2007) and posterior regularization (PR) (Ganchev et al., 2010) are both primarily semi-supervised models (self citation)
W12-1905 2012 18
Types of constraints include ILP-based methods (Chang et al., 2007; Chang et al., 2008; Ravi and Knight, 2009), and posterior regularization (Graça et al., 2007; Ganchev et al., 2010)
N13-1113 2013 44
Our novel method addresses this problem. Declarative knowledge and constraints Previous work has shown that incorporating declarative constraints into feature-based machine learning models works well in many NLP tasks (Chang et al., 2007; Mann and McCallum, 2008; Druck et al., 2008; Bellare et al., 2009; Ganchev et al., 2010)
N13-1113 2013 27
Recent work has shown that explicit declaration of domain and expert knowledge can be highly useful for structured NLP tasks such as parsing, POS tagging and information extraction (Chang et al., 2007; Mann and McCallum, 2008; Ganchev et al., 2010)
N13-1131 2013 23
Excluding self- and co-training methods, these methods can be categorized into two broad classes: those which bootstrap from a small number of tokens (sometimes called prototypes) (Collins and Singer, 1999; Haghighi and Klein, 2006), and those which impose constraints on the underlying unsupervised learning problem (Chang et al., 2007; Bellare et al., 2009; Druck et al., 2008; Ganchev et al., 2010)
P13-1117 2013 87
Most constraints that prove useful for SRL (Chang et al., 2007) also require customization when applied to a new language, and some rely on language-specific resources, such as a valency lexicon