Conference: COLING 2020

Year: 2020
Description: The Penn Discourse Treebank (PDTB) is an NSF-funded project at the University of Pennsylvania. The goal of the project is to annotate the 1-million-word Wall Street Journal corpus in Treebank-2 (LDC95T7) with discourse relations holding between the eventualities and propositions mentioned in the text, which serve as the arguments of each relation. Discourse relations are assumed to have exactly two arguments. PDTB version 2.0 is a continuation of PDTB version 1.0 (made freely available in 2006 but no longer available). Following a lexically grounded approach to annotation, the PDTB annotates relations realized explicitly by Explicit connectives drawn from syntactically well-defined classes, as well as relations between adjacent sentences when no Explicit connective appears to relate the two. Arguments of relations are annotated in each case. For Explicit connectives, arguments are unconstrained in terms of their distance from the connective and can be found anywhere in the text. Between adjacent sentences where no Explicit connective appears, one of four scenarios holds: (a) the sentences may be related by a discourse relation that has no realization in the second sentence, in which case a connective (called an Implicit connective) is provided to express the inferred relation; (b) the sentences may be related by a discourse relation that is realized by some alternative non-connective expression, in which case this alternative lexicalization is annotated as the carrier of the relation (labelled AltLex); (c) the sentences may be related not by a discourse relation but merely by an entity-based coherence relation, in which case the presence of such a relation is labelled EntRel; or (d) the sentences may not be related at all, in which case they are labelled NoRel.
URL: https://catalog.ldc.upenn.edu/LDC2008T05
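
The four scenarios for adjacent sentences described above can be read as a small decision procedure. The following is only a minimal sketch of that reading, not part of the PDTB distribution or its tooling: the names `RelationType` and `label_adjacent_pair`, and the three input parameters, are hypothetical and stand in for judgments that human annotators make.

```python
from enum import Enum
from typing import Optional


class RelationType(Enum):
    """Relation labels named in the description (hypothetical container)."""
    EXPLICIT = "Explicit"
    IMPLICIT = "Implicit"
    ALTLEX = "AltLex"
    ENTREL = "EntRel"
    NOREL = "NoRel"


def label_adjacent_pair(
    has_discourse_relation: bool,
    alt_lexicalization: Optional[str] = None,
    has_entity_coherence: bool = False,
) -> RelationType:
    """Label two adjacent sentences that have no Explicit connective.

    The arguments are assumptions for illustration; in the corpus these
    are annotator decisions, not computed values.
    """
    if has_discourse_relation:
        # (b) the relation is carried by a non-connective expression
        if alt_lexicalization is not None:
            return RelationType.ALTLEX
        # (a) the relation is inferred; an Implicit connective is supplied
        return RelationType.IMPLICIT
    # (c) no discourse relation, only entity-based coherence
    if has_entity_coherence:
        return RelationType.ENTREL
    # (d) the sentences are not related at all
    return RelationType.NOREL
```

For example, `label_adjacent_pair(True, alt_lexicalization="The reason is")` would fall under scenario (b), while calling it with `has_discourse_relation=False` and no entity coherence falls under scenario (d).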