Conference: COLING 2020

Year: 2020
Description: 1. Publication title: Macro Chinese Discourse Treebank (MCDTB). 2. Data type: XML. 3. Genres: All documents are newswire. 4. Applications: Natural Language Processing, Discourse Analysis, Information Extraction, Automatic Summarization. 5. Language: Chinese. 6. Description of the corpus structure and data attributes: This release contains 720 text files, comprising 3,981 paragraphs, 8,319 sentences, and 398,829 words. The data is provided in UTF-8 encoding. The annotation follows Rhetorical Structure Theory (RST) and labels macro-level discourse information, including discourse structure, nuclearity, relations, topic sentences, lead, and abstract.
URL: https://figshare.com/s/250474dba44e4161b040