| Description: | 1. Publication title: Macro Chinese Discourse Treebank (MCDTB) 
2. Data type: XML
3. Genres: All documents are newswire.
4. Applications: Natural Language Processing, Discourse Analysis, Information Extraction, Automatic Summarization.
5. Language: Chinese
6. Description of the corpus structure and data attributes: There are 720 text files in this release, containing 3,981 paragraphs, 8,319 sentences, and 398,829 words. The data is provided in UTF-8 encoding. The annotation follows Rhetorical Structure Theory (RST) and labels macro-level discourse information, including discourse structure, nuclearity, relations, topic sentences, lead, and abstract. |