Conference: LREC 2020

Year: 2020
Description: The Prague Dependency Treebank of Spoken Czech 2.0 is a corpus of spoken language consisting of 742,257 tokens and 73,835 sentences, representing 6,174 minutes (over 100 hours) of spontaneous dialogs. The dialogs were recorded, transcribed, and edited in several interlinked layers: audio recordings, automatic and manual transcripts, and manually reconstructed text. These layers, along with morphological annotation, formed the first version of the corpus (PDTSC 1.0). Version 2.0 adds annotation at the dependency syntax layer and at the “deep” syntax layer, which captures semantic roles and relations as well as coreference.
URL: