7693 results found.
The Russian RST Treebank
Written
Treebank,
LREC2018
Language Type:
Multilingual
Languages:
Russian
Availability:
Not Applicable
License:
N/A
Size:
Not available
Production Status:
Newly created-in progress
Use:
Discourse
Paper:
N/A
Documentation:
Pisarevskaya, D., Ananyeva, M., Kobozeva, M., Nasedkin, A., Nikiforova, S., Pavlova, I., and Shelepov, A. (to appear). Rhetorical relations markers in Russian RST Treebank. In Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms.
Ukwabelana corpus
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
South African Zulu
Availability:
Freely Available
License:
OpenSource
Size:
10,000 morphologically labelled/100,000 raw Zulu words, 2800 POS-tagged/30,000 untagged sentences
Production Status:
Newly created-in progress
Use:
training of machine learning algorithms, text-to-speech synthesis, spell-checking, predictive text, grammar checking, machine translation, language research
Paper:
N/A
Documentation:
Documentation of Zulu morphology, labeling scheme, algorithms used, in English, publicly available
XML model of Wikipedia
Written
Corpus,
RANLP2011
Language Type:
Multilingual
Languages:
English
Availability:
From Owner
License:
<Not Specified>
Size:
N/A
Production Status:
Existing-updated
Use:
Text Mining
Paper:
N/A
Documentation:
<Not Specified>
HIWIRE (Human Input that Works In Real Environments) database
Speech
Corpus,
IS2011
Language Type:
Multilingual
Languages:
English
Availability:
From Data Center(s)
License:
GByte
Size:
3.03
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
Paper:
N/A
Documentation:
<Not Specified>
Wikipedia
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English, French
Availability:
Will be made available online
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Newly created-in progress
Use:
Machine Translation, SpeechToSpeech Translation
Paper:
N/A
Documentation:
No documentation
TDT corpus
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English
Availability:
From Owner
License:
LDC
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Summarisation
Paper:
N/A
Documentation:
<Not Specified>
Blizzard Challenge 2010
Speech
Corpus,
IS2013
Language Type:
Multilingual
Languages:
English
Availability:
For Blizzard Challenge 2010 participants
License:
<Not Specified>
Size:
5 hours
Production Status:
Existing-used
Use:
Speech Synthesis
Paper:
N/A
Documentation:
<Not Specified>
Brown Corpus
Written
Corpus,
NAACL2013
Language Type:
Multilingual
Languages:
English
Availability:
From Data Center(s)
License:
<Not Specified>
Size:
26535
Production Status:
Existing-used
Use:
Language Modelling
Paper:
N/A
Documentation:
<Not Specified>
TREC Category B
Written
Corpus,
COLING2012
Language Type:
Multilingual
Languages:
English
Availability:
From Data Center(s)
License:
<Not Specified>
Size:
50M pages
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
Paper:
N/A
Documentation:
<Not Specified>
Reuters
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English
Availability:
From Data Center(s)
License:
NIST
Size:
80M words
Production Status:
Existing-used
Use:
Language Modelling
Paper:
N/A
Documentation:
<Not Specified>