7693 results found.
twitter-korean-text API
Written
Tagger/Parser,
LREC2018
Language Type:
Multilingual
Languages:
Korean
Availability:
Freely Available
License:
Apache 2.0
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Parsing and Tagging
Paper:
N/A
Documentation:
Yes, English (eng)
Boston University Radio News Corpus
Speech
Corpus,
NAACL2013
Language Type:
Multilingual
Languages:
American English
Availability:
From Data Center(s)
License:
LDC
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
Paper:
N/A
Documentation:
<Not Specified>
Annotated Tweets
Written
Evaluation Data,
NAACL2013
Language Type:
Multilingual
Languages:
English
Availability:
To be determined by the company
License:
<Not Specified>
Size:
1MB
Production Status:
Newly created-finished
Use:
Named Entity Recognition
Paper:
N/A
Documentation:
No documentation currently. To be determined if the company will release the dataset for research purposes.
The Prague Dependency Treebank 2.0
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
Czech
Availability:
From Data Center(s)
License:
LDC
Size:
50 thousand sentences
Production Status:
Existing-used
Use:
Discourse
Paper:
N/A
Documentation:
http://ufal.mff.cuni.cz/pdt2.0/
MMSEG
Written
Tokenizer,
COLING2010
Language Type:
Multilingual
Languages:
Mandarin Chinese
Availability:
Freely Available
License:
MIT license
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
Paper:
N/A
Documentation:
Yes, in English, publicly available at http://www.nltk.org/
Argumentative microtext corpus
Written
Corpus,
LREC2016
Language Type:
Multilingual
Languages:
English German
Availability:
Freely Available
License:
CC
Size:
600 sentences
Production Status:
Existing-updated
Use:
Discourse
Paper:
N/A
Documentation:
English, available
The RST Chinese Treebank
Written
Treebank,
LREC2018
Language Type:
Multilingual
Languages:
Chinese
Availability:
Not Applicable
License:
N/A
Size:
N/A
Production Status:
Newly created-in progress
Use:
Discourse
Paper:
N/A
Documentation:
Xue, N., da Cunha, I., Iruskieta, M., and Wang, C. (to appear). Discourse segmentation for building a RST Chinese Treebank. In Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms.
EVOCA
Written
Corpus,
LREC2014
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
791
Production Status:
Newly created-finished
Use:
Emotion Recognition/Generation
Paper:
N/A
Documentation:
<Not Specified>
multilingual parallel corpus of translations of the Bible
Written
Corpus,
COLING2016
Language Type:
Trilingual
Languages:
English Indonesian Russian
Availability:
Freely Available
License:
Creative Commons
Size:
600,000 words / language
Production Status:
Existing-used
Use:
Morphological Analysis
Paper:
N/A
Documentation:
On the web site
MaltParser
Written
Tagger/Parser,
COLING2010
Language Type:
Trilingual
Languages:
English Mandarin Chinese Swedish
Availability:
Freely Available
License:
unknown
Size:
<Not Specified>
Production Status:
Existing-updated
Use:
Machine Translation, SpeechToSpeech Translation
Paper:
N/A
Documentation:
<Not Specified>