7693 results found.
Geo-tagged Twitter Corpus
Written
Corpus,
EMNLP2010
Language Type:
Multilingual
Languages:
English Spanish
Availability:
Freely Available
License:
<Not Specified>
Size:
26MB
Production Status:
Newly created-finished
Use:
regional linguistic variation
Paper:
N/A
Documentation:
<Not Specified>
LDC Chinese-English parallel corpus
Written
Corpus,
COLING2012
Language Type:
Multilingual
Languages:
English Mandarin Chinese
Availability:
From Data Center(s)
License:
LDC
Size:
1.5M sentences
Production Status:
Existing-used
Use:
Machine Translation, Speech-to-Speech Translation
Paper:
N/A
Documentation:
<Not Specified>
Hindi Dependency Treebank
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
Hindi
Availability:
From Owner
License:
<Not Specified>
Size:
16MB
Production Status:
Newly created-in progress
Use:
Parsing and Tagging
Paper:
N/A
Documentation:
Some documentation is available, but much more is planned. All of it will be in English and will be publicly available.
Wiki50
Written
Corpus,
RANLP2011
Language Type:
Multilingual
Languages:
English
Availability:
From Owner
License:
Creative Commons
Size:
1MB
Production Status:
Newly created-finished
Use:
Named Entity Recognition
Paper:
N/A
Documentation:
annotation guidelines and description of the dataset
English TTS speech corpus of air traffic (pilot) messages - Serbian accent
Speech/Written
Corpus,
LREC2018
Language Type:
Multilingual
Languages:
English
Availability:
From Data Center(s)
License:
Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
Size:
3000 sentences
Production Status:
Newly created-finished
Use:
Speech Synthesis
Paper:
N/A
Documentation:
<Not Specified>
Tork Bootstrap Word Sense Inventory
Written
Corpus,
COLING2012
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
24600
Production Status:
Existing-used
Use:
Lexicon Creation / Annotation
Paper:
N/A
Documentation:
English
Training data for Aspect-Based Sentiment Analysis in French
Written
Corpus,
LREC2016
Language Type:
Multilingual
Languages:
French
Availability:
Freely Available
License:
Creative Commons
Size:
1669 sentences
Production Status:
Newly created-finished
Use:
Opinion Mining/Sentiment Analysis
Paper:
N/A
Documentation:
Yes. The annotation guidelines are in English.
Non-native speaking data
Speech
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English
Availability:
Not Available
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Newly created-finished
Use:
Automatic speech assessment
Paper:
N/A
Documentation:
<Not Specified>
EPO corpus
Written
Corpus,
COLING2012
Language Type:
Multilingual
Languages:
English
Availability:
From Owner
License:
<Not Specified>
Size:
23
Production Status:
Not Applicable
Use:
Information Extraction, Information Retrieval
Paper:
N/A
Documentation:
<Not Specified>
<Not Specified>
Written
Corpus,
COLING2012
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
Paper:
N/A
Documentation:
<Not Specified>