7693 results found.
FIRE 2010 data
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English Hindi
Availability:
From Owner
License:
FIRE 2010
Size:
342MBytes
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
Paper:
N/A
Documentation:
not available
<Not Specified>
Written, annotated with grammatical errors
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English
Availability:
From Data Center(s)
License:
to be announced
Size:
20000 words approx
Production Status:
Newly created-finished
Use:
Evaluation/Validation
Paper:
N/A
Documentation:
Documentation in English; will be publicly available very soon
English-Korean Parallel Corpus
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English Korean
Availability:
From Owner
License:
<Not Specified>
Size:
454,315 bi-sentences
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
Paper:
N/A
Documentation:
<Not Specified>
Sentiment Resources like FrameNet, MPQA classifier and dictionary from OpinionFinder, semantic orient
Written
Lexicon,
EMNLP2010
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
10000
Production Status:
Existing-used
Use:
Discourse
Paper:
N/A
Documentation:
<Not Specified>
PAN'10 plagiarism detection corpus
Written
Corpus,
RANLP2011
Language Type:
Trilingual
Languages:
English German Spanish
Availability:
Freely Available
License:
<Not Specified>
Size:
4.76GB
Production Status:
Existing-updated
Use:
Document Classification, Text categorisation
Paper:
N/A
Documentation:
http://www.uni-weimar.de/medien/webis/publications/downloads/papers/stein_2010p.pdf
hrWac
Written
Corpus,
LREC2014
Language Type:
Trilingual
Languages:
Bosnian Croatian Serbian
Availability:
Freely Available
License:
CC-BY-SA 3.0
Size:
1909886930
Production Status:
Newly created-in progress
Use:
Paper:
N/A
Documentation:
<Not Specified>
DMoZ corpus
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English
Availability:
From Owner
License:
OpenSource
Size:
68Gbyte
Production Status:
Existing-used
Use:
Acquisition
Paper:
N/A
Documentation:
<Not Specified>
idiomatic sentences test dataset
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
200 sentences manually annotated (by 4 annotators) as idiomatic or literal
Production Status:
Newly created-in progress
Use:
small test dataset for idiomatic/literal sentence classification
Paper:
N/A
Documentation:
English, just brief description
Chinese Treebank 6.0
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
Mandarin Chinese
Availability:
From Data Center(s)
License:
LDC
Size:
23,193K
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
Paper:
N/A
Documentation:
<Not Specified>
test set for QA candidate ranking
Written
Evaluation Data,
COLING2010
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
OpenSource
Size:
278Kbyte
Production Status:
Newly created-in progress
Use:
Multiword Expression
Paper:
N/A
Documentation:
<Not Specified>