7693 results found.
Chinese Wikipedia Dump corpus
Written
Corpus,
LREC2016
Language Type:
Multilingual
Languages:
Chinese
Availability:
Freely Available
License:
OpenSource
Size:
54M words
Production Status:
Existing-used
Use:
Machine Learning
Paper:
N/A
Documentation:
<Not Specified>
NTUSD
Written
Lexicon,
COLING2010
Language Type:
Multilingual
Languages:
Mandarin Chinese
Availability:
Freely Available
License:
NTU
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
Paper:
N/A
Documentation:
<Not Specified>
TIDES Extraction (ACE) 2003 Multilingual Training Data
Written
Corpus,
COLING2010
Language Type:
Trilingual
Languages:
English Mandarin Chinese Standard Arabic
Availability:
From Data Center(s)
License:
LDC
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
Paper:
N/A
Documentation:
<Not Specified>
IWSLT 2011 Arabic-English data
Speech/Written
Corpus,
COLING2012
Language Type:
Multilingual
Languages:
English Standard Arabic
Availability:
Freely Available
License:
<Not Specified>
Size:
8000000 sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
Paper:
N/A
Documentation:
<Not Specified>
Tohoku RTE
Written
Corpus,
COLING2012
Language Type:
Multilingual
Languages:
Japanese
Availability:
Freely Available
License:
OpenSource
Size:
56
Production Status:
Newly created-in progress
Use:
Textual Entailment and Paraphrasing
Paper:
N/A
Documentation:
currently not available
R52
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
Paper:
N/A
Documentation:
<Not Specified>
WSJ Penn Treebank
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English
Availability:
From Data Center(s)
License:
LDC
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Parsing and Tagging
Paper:
N/A
Documentation:
<Not Specified>
Afribooms treebank
Written
Treebank,
LREC2016
Language Type:
Multilingual
Languages:
Afrikaans
Availability:
From Data Center(s)
License:
CreativeCommons
Size:
44715 words
Production Status:
Newly created-finished
Use:
Parsing and Tagging
Paper:
N/A
Documentation:
<Not Specified>
pc-compounder input (file with phrasal compounds)
Written
Corpus,
LREC2016
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
7200 sentences
Production Status:
Newly created-in progress
Use:
Morphological Analysis
Paper:
N/A
Documentation:
Trips (2012), Trips (2014), Trips and Kornfilt (2015)
NAIST Text Corpus
Written
Corpus,
ACLHT2011
Language Type:
Multilingual
Languages:
Japanese
Availability:
need Mainichi Shinbun '95 corpus
License:
OpenSource
Size:
2929 articles
Production Status:
Existing-used
Use:
Discourse
Paper:
N/A
Documentation:
English