7693 results found.
Ground truth word-to-word translations
Written
Evaluation Data,
COLING2012
Language Type:
Trilingual
Languages:
Dutch, English, Italian
Availability:
<Not Specified>
License:
OpenSource
Size:
<Not Specified>
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
Paper:
N/A
Documentation:
<Not Specified>
Not Applicable
Written
Corpus,
COLING2012
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
Paper:
N/A
Documentation:
<Not Specified>
MeCab
Written
Tokenizer,
COLING2010
Language Type:
Multilingual
Languages:
Japanese
Availability:
Freely Available
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Summarisation
Paper:
N/A
Documentation:
<Not Specified>
SogouT
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
Mandarin Chinese
Availability:
Freely Available
License:
Sogou
Size:
1 TB
Production Status:
Existing-used
Use:
Unlabeled corpus of Chinese web pages
Paper:
N/A
Documentation:
http://www.sogou.com/labs/dl/t.html
Arabic Treebank (ATB)
Written
Corpus,
EMNLP2010
Language Type:
Multilingual
Languages:
Standard Arabic
Availability:
From Data Center(s)
License:
LDC
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Morphological Analysis, Word Sense Disambiguation
Paper:
N/A
Documentation:
<Not Specified>
Faroese OCR post-processing correction toolkit
Written
Corpus Tool,
LREC2018
Language Type:
Multilingual
Languages:
Faroese
Availability:
From Owner
License:
simple copyright
Size:
50 MByte
Production Status:
Newly created-in progress
Use:
Corpus Creation/Annotation
Paper:
N/A
Documentation:
In process
AURORA Project Database 2.0 - Evaluation Package
Speech
Corpus,
IS2011
Language Type:
Multilingual
Languages:
English
Availability:
From Data Center(s)
License:
<Not Specified>
Size:
2.3 GByte
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
Paper:
N/A
Documentation:
<Not Specified>
Graded sense and usage annotation
Written
Corpus,
COLING2012
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
928
Production Status:
Existing-used
Use:
Word Sense Disambiguation
Paper:
N/A
Documentation:
Available publicly in English
KPG English Learner Corpus
Written
Corpus,
RANLP2011
Language Type:
Multilingual
Languages:
English
Availability:
From Data Center(s)
License:
<Not Specified>
Size:
3.5 million words
Production Status:
Existing-used
Use:
Language Modelling
Paper:
N/A
Documentation:
<Not Specified>
Uyghur Encyclopedia (UE)
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
Uyghur
Availability:
From Owner
License:
<Not Specified>
Size:
59,992 single-labeled documents
Production Status:
Newly created-finished
Use:
Document Classification, Text categorisation
Paper:
N/A
Documentation:
<Not Specified>