7693 results found.
RoTC
Written
Corpus,
LREC2012
Language Type:
Monolingual
Languages:
Aromanian; Arumanian; Macedo-Romanian
Availability:
From Owner
License:
<Not Specified>
Size:
341320 <Not Specified>
Production Status:
Newly created-finished
Use:
Document Classification, Text categorisation
Paper:
N/A
Documentation:
None
Russian-Ukrainian parallel corpus
Written
Corpus,
LREC2012
Language Type:
Bilingual
Languages:
Old Russian, Ukrainian
Availability:
Freely Available
License:
<Not Specified>
Size:
2.5 million words
Production Status:
Existing-used
Use:
Language Modelling
Paper:
N/A
Documentation:
None
Russian-Belorussian corpus
Written
Corpus,
LREC2012
Language Type:
Bilingual
Languages:
Belarusian, Old Russian
Availability:
Freely Available
License:
<Not Specified>
Size:
1 million tokens
Production Status:
Existing-used
Use:
Language Modelling
Paper:
N/A
Documentation:
None
ANC (American National Corpus) MASC (Manually Annotated Sub-Corpus)
Speech/Written
Corpus,
LREC2012
Language Type:
Monolingual
Languages:
American English
Availability:
Freely Available
License:
none
Size:
500 words
Production Status:
Existing-updated
Use:
Most of the above
Paper:
N/A
Documentation:
None
Reuters RCV1
Written
Corpus,
LREC2012
Language Type:
Multilingual
Languages:
English, Brazilian Portuguese, Danish, Finland-Swedish Sign Language, Germany, Italian, Spanish, French
Availability:
From Owner
License:
<Not Specified>
Size:
13 <Not Specified>
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
Paper:
N/A
Documentation:
None
JRC (Joint Research Centre)-Acquis
Written
Corpus,
LREC2012
Language Type:
Multilingual
Languages:
English, Brazilian Portuguese, Danish, Finland-Swedish Sign Language, Germany, Italian, Spanish, French
Availability:
Freely Available
License:
<Not Specified>
Size:
464 <Not Specified>
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
Paper:
N/A
Documentation:
None
Europarl
Speech/Written
Corpus,
LREC2012
Language Type:
Multilingual
Languages:
English, Brazilian Portuguese, Danish, Finland-Swedish Sign Language, Germany, Italian, Spanish, French
Availability:
Freely Available
License:
<Not Specified>
Size:
25600000 <Not Specified>
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
Paper:
N/A
Documentation:
None
BAF (Bilingual corpus)
Written
Corpus,
LREC2012
Language Type:
Bilingual
Languages:
English, Cajun French
Availability:
Freely Available
License:
<Not Specified>
Size:
<Not Specified> <Not Specified>
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
Paper:
N/A
Documentation:
None
Porn Train Set
Written
Corpus,
LREC2012
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
106,000 filenames/titles Other
Production Status:
Newly created-in progress
Use:
Document Classification, Text categorisation
Paper:
N/A
Documentation:
None
Simple English Wikipedia
Written
Corpus,
LREC2012
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
4389599 <Not Specified>
Production Status:
Existing-used
Use:
Text Complexity Analysis
Paper:
N/A
Documentation:
None