5493 results found.
Written Corpus
Language Type:
Multilingual
Languages:
English Hindi
Availability:
Freely Available
License:
CreativeCommons
Size:
1.49 million parallel segments <Not Specified>
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
Paper:
N/A
Documentation:
http://www.cfilt.iitb.ac.in/iitb_parallel/
Language Type:
Multilingual
Languages:
American English
Availability:
<Not Specified>
License:
<Not Specified>
Size:
45000
Production Status:
Existing-used
Use:
Parsing and Tagging
Paper:
N/A
Documentation:
<Not Specified>
Language Type:
Multilingual
Languages:
Mandarin Chinese
Availability:
From Owner
License:
<Not Specified>
Size:
80 newswire articles
Production Status:
Newly created-in progress
Use:
Information Extraction, Information Retrieval
Paper:
N/A
Documentation:
<Not Specified>
Written Lexicon
Language Type:
Multilingual
Languages:
Tigrinya
Availability:
Freely Available
License:
<Not Specified>
Size:
24.3 MByte
Production Status:
Newly created-in progress
Use:
Developing a Tigrinya stemmer
Paper:
N/A
Documentation:
<Not Specified>
Written Machine-Learning Model
Language Type:
Multilingual
Languages:
German
Availability:
Freely Available
License:
CC BY-NC 3.0
Size:
374 KByte
Production Status:
Newly created-finished
Use:
Sentence boundary detection
Paper:
N/A
Documentation:
See NLTK documentation.
Language Type:
Multilingual
Languages:
Japanese
Availability:
<Not Specified>
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Existing-used
Use:
<Not Specified>
Paper:
N/A
Documentation:
<Not Specified>
Speech Corpus
Language Type:
Multilingual
Languages:
Japanese
Availability:
From Data Center(s)
License:
<Not Specified>
Size:
96 tokens
Production Status:
Existing-used
Use:
Emotion Recognition/Generation
Paper:
N/A
Documentation:
<Not Specified>
Written Corpus
Language Type:
Multilingual
Languages:
English
Availability:
From Data Center(s)
License:
LDC
Size:
50,000 sentences
Production Status:
Existing-used
Use:
Sentence boundary detection
Paper:
N/A
Documentation:
<Not Specified>
Written Tagger/Parser
Language Type:
Multilingual
Languages:
Polish
Availability:
Freely Available
License:
<Not Specified>
Size:
1 MByte
Production Status:
Existing-used
Use:
Syntactic Analysis
Paper:
N/A
Documentation:
http://apps.man.poznan.pl/trac/asa-pl/
Written Corpus
Language Type:
Multilingual
Languages:
Russian
Availability:
Freely Available
License:
OpenSource
Size:
The Russian Dependency Syntax Multi-Treebank corpus was developed under the RU-EVAL-2012 initiative on the evaluation of Russian dependency tree parsers. The test corpus provides a standard for qualitative comparisons between the various dependency parsing schemes used in Russian NLP tools. The corpus includes a sample of 64,800 sentences drawn at random from fiction, news, non-fiction, blogs, etc. (three or more consecutive sentences per source text). The collection is annotated in parallel with a range of parse trees.
Production Status:
Newly created-finished
Use:
Evaluation
Paper:
N/A
Documentation:
http://testsynt.soiza.com/