7693 results found.
Mainichi Newspaper
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
Japanese
Availability:
<Not Specified>
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Existing-used
Use:
<Not Specified>
Paper:
N/A
Documentation:
<Not Specified>
Keio Emotional Speech Database (Keio-ESD)
Speech
Corpus,
IS2013
Language Type:
Multilingual
Languages:
Japanese
Availability:
From Data Center(s)
License:
<Not Specified>
Size:
96 tokens
Production Status:
Existing-used
Use:
Emotion Recognition/Generation
Paper:
N/A
Documentation:
<Not Specified>
Brown Corpus
Written
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English
Availability:
From Data Center(s)
License:
LDC
Size:
50,000 sentences
Production Status:
Existing-used
Use:
Sentence boundary detection
Paper:
N/A
Documentation:
<Not Specified>
Automatic Syntactic Analysis for Polish Language (ASA-PL)
Written
Tagger/Parser,
LTC2011
Language Type:
Multilingual
Languages:
Polish
Availability:
Freely Available
License:
<Not Specified>
Size:
1 MByte
Production Status:
Existing-used
Use:
Syntactic Analysis
Paper:
N/A
Documentation:
http://apps.man.poznan.pl/trac/asa-pl/
Russian Dependency Syntax Multi-Treebank
Written
Corpus,
COLING2012
Language Type:
Multilingual
Languages:
Russian
Availability:
Freely Available
License:
OpenSource
Size:
The Russian Dependency Syntax Multi-Treebank corpus was developed under the RU-EVAL-2012 initiative on the evaluation of Russian dependency tree parsers. The test corpus provides a standard for qualitative comparisons between the various dependency parsing schemes used in Russian NLP tools. The corpus includes a sample of 64,800 sentences drawn at random from fiction, news, non-fiction, blogs, etc. (three or more consecutive sentences per source text). The collection is annotated in parallel with a range of parse trees
Production Status:
Newly created-finished
Use:
Evaluation
Paper:
N/A
Documentation:
http://testsynt.soiza.com/
KALAKA-2
Speech
Evaluation Data,
IS2011
Language Type:
Trilingual
Languages:
Basque Catalan English
Availability:
From Owner
License:
hours
Size:
125
Production Status:
Newly created-finished
Use:
Language Identification
Paper:
N/A
Documentation:
<Not Specified>
Europarl
Speech/Written
Corpus,
LREC2016
Language Type:
Multilingual
Languages:
Dutch English
Availability:
Freely Available
License:
<Not Specified>
Size:
2 million sentences
Production Status:
Existing-used
Use:
Corpus Creation/Annotation
Paper:
N/A
Documentation:
<Not Specified>
Brown Coherence Toolkit
Written
Tool: discourse coherence model,
ACLHT2011
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
<Not Specified>
Production Status:
Newly created-finished
Use:
Discourse
Paper:
N/A
Documentation:
<Not Specified>
Wikipedia discourse connectives
Written
Corpus,
LREC2018
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
CC BY-SA 3.0 US
Size:
351 MByte
Production Status:
Newly created-finished
Use:
Discourse
Paper:
N/A
Documentation:
English documentation available in the README file that comes with the dataset.
NLTK package
Written
Tokenizer,
COLING2010
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
Apache 2.0
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
Paper:
N/A
Documentation:
Yes, in English; publicly available at http://www.nltk.org/