7693 results found.
Arabic Treebank (ATB)
Written
Corpus,
ACLHT2011
Language Type:
Multilingual
Languages:
Standard Arabic
Availability:
From Data Center(s)
License:
BSD
Size:
<Not Specified>
Production Status:
Existing-used
Use:
Parsing and Tagging
Paper:
N/A
Documentation:
English
Amharic-English bilingual corpus
Written
Corpus,
LREC2018
Language Type:
Multilingual
Languages:
Amharic English
Availability:
From Data Center(s)
License:
ELRA
Size:
232653 words
Production Status:
Existing-used
Use:
Language Modelling
Paper:
N/A
Documentation:
<Not Specified>
Japanese Twitter hashtag dataset
Written
Corpus,
COLING2016
Language Type:
Multilingual
Languages:
Japanese
Availability:
<Not Specified>
License:
Licensed by Twitter. See https://dev.twitter.com/overview/terms/agreement-and-policy for more details.
Size:
321 MByte
Production Status:
Newly created-finished
Use:
Word Segmentation
Paper:
N/A
Documentation:
The official documentation from Twitter is publicly available
MADA+TOKAN
Written
Language Modeling Tool,
COLING2012
Language Type:
Multilingual
Languages:
Standard Arabic
Availability:
Freely Available
License:
not sure, free for academic research use
Size:
10 MByte
Production Status:
Existing-used
Use:
Language Modelling
Paper:
N/A
Documentation:
<Not Specified>
National Corpus of Polish
Written
Corpus,
LTC2011
Language Type:
Multilingual
Languages:
Polish
Availability:
Freely Available
License:
<Not Specified>
Size:
1000000 tokens
Production Status:
Newly created-finished
Use:
Word Sense Disambiguation
Paper:
N/A
Documentation:
<Not Specified>
Merritt Ruhlen's 2015 segment database
Speech
Typological Database,
LREC2016
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
PNAS Terms (http://www.pnas.org/site/misc/terms.xhtml)
Size:
3.1 MByte
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
Paper:
N/A
Documentation:
In the resource itself
TiGer
Written
Corpus,
EMNLP2010
Language Type:
Multilingual
Languages:
German
Availability:
Freely Available
License:
Free for scientific purposes
Size:
50,000 sentences
Production Status:
Existing-used
Use:
Lexicon extraction
Paper:
N/A
Documentation:
German/English, included in download
Web Dataset for Text-based Image Annotation Development
Multimodal/Multimedia
Corpus,
COLING2010
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
<Not Specified>
Size:
300 image/text pairs
Production Status:
Newly created-finished
Use:
Text Mining
Paper:
N/A
Documentation:
<Not Specified>
Wikipedia Talk Pages Comments
Written
Corpus,
COLING2012
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
GNU
Size:
1
Production Status:
Newly created-in progress
Use:
Dialogue
Paper:
N/A
Documentation:
<Not Specified>
20 newsgroups
Written
Evaluation Data,
ACL2016
Language Type:
Multilingual
Languages:
English
Availability:
Freely Available
License:
OpenSource
Size:
13.8 MByte
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
Paper:
N/A
Documentation:
<Not Specified>