Conference: LREC 2020

Year:	2020
Description:	This resource contains a context-independent gold standard for English-Dutch and French-Dutch cognate detection. To this end, automatic word alignment was applied to the Dutch Parallel Corpus, and all term equivalents with a Normalized Levenshtein distance smaller than 0.5 were extracted. This resulted in a list of 28,503 English-Dutch candidate cognate pairs and 22,715 French-Dutch candidate cognate pairs, which were subsequently manually labeled according to the guidelines established in Labat et al. (2019). The following labels were annotated: (1) Cognate: words which have a similar form and meaning in all contexts, (2) Partial cognate: words which have a similar form, but only share the same meaning in some contexts, (3) False friend: words which have a similar form but a different meaning, (4) Proper name: proper nouns (e.g. persons, companies, cities, countries, etc.) and their derivations, (5) Error: word alignment errors and compound nouns of which one part is a cognate but the other part is missing in one of the languages, and (6) No standard: words that do not occur in the dictionary of that particular language.
URL:
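
The candidate extraction step in the description (keeping aligned pairs whose Normalized Levenshtein distance is smaller than 0.5) can be illustrated with a minimal sketch. The description does not state how the distance was normalized, so dividing by the length of the longer word is an assumption here, and the function names `levenshtein`, `normalized_levenshtein` and `is_candidate` are hypothetical and not taken from the resource.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


def is_candidate(a: str, b: str, threshold: float = 0.5) -> bool:
    """True if the pair falls strictly below the 0.5 threshold from the description."""
    return normalized_levenshtein(a, b) < threshold


if __name__ == "__main__":
    # Illustrative word pairs only; they are not drawn from the gold standard.
    for pair in [("night", "nacht"), ("water", "water"), ("dog", "chien")]:
        print(pair, round(normalized_levenshtein(*pair), 2), is_candidate(*pair))
```

A pair that passes this filter is only a candidate: the labels listed in the description (cognate, partial cognate, false friend, and so on) were assigned manually afterwards.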