Conference: LREC 2020

Year: 2020
Description: The Human Language Technology (HLT) Group at the University of Belgrade has long been developing a system of morphological electronic dictionaries of Serbian (SMD), which has by now reached a considerable volume. SMD follows the methodology and format (known as DELAS/DELAF) developed at LADL (Laboratoire d'Automatique Documentaire et Linguistique) under the guidance of Maurice Gross. A DELAS-type dictionary consists of simple-word lemmas, each accompanied by an inflectional class code; these codes enable the production of a DELAF-type dictionary, which contains all inflected forms together with their grammatical information. Each inflectional class code corresponds to one finite-state transducer that generates all inflected forms of every DELAS lemma carrying that code. The Serbian morphological dictionary of simple words contains 200,000 lemmas, which yield approximately 2,700,000 different lexical words. Close to 100,000 simple lemmas belong to the general lexicon, while the remaining 100,000 represent various kinds of simple proper names and terminology. The morphological description of compounds, compatible with the methodology used for simple words, relies on finite-state technology. The final aim is to produce the compound counterpart of the DELAS/DELAF dictionaries of simple words: DELAC/DELACF. At present, the dictionary of compounds has about 18,000 lemmas covering different parts of speech.
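Illustration: the DELAS-to-DELAF expansion described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the SMD implementation: the class codes N1 and N2, the English lemmas, the strip-and-append rule tables (standing in for the finite-state transducers), and the function names are all hypothetical. Only the general entry shapes, "lemma,CODE" for DELAS and "form,lemma.POS:features" for DELAF, follow the DELAS/DELAF convention.

```python
# Toy stand-in for inflectional transducers, keyed by a hypothetical class code.
# Each rule: (number of trailing letters to strip, suffix to append, feature tag).
RULES = {
    "N1": [(0, "", "s"), (0, "s", "p")],      # horse -> horse, horses
    "N2": [(0, "", "s"), (1, "ies", "p")],    # city  -> city, cities
}


def parse_delas(line):
    """Split a DELAS-style line 'lemma,CODE' into (lemma, code)."""
    lemma, code = line.strip().split(",", 1)
    return lemma, code


def inflect(lemma, code):
    """Apply every rule of the lemma's class and emit DELAF-style entries."""
    pos = code.rstrip("0123456789")  # 'N1' -> 'N' (assumed code layout)
    entries = []
    for strip, suffix, tag in RULES[code]:
        stem = lemma[:len(lemma) - strip]
        entries.append(f"{stem}{suffix},{lemma}.{pos}:{tag}")
    return entries


def build_delaf(delas_lines):
    """Expand a list of DELAS-style lines into a flat list of DELAF-style lines."""
    delaf = []
    for line in delas_lines:
        lemma, code = parse_delas(line)
        delaf.extend(inflect(lemma, code))
    return delaf


delaf = build_delaf(["horse,N1", "city,N2"])
for entry in delaf:
    print(entry)
```

Keeping one rule table per class code mirrors the one-transducer-per-code design: adding a lemma only requires assigning it an existing code, which is how a lemma list of this kind can expand to a much larger list of inflected forms.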
URL: