Description: |
TWITTIRĂ’-UD is a collection of ironic Italian tweets annotated in Universal Dependencies. The treebank can be exploited for the training of NLP systems to enhance their performance on social media texts, and in particular, for irony detection purposes.
TWITTIRĂ’-UD has been created by enriching a resource originally developed for training and testing irony detection systems, also exploited as a benchmark for the Italian irony detection task held in EVALITA 2018 (Cignarella et al., 2018). The treebank comprises both the fine-grained annotation for irony applied in Karoui et al. (2017), and the morphological and syntactic information encoded by the UD format.
The original corpus consists of 1,424 tweets (28,387 tokens). The syntactic annotation process was carried out through alternating steps of automatic scripting and manual revision, and finally with some out-of-domain parsing experiments. Parsing results also underwent a manual revision by two independent annotators.
In order to meet the requirements of the EU General Data Protection Regulation (GDPR), entered into force on May 2018, the resource content has been pseudonymized, by substituting original tweet IDs and user names.
In this late update we enriched the annotation of this Treebank with a new layer of labels, concerning token-level IRONY ACTIVATORS. |