Abstract
This work studies four problems in the context of German political statements and the German language in general: topic classification, opinion type classification, alliteration detection, and hyperbole detection. Most of the experiments were conducted on a dataset created from protocols of the Austrian National Council, containing around 65,000 political statements.
Topic classification was performed by extracting topic-related terms from the Wikipedia article on a given topic. The results were evaluated manually and leave room for improvement, as the precision for the topics feminism (36.39%) and European migrant crisis (19.04%) shows. For climate change, a precision of 89.02% was achieved.
The approach implemented for opinion type classification is based on part-of-speech tagging and was proposed and implemented for the English language by Othman et al. [Using NLP Approach for Opinion Types Classifier, Othman et al., 2015]. The goal of the experiments was to show whether the approach is also applicable to the German language when using a German part-of-speech tagger and its respective tags. The evaluation showed that the performance is comparable in terms of precision for opinionated statements (76.60% vs 71.00%), but not for comparative (78.30% vs 44.00%) and superlative opinionated statements (82.10% vs 44.00%).
For alliteration detection, a precision of 99.33% was achieved on an alliteration dataset containing 605 alliterations. Three additional experiments were performed on free text, where an average precision of 53.83% was achieved, with 30.00% being the worst case. The approach uses the Cologne Phonetics algorithm by Postel [Die Kölner Phonetik - Ein Verfahren zur Identifizierung von Personennamen auf der Grundlage der Gestaltanalyse, Postel, 1969] and combines it with additional rules.
For hyperbole detection, an existing approach for the English language by Troiano et al. [A computational exploration of exaggeration, Troiano et al., 2018], based on the computation of semantic features, was implemented for the German language. The task was defined as a supervised machine learning problem, namely binary classification. The results were compared in terms of precision (76.00% vs 52.23%), recall (76.00% vs 38.52%), accuracy (72.00% vs 68.90%), and F1-score (76.00% vs 41.11%). The performance was comparable only in terms of accuracy.
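To make the idea behind the alliteration detection more concrete, the following is a minimal sketch of how word-initial sounds could be compared using phonetic codes. It is not the thesis implementation: the function names `initial_code` and `find_alliterations` are hypothetical, only the word-initial letter rules of Cologne Phonetics are transcribed (as an assumption, from the commonly published rule table), and the "additional rules" mentioned in the abstract are not specified there and are not reproduced. One illustrative rule of its own is added: a leading H is treated as a separate sound class, because the phonetic algorithm assigns H no code.

```python
import re

# Word-initial subset of the Cologne Phonetics letter codes.
# Assumption: transcribed from the published rule table; all vowels share "0".
_INITIAL_CODES = {
    **dict.fromkeys("AEIJOUYÄÖÜ", "0"),
    "B": "1",
    **dict.fromkeys("FVW", "3"),
    **dict.fromkeys("GKQ", "4"),
    "L": "5",
    **dict.fromkeys("MN", "6"),
    "R": "7",
    **dict.fromkeys("SZ", "8"),
    "X": "48",
    # Hypothetical extra rule for this sketch: keep H as its own class,
    # since Cologne Phonetics itself assigns H no code.
    "H": "H",
}


def initial_code(word: str) -> str:
    """Return a phonetic code for the first sound of a word ("" if unknown)."""
    w = word.upper()
    if not w:
        return ""
    first, nxt = w[0], w[1:2]
    # Context-dependent letters: the code depends on the following letter.
    if first == "P":
        return "3" if nxt == "H" else "1"
    if first in "DT":
        return "8" if nxt and nxt in "CSZ" else "2"
    if first == "C":
        return "4" if nxt and nxt in "AHKLOQRUX" else "8"
    return _INITIAL_CODES.get(first, "")


def find_alliterations(text: str, min_len: int = 2) -> list[list[str]]:
    """Collect runs of adjacent words whose initial sounds share a code."""
    words = re.findall(r"[^\W\d_]+", text)  # letters only, incl. umlauts
    runs, current, code = [], [], None
    for word in words:
        c = initial_code(word)
        if c and c == code:
            current.append(word)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current, code = [word], c
    if len(current) >= min_len:
        runs.append(current)
    return runs


print(find_alliterations("Milch macht müde Männer munter"))
print(find_alliterations("viele Fische schwimmen"))
```

Because all vowels share one code, a phrase such as "ist alt" would also be reported by this sketch. This kind of over-matching suggests why plain phonetic matching on free text may need further rules, although the abstract does not say which rules the thesis adds.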
Reference
Deringer, C. (2024). Political opinion analysis and figure of speech detection: Topic and opinion type classification in the political context; alliteration and hyperbole detection [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2024.102700