Abstract

This experimental study implements a solution for extracting opinions from written text using supervised machine learning methods and for visualizing their consistency over time. We examine the practical feasibility and the usefulness of the implemented approach. We gathered speech transcripts of the Austrian Parliament to create two datasets on topics concerning measures against the spread of the Coronavirus. We split the raw text at sentence boundaries into dataset records and used a keyword search to select relevant sentences. We then manually assigned opinion labels and used two statistical machine learning algorithms and three deep learning models to predict the labels. We used Monte Carlo cross-validation to evaluate classification performance. Subsequently, we used the predictions of the best-performing algorithm to plot the general sentiment toward the topic and the consistency of expressed opinions over time.

On the larger dataset (around 5,000 records), a BERT network achieved the best accuracy (70%), followed by an LSTM network (68%), an MNB classifier (67%), a Bag-of-Words network (62%), and a BM25 document-ranking classifier (42%). On the smaller dataset (around 500 records), BERT also performed best (56%), followed by the MNB classifier (53%), the LSTM network (51%), the BM25 approach (47%), and the Bag-of-Words network (42%).

The biggest challenges to practical feasibility were the manual annotation effort and choosing a topic for which enough training samples are available. The approach is therefore best suited to monitoring a small selection of topics over a long period. We showed that the usefulness of the predicted opinion-consistency values depends on the accuracy of the underlying opinion predictions. By comparing graphs of actual opinion data to graphs of predicted data, we found that a model with 70% accuracy is sufficient to produce a representative impression of the overall sentiment toward a topic. Visualizing the consistency of opinions, on the other hand, requires a higher classification accuracy to be useful.
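The evaluation pipeline summarized above can be sketched in a few lines: keep only sentences that match topic keywords, then score a Multinomial Naive Bayes baseline on Bag-of-Words features with Monte Carlo cross-validation (repeated random train/test splits). This is an illustrative sketch using scikit-learn, not the thesis implementation; all records, keywords, and labels below are invented placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import ShuffleSplit
from sklearn.naive_bayes import MultinomialNB

def keyword_filter(records, keywords):
    """Keep only (sentence, label) records mentioning a topic keyword."""
    return [(s, y) for s, y in records
            if any(k in s.lower() for k in keywords)]

def monte_carlo_accuracy(texts, labels, n_splits=10, test_size=0.25, seed=0):
    """Mean accuracy over repeated random train/test splits (Monte Carlo CV)."""
    X = CountVectorizer().fit_transform(texts)   # Bag-of-Words features
    y = np.array(labels)
    splitter = ShuffleSplit(n_splits=n_splits, test_size=test_size,
                            random_state=seed)
    scores = []
    for train_idx, test_idx in splitter.split(X):
        clf = MultinomialNB().fit(X[train_idx], y[train_idx])
        scores.append(float((clf.predict(X[test_idx]) == y[test_idx]).mean()))
    return float(np.mean(scores))

# Placeholder records: (sentence, opinion label) pairs.
records = [
    ("Masks are an effective measure against the virus.", "pro"),
    ("I reject mandatory masks in schools.", "contra"),
    ("The lockdown saved many lives.", "pro"),
    ("This lockdown destroys our economy.", "contra"),
    ("The weather was pleasant today.", "pro"),  # off-topic, filtered out
    ("Masks protect the most vulnerable.", "pro"),
    ("A lockdown is a disproportionate measure.", "contra"),
    ("Mask mandates are a sensible precaution.", "pro"),
]
filtered = keyword_filter(records, ["mask", "lockdown"])
texts, labels = zip(*filtered)
acc = monte_carlo_accuracy(list(texts), list(labels))
print(f"{len(filtered)} topical records, mean MC-CV accuracy: {acc:.2f}")
```

Monte Carlo cross-validation differs from k-fold in that splits are drawn independently at random, so the number of repetitions can be chosen freely regardless of dataset size.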

Reference

Zaruba, S. (2021). Using natural language processing to measure the consistency of opinions expressed by politicians [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.80341