Abstract
This diploma thesis deals with the obfuscation of persons shown in a video. The obfuscation process is based on an automatic evaluation of emotional speech. On the one hand, the implemented effects anonymize visible and audible individuals; on the other hand, they are intended to reconstruct or even emphasize the emotions that are lost during anonymization. Many works on emotion recognition focus on distinguishing between the so-called basic emotions proposed by Ekman, such as joy, sadness, anger, and fear. In this thesis, emotions are instead described in a continuous, three-dimensional space whose coordinate axes correspond to the emotion primitives valence, arousal, and dominance. Emotion recognition is performed with two machine learning algorithms: Support Vector Regression and a modified k-Nearest-Neighbor algorithm. The training and test sets for the machine learning process are taken from the German "Vera am Mittag" database of the HUMAINE project, which contains twelve hours of annotated, ready-to-use video and speech. In this work, 69 prosodic and spectral features such as pitch, RMS energy, and MFCCs are used for emotion recognition, and a separate feature ranking is created for each of the three emotion primitives. Three visual anonymization effects are implemented: an edge-based effect, a symbol-based effect, and an effect that produces a hand-painted look. The emotion primitives act as steering parameters for these effects and thus directly influence their appearance. The voice is anonymized by applying a vocoder-like effect.
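The following is a minimal sketch, not the author's implementation, of the regression setup the abstract describes: utterance-level prosodic and spectral features (a small subset of the 69 features used in the thesis) feed one Support Vector Regressor per emotion primitive. File paths, label arrays, and parameter values are hypothetical placeholders.

```python
# Sketch: SVR-based prediction of valence, arousal, and dominance from
# prosodic/spectral speech features. Assumes librosa and scikit-learn.
import numpy as np
import librosa
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline


def utterance_features(wav_path: str) -> np.ndarray:
    """Summarize pitch, RMS energy, and MFCCs of one utterance as a fixed-size vector."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)        # frame-wise pitch estimate
    rms = librosa.feature.rms(y=y)[0]                     # frame-wise RMS energy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # 13 MFCCs per frame
    # Collapse frame-wise contours to utterance-level statistics (mean and std).
    stats = [f0.mean(), f0.std(), rms.mean(), rms.std()]
    stats += list(mfcc.mean(axis=1)) + list(mfcc.std(axis=1))
    return np.array(stats)


def train_vad_regressors(wav_paths, vad_labels):
    """Train one SVR per primitive; vad_labels has shape (n_utterances, 3)."""
    X = np.vstack([utterance_features(p) for p in wav_paths])
    models = {}
    for i, primitive in enumerate(["valence", "arousal", "dominance"]):
        model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
        model.fit(X, vad_labels[:, i])
        models[primitive] = model
    return models
```

In this sketch, the three predicted primitive values for a new utterance (e.g. `models["arousal"].predict(...)`) would then serve as the steering parameters that control the appearance of the visual anonymization effects.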
Reference
Fischl, C. (2011). Emotionsbasierte Videoverfremdung [Diploma Thesis, Technische Universität Wien]. reposiTUm. http://hdl.handle.net/20.500.12708/160492