Abstract

Acoustic Scene Classification (ASC) aims to identify the environment in which an audio recording was made. Developing computational methods to recognize environments is part of the research in fields such as machine learning, robotics, and artificial intelligence. This work explores the use of ASC in the private sector by developing a prototype and performing a case study to evaluate whether a certain psychological effect can be achieved with this technology. In particular, we examine whether an acoustic illusion of being in a different city can be created when the user’s soundscape is overlaid with a different soundscape based on the classification result.

In this thesis, we explore the current state-of-the-art solutions for ASC by investigating the submissions to the DCASE 2020 challenge. Based on this evaluation, we choose a baseline model, which is a ResNet that accepts log-mel spectrograms complemented by log-mel deltas and delta-deltas as input. We modify the network in order to classify 9 acoustic scenes. The TAU Urban Acoustic Scenes 2020 Mobile dataset is used to train the model. To calibrate the network, we experiment with different hyperparameters, loss functions, and data augmentation strategies and compare the results based on the test accuracy. We achieve the best performance, a test accuracy of 77.99%, by using categorical focal loss and Mixup as the data augmentation technique.

For the prototype, we select 4 cities to support and create a soundscape dataset that includes audio samples for each scene and city by extracting the audio data from videos that contain the desired content. Then, we develop a server application with Flask that provides an API to get predictions from the model and to log data about the performed classification processes. Finally, we design and implement a mobile application using the Flutter SDK. The application periodically requests predictions from the server for audio data that it records using the device’s microphone. Based on the classification result, the mobile application overlays the acoustic environment of the user by playing an audio sample for the recognized scene and the selected city from the soundscape dataset.

We then use the prototype in a case study in which a group of participants tests the mobile application within a predefined scope. In the evaluation, we estimate the classifier’s accuracy in a real-life scenario at 65%. We find that the prototype does create the acoustic illusion of being in a different city. Additionally, we show that this illusion is experienced more often the more accurate the classification is perceived to be and the closer the personal relationship between the user and the selected city.
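
As an illustration of the two training techniques named above, the following is a minimal pure-Python sketch of Mixup and categorical focal loss for a single example. It is not the thesis implementation: the function names, the Mixup parameter alpha=0.4, and the focal-loss parameters gamma=2.0 and alpha=0.25 are assumptions chosen for illustration only, and a real training pipeline would apply the same formulas to batched tensors in a deep learning framework.

```python
import math
import random


def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two examples and their one-hot labels (hypothetical helper).

    The mixing weight lam is drawn from Beta(alpha, alpha); alpha=0.4 is an
    assumed value, not one reported in the abstract.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def categorical_focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one example (hypothetical helper).

    Each cross-entropy term is scaled by alpha * (1 - p)^gamma, so classes
    that are already predicted with high probability contribute less.
    gamma and alpha are assumed defaults, not values from the thesis.
    """
    loss = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss += -alpha * t * (1.0 - p) ** gamma * math.log(p)
    return loss
```

Because Mixup produces soft labels, the loss above accepts label vectors that are not strictly one-hot; with gamma=0 and alpha=1 it reduces to ordinary categorical cross-entropy.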

Reference

Rybnikova, T. V. (2022). Design and evaluation of a sound application using acoustic scene classification [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.87561