Description

Recent advances in Automatic Speech Recognition (ASR), Text-to-Speech (TTS) and Large Language Models (LLMs) have made real-time transcription and translation of spoken language increasingly practical. This thesis will explore the design and implementation of an immersive communication framework for Augmented Reality (AR) that enables real-time cross-language communication between multiple users who do not share a language.

The system will integrate ASR for real-time speech recognition, LLMs for accurate translation and TTS for natural voice synthesis, allowing effective multilingual interactions. This thesis focuses on performance, user experience and the reduction of language barriers in real-time communication. The system will be implemented on real AR glasses.
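
As a rough illustration of how the three stages could be chained, the following TypeScript sketch passes one spoken utterance through ASR, translation and TTS in sequence and records the end-to-end latency. All interface and function names (SpeechRecognizer, Translator, SpeechSynthesizer, translateUtterance) are hypothetical placeholders and do not correspond to any Lens Studio or Spectacles API; the stub stages only stand in for real services.

```typescript
// Hypothetical stage interfaces: these names are assumptions for illustration,
// not part of Lens Studio or any real SDK.
interface SpeechRecognizer {
  transcribe(audio: Float32Array, language: string): Promise<string>;
}
interface Translator {
  translate(text: string, from: string, to: string): Promise<string>;
}
interface SpeechSynthesizer {
  synthesize(text: string, language: string): Promise<Float32Array>;
}

interface TranslationResult {
  sourceText: string;      // what the speaker said (ASR output)
  translatedText: string;  // translation into the listener's language
  audio: Float32Array;     // synthesized speech samples
  latencyMs: number;       // end-to-end processing time
}

// Runs one utterance through ASR -> translation -> TTS, strictly in sequence.
async function translateUtterance(
  audio: Float32Array,
  from: string,
  to: string,
  asr: SpeechRecognizer,
  llm: Translator,
  tts: SpeechSynthesizer
): Promise<TranslationResult> {
  const start = Date.now();
  const sourceText = await asr.transcribe(audio, from);
  const translatedText = await llm.translate(sourceText, from, to);
  const speech = await tts.synthesize(translatedText, to);
  return { sourceText, translatedText, audio: speech, latencyMs: Date.now() - start };
}

// Stub stages so the sketch runs without any external service.
const stubAsr: SpeechRecognizer = {
  transcribe: async () => "hello",
};
const stubLlm: Translator = {
  // Tags the text with the target language instead of really translating it.
  translate: async (text, _from, to) => `[${to}] ${text}`,
};
const stubTts: SpeechSynthesizer = {
  // Returns one silent sample per character instead of real speech.
  synthesize: async (text) => new Float32Array(text.length),
};
```

Keeping each stage behind its own interface would allow individual components to be swapped or benchmarked separately, which may be useful when evaluating performance. A real-time system would likely stream partial results between stages rather than wait for each one to finish, which this sketch does not attempt.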

Tasks

  • Design and develop a real-time language translator
  • Integrate ASR, LLM and TTS components
  • Run a user study to evaluate the system

Requirements

  • Knowledge of the English language (source code, comments, and final report should be in English)
  • Programming skills (especially JS)
  • Knowledge of AR is advantageous

Environment

The project will be developed in Lens Studio, targeting the novel Snapchat Spectacles.

Contact

For more information, please contact Matteo Bosco – matteo.bosco@tuwien.ac.at