A stylistic device frequently employed by filmmakers is the synchronous montage (composition) of audio and visual elements. Synchronous montage helps to increase tension and tempo in a scene and highlights important events in the story. Sequences with synchronous montage usually contain rich semantics that are relevant for understanding a movie. This property is currently not exploited in automated indexing, annotation, and summarization of movies. We propose a cross-modal approach that extracts sequences with synchronous audio-visual montage from a movie. Experiments confirm that the extracted sequences have high semantic relevance. Consequently, they represent a useful basis for different high-level movie abstraction tasks such as automated movie annotation and movie summarization.
Zeppelzauer, M., Mitrovic, D., & Breiteneder, C. (2011). Cross-Modal Analysis of Audio-Visual Film Montage. In International Conference on Computer Communication and Networks (ICCCN) (p. 6). IEEE eXpress Conference Publishing. http://hdl.handle.net/20.500.12708/53697