Describing When and Where in Vision (2011) – Research Unit Virtual & Augmented Reality

Abstract

Different from the what and where pathways in the organization of the visual system, we address representations that describe dynamic visual events in a unified way. Representations are an essential tool for any kind of process that operates on data, as they provide a language to describe, store and retrieve that data. They define the possible properties and aspects that are stored, and govern the levels of abstraction at which the respective properties are described. In the case of visual computing (computer vision, image processing), a representation is used to describe information obtained from visual input (e.g. an image or image sequence and the objects it may contain) as well as related prior knowledge (experience). The ultimate goal, to make applications of visual computing be part of our daily life, requires that vision systems operate reliably, nearly anytime and anywhere. Therefore, the research community aims to solve increasingly more complex scenarios. Vision both in humans and computers is a dynamic process, thus variations (change) always appear in the spatial and the temporal dimensions. Nowadays significant research efforts are undertaken to represent variable shape and appearance, however, joint representation and processing of spatial and temporal domains is not a well-investigated topic yet. Visual computing tasks are mostly solved by a two-stage approach of frame-based processing and subsequent temporal processing. Unfortunately, this approach reaches its limits in scenes with high complexity or difficult tasks e.g. action recognition. Therefore, we focus our research on representations which jointly describe information in space and time and allow to process data of space-time volumes (several consecutive frames). In this keynote we relate our own experience and motivations, to the current state of the art of representations of shape, of appearance, of structure, and of motion. Challenges for such representations are in applications like multiple object tracking, tracking non-rigid objects and human action recognition.

Reference

Kropatsch, W., Ion, A., & Artner, N. (2011). Describing When and Where in Vision. In Lecture Notes in Computer Science (pp. 25–26). Springer. https://doi.org/10.1007/978-3-642-25085-9_2