Abstract

We present a model for video segmentation, applicable to RGB (and, if available, RGB-D) information, that constructs multiple plausible partitions corresponding to the static and the moving objects in the scene: (i) we generate multiple figure-ground segmentations in each frame, parametrically, based on boundary and optical flow cues, then track, link, and refine the salient segment chains corresponding to the different objects over time, using long-range temporal constraints; (ii) a video partition is obtained by composing segment chains into consistent tilings, where the different individual object chains explain the video and do not overlap. Saliency metrics based on figural and motion cues, as well as measures learned from human eye movements, are exploited, with substantial gain, at the level of segment generation and chain construction, in order to produce compact sets of hypotheses which correctly reflect the qualities of the different configurations. The model makes it possible to compute multiple hypotheses over both individual object segmentations tracked over time and complete video partitions. We report quantitative, state-of-the-art results on the SegTrack single-object benchmark, and promising qualitative and quantitative results on clips filming multiple static and moving objects, collected from Hollywood movies and from the MIT dataset.
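To make step (ii) concrete, below is a minimal sketch (not the authors' implementation) of composing segment chains into a non-overlapping tiling. It assumes, for illustration only, that each chain is represented as a mapping from frame index to a set of pixel ids, that each chain carries a precomputed saliency score, and that composition can be approximated greedily: pick chains in decreasing saliency order, rejecting any chain whose masks overlap an already selected chain beyond a small tolerance. The names `SegmentChain`, `overlaps`, and `compose_tiling` are hypothetical.

```python
# Illustrative sketch of segment chain composition into a consistent tiling.
# Assumptions (ours, not from the paper): per-frame masks as pixel-id sets,
# precomputed chain saliency scores, and a greedy non-overlap selection rule.

from dataclasses import dataclass, field


@dataclass
class SegmentChain:
    score: float                               # saliency of the whole chain (assumed given)
    masks: dict = field(default_factory=dict)  # frame index -> set of pixel ids


def overlaps(a: SegmentChain, b: SegmentChain, tol: float = 0.0) -> bool:
    """True if chains a and b claim (nearly) the same pixels in any shared frame."""
    for frame, mask_a in a.masks.items():
        mask_b = b.masks.get(frame)
        if mask_b and len(mask_a & mask_b) > tol * min(len(mask_a), len(mask_b)):
            return True
    return False


def compose_tiling(chains: list[SegmentChain], tol: float = 0.05) -> list[SegmentChain]:
    """Greedily select high-saliency chains that do not overlap one another,
    so the selected chains jointly explain the video without double-counting pixels."""
    tiling: list[SegmentChain] = []
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        if all(not overlaps(chain, picked, tol) for picked in tiling):
            tiling.append(chain)
    return tiling


if __name__ == "__main__":
    # Toy example: two disjoint object chains plus a redundant variant of the first.
    c1 = SegmentChain(score=0.9, masks={0: {1, 2, 3}, 1: {1, 2}})
    c2 = SegmentChain(score=0.7, masks={0: {7, 8}, 1: {8, 9}})
    c1b = SegmentChain(score=0.6, masks={0: {2, 3, 4}, 1: {1, 2, 3}})  # overlaps c1
    print([c.score for c in compose_tiling([c1, c2, c1b])])  # -> [0.9, 0.7]
```

A greedy rule is only one way to enforce the non-overlap constraint; the key point it illustrates is that each pixel in each frame ends up explained by at most one object chain.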

Reference

Ion, A., Banica, D., Agape, A., & Sminchisescu, C. (2013). Video Object Segmentation by Salient Segment Chain Composition. International Journal of Computer Vision, 1, 1–8. http://hdl.handle.net/20.500.12708/155556