Abstract

The goal of computer vision is to make machines see, or at least to perform vision tasks with the same quality, quantity and speed as humans and animals.Humans and animals are able to delineate, detect and recognize objects in complex scenes 'in no time'. One of the most valuable and critical resources in human visual processing is time, therefore a highly parallel model is the biological answer dealing satisfactorily with this resource. Hierarchical representation and hierarchical processing in computer vision systems are the credible approach to address space and performance constraints, observed in human and animal visual systems. A widely used hierarchical representation in many areas of computer vision and pattern recognition is the (regular) image pyramid, which employs both coarse to fine and fine to coarse processing strategies. The main advantage of regular pyramids is rapid computation of global information in a recursive manner, due to logarithmic height with respect to the size of the input, making algorithms running in this data structure having a logarithmic time complexity. However, regular image pyramids lack shift invariance and do not preserve object connectivity as a result of the fixed vertical neighborhood. Thus they should be abandoned as general segmentation methods. In order to cope with shift invariance, among others, new hierarchical structures, the so called irregular pyramids, should be used. However, the logarithmic height of irregular pyramids is lost, as well as the computational efficiency. The non-logarithmic height is the main drawback of the irregular structures. Employing graph theoretical formalisms to describe irregular hierarchical structures allows an easy analysis of irregular graph pyramids. In order to be able to encode multiple boundaries between regions we use dual graphs, and the build pyramid is called dual graph pyramid.In this thesis we mainly deal with dual graph pyramids and their application in image partitioning. We introduce two new graph concepts and show that the presented methods, maximal independent edge set (MIES) and maximal independent directed edge set (MIDES) used for the construction of stochastic irregular pyramids, bound logarithmically the height of the pyramid. We show that the two widely used stochastic decimation strategies, the maximal independent vertex set (MIS) and data driven decimation process (D3P) do not lead to logarithmic tapering graph pyramids. After studying different techniques to build the structure of the pyramid, we are motivated to use these structures in a framework for bottom-up processing. Thus into this irregular graph pyramid framework, we introduce a time efficient image partitioning method based on Boruvka's minimum spanning tree principle (MST)(BoruSeg). Although the goal of image segmentation is a single partitioning of the image, and not necessarily an image hierarchy, a hierarchical representation is needed, especially if the image context is not taken into consideration.
Even thought this method makes greedy decisions during the merging process it is able to capture important perceptually groupings. We evaluate the quality of the segmentation results of the MIES, MIS and D3P versions of this MST based method with respect to humans and to other graph-based methods. The evaluation shows that in image segmentation, the different stochastic decimations used in this MST image partitioning method produce comparable results. We summarize the major contribution of this thesis and discuss about the computationally hard problem of graph matching and the applicability of hierarchical approaches in dealing with its complexity.

Reference

Haxhimusa, Y. (2006). Structurally optimal dual graph pyramid and its application in image partitioning [Dissertation, Technische Universität Wien]. reposiTUm. http://hdl.handle.net/20.500.12708/186888