Abstract

This paper presents an object-based method for analysing the content drawn by graphical operators in natively digital PDF documents. We propose that graphical content in a document can be classified as either structural or non-structural, and we present an output model for the result of our analysis. Heuristic techniques are used to group the drawing instructions into regions and to determine their logical role in the document's structure. Experimental results demonstrate the effectiveness of the algorithm.
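
The abstract does not specify the heuristics, so the following is only an illustrative sketch of what grouping drawing instructions into regions might look like. It assumes each instruction has already been reduced to an axis-aligned bounding box and merges boxes that overlap or lie within a small gap of each other. The `Box` type, the `near` and `group_into_regions` functions, and the `gap` threshold are all hypothetical names chosen for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Bounding box of one drawing instruction (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def near(a: Box, b: Box, gap: float) -> bool:
    """True if the boxes overlap or lie within `gap` units on both axes."""
    return (a.x0 - gap <= b.x1 and b.x0 - gap <= a.x1 and
            a.y0 - gap <= b.y1 and b.y0 - gap <= a.y1)


def group_into_regions(boxes: List[Box], gap: float = 2.0) -> List[List[Box]]:
    """Cluster boxes transitively by proximity using union-find.

    The proximity rule is an assumption for illustration, not the
    heuristic used by the authors.
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is quadratic, which is fine for a sketch.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if near(boxes[i], boxes[j], gap):
                parent[find(j)] = find(i)

    regions: Dict[int, List[Box]] = {}
    for i, box in enumerate(boxes):
        regions.setdefault(find(i), []).append(box)
    return list(regions.values())


# Two touching ruling lines form one region; the distant box stays separate.
example = [Box(0, 0, 10, 1), Box(9, 0, 10, 10), Box(100, 100, 110, 110)]
example_regions = group_into_regions(example)
```

A second pass would then assign each region a logical role, such as structural or non-structural; that step is left out here because the abstract gives no detail on how the roles are decided.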

Reference

Gabdulkhakova, A., & Hassan, T. (2012). Document Understanding of Graphical Content in Natively Digital PDF Documents. In DocEng ’12: Proceedings of the 2012 ACM Symposium on Document Engineering (pp. 137–140). http://hdl.handle.net/20.500.12708/54272