Abstract

This paper presents a methodology for the evaluation of table understanding algorithms for PDF documents. The evaluation takes into account three major tasks: table detection, table structure recognition and functional analysis. We provide a general and flexible output model for each task along with corresponding evaluation metrics and methods. We also present a methodology for collecting and ground-truthing PDF documents based on consensus-reaching principles and provide a publicly available ground-truthed dataset.

Categories and Subject Descriptors: I.7.5 [Document and Text Processing]: Document Capture - document analysis; H.3.4 [Information Storage and Retrieval]: Systems and Software - performance evaluation

Keywords: Table processing, metrics, ground-truth dataset, performance evaluation, document analysis, document understanding

Reference

Göbel, M., Hassan, T., Oro, E., & Orsi, G. (2012). A methodology for evaluating algorithms for table understanding in PDF documents. In Proceedings of the 2012 ACM symposium on Document engineering - DocEng ’12. DocEng 2012 ACM Symposium on Document Engineering, Paris, France, EU. ACM New York, NY, USA ©2012. https://doi.org/10.1145/2361354.2361365