This paper presents a methodology for the evaluation of table understanding algorithms for PDF documents. The evaluation takes into account three major tasks: table detection, table structure recognition and functional analysis. We provide a general and flexible output model for each task along with corresponding evaluation metrics and methods. We also present a methodology for collecting and ground-truthing PDF documents based on consensus-reaching principles and provide a publicly available ground-truthed dataset.

Categories and Subject Descriptors: I.7.5 [Document and Text Processing]: Document Capture - document analysis; H.3.4 [Information Storage and Retrieval]: Systems and Software - performance evaluation

Keywords: Table processing, metrics, ground-truth dataset, performance evaluation, document analysis, document understanding
