## TableFormer: Table Structure Understanding with Transformers

Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar

## Abstract

Tables organize valuable content in a concise and compact representation.

## 1. Introduction

The occurrence of tables in documents is ubiquitous.

Figure 1: (a) Picture of a table; (b) red annotation of bounding boxes; (c) structure predicted by TableFormer.

The first problem is called table-location and has been considered a solved problem, given enough ground-truth data. The second problem is called table-structure decomposition.

In this paper, we want to address these weaknesses. To meet our design criteria, we developed TableFormer.

The paper is structured as follows. Sec. 2 reviews previous work, Sec. 3 describes the datasets, and Sec. 4 introduces the TableFormer model. We report its results and performance in Sec. 5 and conclude with future work in Sec. 6.

## 2. Previous work and State of the Art

Identifying the structure of a table has been an outstanding problem. Before the rise of deep neural networks, it was mostly tackled with heuristic, rule-based methods. More recent Image-to-Text networks couple a visual encoder with a tag-decoder which is constrained to the table tags. Graph-based formulations and hybrid deep-learning/rule-based approaches have also been proposed.

## 3. Datasets

We rely on large-scale datasets such as PubTabNet [37] and FinTabNet to train and evaluate our models.

Figure 2: Table distributions across the datasets.

The PubTabNet dataset contains 509k tables delivered as annotated PNG images. Due to the heterogeneity across the dataset formats, it was necessary to homogenize them into a common format; we additionally filtered the corpus and kept only tables within fixed limits on the number of rows and columns. The availability of the bounding boxes for all table cells is essential to train our models.

As illustrated in Fig. 2, the table distributions from all datasets are skewed towards simpler structures. Motivated by those observations, and to address the lack of balance in the previous datasets, we aimed at generating synthetic table data. In this regard, we have prepared four synthetic datasets; one of the synthetic styles adopts a colorful appearance with high contrast. Tab. 1 summarizes the various attributes of the datasets.

## 4. The TableFormer model

Given the image of a table, TableFormer is able to predict (1) a sequence of tokens (e.g. in HTML format) that represents the structure of the table and (2) a bounding box coupled to a subset of those tokens, namely the table cells.

## 4.1. Model architecture

We now describe in detail the proposed method, which is composed of a CNN backbone network, a transformer encoder, a structure decoder and a cell bbox decoder.

CNN Backbone Network. A ResNet-18 backbone extracts the image features; the final classification layers are removed, as we are not performing classification, and an adaptive pooling layer is added.

Structure Decoder. The structure decoder autoregressively generates the sequence of tags that describes the table structure.

Cell BBox Decoder. The encoding generated by the structure decoder for each cell tag is used to attend over the image features; the attention encoding is then multiplied with the encoded image to focus on the region of the corresponding cell. The output is passed through a prediction head that regresses the bounding box of each table cell.

Loss Functions. The loss used to train TableFormer combines terms for its two prediction targets: the structure tag sequence and the cell bounding boxes.
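A common way to combine a structure-tag objective with a bounding-box objective, and the form assumed in the sketch below (the symbols and the particular weighting scheme are illustrative rather than quoted from the paper), is a convex combination of the two terms:

```latex
% Hedged sketch of a combined structure + bbox training objective;
% \lambda, \lambda_{iou} and \lambda_{\ell_1} are assumed scalar hyper-parameters.
\begin{equation}
  \mathcal{L} \;=\; \lambda\,\mathcal{L}_{\mathrm{struct}}
             \;+\; (1-\lambda)\,\bigl(\lambda_{iou}\,\mathcal{L}_{iou}
             \;+\; \lambda_{\ell_1}\,\mathcal{L}_{\ell_1}\bigr),
  \qquad \lambda \in [0,1]
\end{equation}
```

Here L_struct is a cross-entropy loss over the predicted tag sequence, while L_iou and L_l1 are a (generalized) IoU loss and an l1 loss on the predicted cell bounding boxes, in the spirit of DETR-style detectors [15].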
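To make the component description in Sec. 4.1 more concrete, the following is a much-simplified PyTorch sketch of an encoder/dual-decoder table model. It is not the authors' implementation: the layer counts, feature sizes, tag vocabulary and the simplified bounding-box head are assumptions, and positional encodings are omitted for brevity.

```python
# Simplified sketch of a CNN backbone + transformer encoder + structure decoder
# + cell bbox head, in the spirit of Sec. 4.1. All hyper-parameters are assumptions.
import torch
import torch.nn as nn
import torchvision


class TableModelSketch(nn.Module):
    def __init__(self, vocab_size=32, d_model=256, enc_layers=2, dec_layers=4, nhead=4):
        super().__init__()
        # ResNet-18 backbone without its classification head (cf. Sec. 5.1),
        # projected to the transformer width and adaptively pooled.
        backbone = torchvision.models.resnet18(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(512, d_model, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d((28, 28))

        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=enc_layers)

        # Structure decoder: autoregressive transformer over the tag vocabulary.
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.structure_decoder = nn.TransformerDecoder(dec_layer, num_layers=dec_layers)
        self.tag_embed = nn.Embedding(vocab_size, d_model)
        self.tag_head = nn.Linear(d_model, vocab_size)

        # Cell bbox head: regress (cx, cy, w, h) in [0, 1] from the decoder state of
        # each cell tag -- a simplification of the attention-based cell bbox decoder.
        self.bbox_head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 4), nn.Sigmoid(),
        )

    def forward(self, image, tag_tokens):
        feats = self.pool(self.proj(self.backbone(image)))       # (B, d_model, 28, 28)
        memory = self.encoder(feats.flatten(2).transpose(1, 2))  # (B, 784, d_model)

        tgt = self.tag_embed(tag_tokens)                          # (B, T, d_model)
        causal = torch.triu(                                      # causal mask for decoding
            torch.full((tgt.size(1), tgt.size(1)), float("-inf"), device=tgt.device),
            diagonal=1,
        )
        hs = self.structure_decoder(tgt, memory, tgt_mask=causal)

        return self.tag_head(hs), self.bbox_head(hs)              # tag logits, cell boxes


model = TableModelSketch()
logits, boxes = model(torch.randn(1, 3, 448, 448),                # image size is an assumption
                      torch.randint(0, 32, (1, 20)))              # teacher-forced tag tokens
```

At inference time the tag head would be decoded autoregressively, and a bounding box would be kept only for the tokens that open a table cell.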
## 5. Experimental Results

## 5.1. Implementation Details

TableFormer uses ResNet-18 as the CNN backbone network. Although input constraints are used also by other methods, these constraints account for the high runtime performance and low memory footprint of TableFormer. The Transformer Encoder consists of two "Transformer Encoder Layers". TableFormer is trained with 3 Adam optimizers, one for each of the CNN backbone network, the structure decoder and the cell bbox decoder. TableFormer is implemented with PyTorch and Torchvision.

## 5.2. Generalization

TableFormer is evaluated on three major publicly available datasets. We also share our baseline results on the challenging synthetic datasets introduced in Sec. 3.

## 5.3. Datasets and Metrics

The Tree-Edit-Distance-Based Similarity (TEDS) metric, introduced together with PubTabNet [37], compares the tree representation of the predicted table structure against the ground truth. It is defined as TEDS(T_a, T_b) = 1 − EditDist(T_a, T_b) / max(|T_a|, |T_b|), where EditDist denotes the tree edit distance and |T| the number of nodes of tree T.

## 5.4. Quantitative Analysis

Structure. Structure recognition results are summarized in Table 2 (FT: model was trained on PubTabNet then fine-tuned).

Cell Detection. Cell detection results are summarized in Table 3.

Cell Content. Cell content results are summarized in Table 4.

Figures 5 and 6: Qualitative examples, including a table in the Japanese language (previously unseen by TableFormer) and an example table from FinTabNet. (a) Red: PDF cells; green: predicted bounding boxes; blue: post-processed predictions matched to PDF cells. (b) Structure predicted by TableFormer, with superimposed matched PDF cell text; the text is aligned to match the original for ease of viewing.

## 5.5. Qualitative Analysis

We showcase qualitative prediction examples in Figs. 5 and 6.

## 6. Future Work & Conclusion

In this paper, we presented TableFormer, an end-to-end transformer-based approach that predicts both the structure of a table and the bounding boxes of its cells directly from an image.

## References

- [1]–[14] […]
- [15] End-to-end object detection with transformers.
- [16]–[36] […]
- [37] […] Computer Vision and Pattern Recognition.
- [38] […] and evaluation.

## TableFormer: Table Structure Understanding with Transformers (Supplementary Material)

## 1. Details on the datasets

The datasets used in this work show variances in regard to their size and appearance.

## 1.1. Data preparation

As a first step of our data preparation process, we have homogenized the annotations of the source datasets into a common format. We have developed a technique that tries to recover cell bounding boxes that are missing from the original annotations. Figure 7 illustrates the distribution of the tables across different table dimensions per dataset.

## 1.2. Synthetic datasets

Aiming to train and evaluate our models on a broader spectrum of table appearances, we generated synthetic datasets. The process of generating a synthetic dataset can be described as a five-step pipeline, in which the third step generates the table content based on the source dataset.

## 2. Prediction post-processing

Although TableFormer can predict the table structure and the bounding boxes of the table cells, the raw predictions have limitations. However, it is possible to mitigate those limitations by post-processing the predictions against the text cells of the programmatic PDF. Here is a step-by-step description of the prediction post-processing:

- 1. Get the minimal grid dimensions, i.e. the number of rows and columns of the predicted table structure.
- 2.–4. […]
- 5. Compute the median cell size for all table cells.
- 6.–8. […]
- 9. Resolve orphan cells:
  - 9a.–9c. […]
  - 9d. Intersect the orphan's bounding box with the column (and row) extents to identify its position in the grid.
  - 9e. If the table cell under the identified row and column is empty, assign the orphan cell to it.
  - 9f. […]

Additional images with examples of TableFormer predictions and post-processing are provided in Figs. 8–17.
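To illustrate the core matching step behind the post-processing description above, here is a minimal, hypothetical Python sketch. The box format, the `min_iou` threshold and all function names are assumptions made for illustration; the full procedure additionally aligns the matched boxes to the predicted row/column grid, uses the median cell size, and re-assigns orphan cells.

```python
# Hypothetical sketch of matching predicted cell bounding boxes to the text cells
# of a programmatic PDF. Box format, threshold and names are assumptions.
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_pdf_cells(pred_boxes: List[Box], pdf_cells: List[dict],
                    min_iou: float = 0.1) -> Dict[str, object]:
    """Assign every PDF text cell to the best-overlapping predicted cell box.

    PDF cells with no overlap above `min_iou` are returned as "orphans"; the
    procedure described above re-assigns them via the row/column grid (step 9).
    """
    matches: Dict[int, List[str]] = {i: [] for i in range(len(pred_boxes))}
    orphans: List[dict] = []
    for cell in pdf_cells:                       # each cell: {"bbox": Box, "text": str}
        scores = [iou(cell["bbox"], p) for p in pred_boxes]
        best = max(range(len(scores)), key=scores.__getitem__) if scores else None
        if best is not None and scores[best] >= min_iou:
            matches[best].append(cell["text"])
        else:
            orphans.append(cell)
    return {"matches": matches, "orphans": orphans}
```

The remaining steps described above then operate on these results: matched boxes are aligned to the row/column grid deduced from the predicted structure, and each orphan is placed by intersecting its bounding box with that grid, as in steps 9d and 9e.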