TableFormer: Table Structure Understanding with Transformers.
Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar
Abstract
Tables organize valuable content in a concise and com-
1. Introduction
The occurrence of tables
a. Picture of a table:
b. Red-annotation of bounding boxes,
c. Structure predicted by TableFormer:
Figure 1:
Recently,
The first problem is called table-location and has been
considered as a solved problem, given enough ground-truth
The second problem is called table-structure decompo-
In this paper, we want to address these weaknesses and
To meet the design criteria listed above, we developed a
The paper is structured as follows.
its results & performance in Sec. 5. As a conclusion, we de-
2. Previous work and State of the Art
Identifying the structure of a table has been an outstand-
Before the rising
Image-to-Text networks
tag-decoder which is constrained to the table-tags.
In
Graph
Hybrid Deep Learning-Rule-Based approach
3. Datasets
We rely on large-scale datasets such as PubTabNet [37],
Figure 2:
balance in the previous datasets.
The PubTabNet dataset contains 509k tables delivered as
Due to the heterogeneity across the dataset formats, it
amount of such tables, and kept only those ranging
The availability of the bounding boxes for all table cells
As illustrated in Fig. 2, the table distributions from
Motivated by those observations we aimed at generating
In this regard, we have prepared four synthetic datasets,
Table
one adopts a colorful appearance with high contrast and the
Tab. 1 summarizes the various attributes of the datasets.
4. The TableFormer model
Given the image of a table, TableFormer is able to pre-
4.1. Model architecture.
We now describe in detail the proposed method, which
CNN Backbone Network.
Figure 3:
Figure 4:
forming classification, and adding an adaptive pooling layer
Structure Decoder.
The
Cell BBox Decoder.
The encoding generated by the
tention encoding is then multiplied to the encoded image to
The output
Loss Functions.
The loss used to train the TableFormer can be defined as
where
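A minimal sketch of how a multi-task loss of this kind can be combined, assuming a structure loss `l_s` from the Structure Decoder and a bounding-box loss `l_box` from the Cell BBox Decoder, where `l_box` is itself a weighted sum of an IoU term and an L1 term. The function names, the weights `lam`, `lam_iou` and `lam_l1`, and their default values are hypothetical placeholders chosen for illustration.

```python
def box_loss(l_iou: float, l_l1: float,
             lam_iou: float = 1.0, lam_l1: float = 1.0) -> float:
    """Weighted sum of an IoU loss and an L1 loss (weights are assumptions)."""
    return lam_iou * l_iou + lam_l1 * l_l1


def total_loss(l_s: float, l_box: float, lam: float = 0.5) -> float:
    """Convex combination of the structure loss and the box loss.

    `lam` must lie in [0, 1]; the default of 0.5 is a placeholder only.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * l_s + (1.0 - lam) * l_box
```

With `lam` close to 1 the structure loss dominates, and with `lam` close to 0 the box loss does, so a single scalar balances the two decoders during training.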
5. Experimental Results
5.1. Implementation Details
TableFormer uses ResNet-18 as the
Although input constraints are used also by other methods, runtime performance and lower memory footprint of Table-
The Transformer Encoder consists of two 'Transformer
For training, TableFormer is trained with 3 Adam opti-
TableFormer is implemented with PyTorch and Torchvi-
5.2. Generalization
TableFormer is evaluated on three major publicly avail-
We also share our baseline results on the challenging
5.3. Datasets and Metrics
The Tree-Edit-Distance-Based Similarity (TEDS) met-
where
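As a sketch of how a TEDS score is normalized: in its commonly published form, TEDS(T_a, T_b) = 1 - EditDist(T_a, T_b) / max(|T_a|, |T_b|), where |T| is the number of nodes in tree T. The snippet below applies only this normalization. The function name `teds_score` is an assumption for illustration, and the tree edit distance is taken as a precomputed input rather than computed here.

```python
def teds_score(edit_distance: float, n_nodes_a: int, n_nodes_b: int) -> float:
    """Turn a tree edit distance into a similarity score in [0, 1].

    edit_distance: tree edit distance between the two table trees
                   (assumed to be computed elsewhere).
    n_nodes_a, n_nodes_b: number of nodes in each tree.
    """
    largest = max(n_nodes_a, n_nodes_b)
    if largest == 0:
        # Two empty trees are treated as identical.
        return 1.0
    return 1.0 - edit_distance / largest
```

A score of 1.0 means the predicted and ground-truth trees match exactly, and the score decreases as more edits are needed relative to the size of the larger tree.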
5.4. Quantitative Analysis
Structure.
Table 2:
FT: Model was trained on PubTabNet then finetuned.
Cell Detection.
our
Table 3:
Cell Content.
Table 4:
- a. Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells
Japanese language (previously unseen by TableFormer):
Example table from FinTabNet:
b. Structure predicted by TableFormer, with superimposed matched PDF cell text:
Text is aligned to match original for ease of viewing
Figure 5:
Figure 6:
5.5. Qualitative Analysis
We showcase
6. Future Work & Conclusion
In this paper, we presented TableFormer, an end-to-end
References
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
end object detection with transformers.
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
[34]
[35]
[36]
[37]
Computer Vision and Pattern Recognition
[38]
and evaluation.
TableFormer: Table Structure Understanding with Transformers
1. Details on the datasets
ances in regard to their size,
1.1. Data preparation
As a first step of our data preparation process, we have
We have developed a technique that tries
Figure 7 illustrates the distribution of the tables across
1.2. Synthetic datasets
Aiming to train and evaluate our models in a broader
2.
The process of generating a synthetic dataset can be de-
-
-
-
- Generate content: Based on the dataset
-
-
Although TableFormer can predict the table structure and
Figure 7:
However, it is possible to mitigate those limitations by
Here is a step-by-step description of the prediction post-
-
- Get the minimal grid dimensions - number of rows and
-
-
-
3.a.
-
where
dian cell size for all table cells.
9a.
-
9b.
-
9c.
-
9d. Intersect the orphan's bounding box with the column
-
9e. If the table cell under the identified row and column
phan cell.
9f.
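The orphan-cell steps above rely on intersecting a bounding box with row and column extents. A minimal sketch of that idea follows, assuming each row and column is represented as a one-dimensional band `(start, end)` and that the orphan is assigned to the band with the largest overlap. The helper names (`overlap_1d`, `best_band`, `locate_orphan`) and the largest-overlap rule are assumptions for illustration, not the exact procedure of the paper.

```python
from typing import List, Optional, Tuple

Band = Tuple[float, float]                 # (start, end) along one axis
BBox = Tuple[float, float, float, float]   # (x0, y0, x1, y1)


def overlap_1d(a: Band, b: Band) -> float:
    """Length of the intersection of two 1-D intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def best_band(span: Band, bands: List[Band]) -> Optional[int]:
    """Index of the band that overlaps `span` the most, or None if none do."""
    best_index: Optional[int] = None
    best_overlap = 0.0
    for index, band in enumerate(bands):
        current = overlap_1d(span, band)
        if current > best_overlap:
            best_index, best_overlap = index, current
    return best_index


def locate_orphan(bbox: BBox, row_bands: List[Band],
                  col_bands: List[Band]) -> Tuple[Optional[int], Optional[int]]:
    """Return the (row, column) indices whose bands best contain the orphan."""
    x0, y0, x1, y1 = bbox
    return best_band((y0, y1), row_bands), best_band((x0, x1), col_bands)


# Example: a 2x2 grid and one orphan box lying in the second row and column.
rows = [(0.0, 10.0), (10.0, 20.0)]
cols = [(0.0, 50.0), (50.0, 100.0)]
orphan_position = locate_orphan((55.0, 11.0, 90.0, 19.0), rows, cols)
```

An orphan that overlaps no row or column band yields `None` for that axis, which a caller can treat as "could not be placed".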
Additional images with examples of TableFormer predic-
Figure 8:
Figure 9:
Figure 10:
Figure 11:
Figure 12:
Figure 13:
Figure 14:
Figure 15:
Figure 16:
Figure 17: