## TableFormer: Table Structure Understanding with Transformers

## Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar

## Abstract

Tables organize valuable content in a concise and compact representation.

## 1. Introduction

The occurrence of tables

- a. Picture of a table:

<!-- image -->

- b. Red-annotation of bounding boxes,
- c. Structure predicted by TableFormer:

<!-- image -->

Figure 1:

<!-- image -->

Recently,

The first problem is called table-location and has been considered as a solved problem, given enough ground-truth data.

The second problem is called table-structure decomposition.

In this paper, we want to address these weaknesses and

To meet the design criteria listed above, we developed a

The paper is structured as follows.

its results & performance in Sec. 5. As a conclusion, we describe future work in Sec. 6.

## 2. Previous work and State of the Art

Identifying the structure of a table has been an outstanding problem

Before the rising

Image-to-Text networks

tag-decoder which is constrained to the table-tags.

In

Graph

Hybrid Deep Learning-Rule-Based approach

## 3. Datasets

We rely on large-scale datasets such as PubTabNet [37],

Figure 2:

<!-- image -->

balance in the previous datasets.

The PubTabNet dataset contains 509k tables delivered as

Due to the heterogeneity across the dataset formats, it

amount of such tables, and kept only those ranging

The availability of the bounding boxes for all table cells

As illustrated in Fig. 2, the table distributions from

Motivated by those observations, we aimed at generating

In this regard, we have prepared four synthetic datasets,

Table

one adopts a colorful appearance with high contrast and the

Tab. 1 summarizes the various attributes of the datasets.

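The filtering described above, unifying heterogeneous dataset formats and keeping only tables within a certain size range, can be pictured as a small preprocessing pass. The sketch below is illustrative only; the record keys and the row/column thresholds are assumptions, since the exact ranges are not recoverable from this text.

```python
# Sketch of a size filter over a unified table record format. The record keys
# and the (min, max) bounds are hypothetical placeholders.
def filter_tables(tables, min_rows=2, max_rows=20, min_cols=2, max_cols=10):
    """Keep only tables whose row/column counts fall inside the given ranges."""
    kept = []
    for table in tables:
        if (min_rows <= table["num_rows"] <= max_rows
                and min_cols <= table["num_cols"] <= max_cols):
            kept.append(table)
    return kept
```
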
## 4. The TableFormer model

Given the image of a table, TableFormer is able to predict

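The prediction target, as far as it can be read from the surrounding fragments (a structure tag sequence from the Structure Decoder and a bounding box per cell from the Cell BBox Decoder), can be illustrated with a small example. The tag vocabulary below is an assumption, not taken from the text.

```python
# Illustrative only: a 2x2 table (one header row, one data row) serialized as a
# flat sequence of structure tokens, the kind of target a tag decoder can emit
# autoregressively. The exact tag vocabulary is an assumption.
structure_tokens = [
    "<table>",
    "<thead>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</thead>",
    "<tbody>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</tbody>",
    "</table>",
]
```
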
## 4.1. Model architecture

We now describe in detail the proposed method, which

CNN Backbone Network.

Figure 3:

<!-- image -->

Figure 4:

<!-- image -->

performing classification, and adding an adaptive pooling layer

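Sec. 5.1 names ResNet-18 as the backbone, and the fragment above mentions dropping the classification parts and adding an adaptive pooling layer. A minimal sketch of that kind of backbone follows; the pooled 28x28 size and the choice of average pooling are assumptions.

```python
import torch
import torchvision

# Sketch of a CNN backbone for table images: take a ResNet-18, drop the
# classification head, and pool the final feature map to a fixed spatial size
# so the transformer sees a constant-length sequence. The 28x28 output size
# and 512 channels are assumptions for illustration.
class TableBackbone(torch.nn.Module):
    def __init__(self, pooled_size: int = 28):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)
        # Keep everything up to the last conv stage; discard avgpool and fc.
        self.features = torch.nn.Sequential(*list(resnet.children())[:-2])
        self.pool = torch.nn.AdaptiveAvgPool2d((pooled_size, pooled_size))

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.pool(self.features(images))   # (B, 512, 28, 28)
        return feats.flatten(2).permute(0, 2, 1)   # (B, 28*28, 512)
```
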
Structure Decoder.

The

Cell BBox Decoder.

The encoding generated by the

attention encoding is then multiplied with the encoded image to

The output

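From the fragments above, the Cell BBox Decoder appears to re-weight the encoded image with an attention encoding and regress box coordinates from the result. The sketch below only illustrates that idea; the tensor shapes, head sizes, and the empty-cell classifier are assumptions.

```python
import torch

# Sketch: combine decoder cross-attention weights with the encoded image to
# get one feature vector per predicted cell, then regress a bbox for it.
# Shapes: enc_image (B, N, C) with N image positions; attn (B, T, N) with one
# attention distribution per decoded cell token. All sizes are illustrative.
class CellBBoxHead(torch.nn.Module):
    def __init__(self, channels: int = 512):
        super().__init__()
        self.bbox = torch.nn.Sequential(
            torch.nn.Linear(channels, channels), torch.nn.ReLU(),
            torch.nn.Linear(channels, 4), torch.nn.Sigmoid(),  # (cx, cy, w, h) in [0, 1]
        )
        self.empty = torch.nn.Linear(channels, 2)  # empty vs. non-empty cell (assumption)

    def forward(self, enc_image: torch.Tensor, attn: torch.Tensor):
        cell_enc = attn @ enc_image   # (B, T, C): attention-weighted image features
        return self.bbox(cell_enc), self.empty(cell_enc)
```
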
Loss Functions.

The loss used to train TableFormer can be defined as

<!-- formula-not-decoded -->

where

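The formula itself is not decoded above. As a hedged sketch of the general shape such a training objective takes, a weighted combination of a structure-tag term and a bounding-box term (the exact terms and weights used by TableFormer are not recoverable from this text), one could write:

```latex
% Sketch only: cross-entropy over the predicted structure tags plus a box term
% mixing l1 and generalized IoU, combined with a weight \lambda.
\mathcal{L} = \lambda\, \mathcal{L}_{\mathrm{CE}}(\text{tags})
  + (1-\lambda)\bigl(\mathcal{L}_{\ell_1}(\text{bbox}) + \mathcal{L}_{\mathrm{gIoU}}(\text{bbox})\bigr),
  \qquad \lambda \in [0,1]
```
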
## 5. Experimental Results

## 5.1. Implementation Details

TableFormer uses ResNet-18 as the CNN Backbone Network.

<!-- formula-not-decoded -->

Although input constraints are also used by other methods, runtime performance and lower memory footprint of TableFormer

The Transformer Encoder consists of two 'Transformer Encoder Layers'.

TableFormer is trained with 3 Adam optimizers.

TableFormer is implemented with PyTorch and Torchvision.

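A minimal sketch of the setup described above, assuming 512-dimensional features and one Adam optimizer per sub-module; the head count, decoder depth, learning rate, and the exact split of parameters across the three optimizers are assumptions.

```python
import torch
from torch import nn

# Sketch of the implementation details above: a two-layer transformer encoder
# over 512-dimensional backbone features, and three separate Adam optimizers.
# Feature size, head count, decoder depth, and learning rate are illustrative
# assumptions, not values taken from this text.
d_model = 512

backbone = nn.Conv2d(3, d_model, kernel_size=7, stride=2)  # stand-in for the CNN backbone
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=2,
)
structure_decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=4,
)
bbox_decoder = nn.Linear(d_model, 4)  # stand-in for the cell bbox decoder

optimizers = [
    torch.optim.Adam(backbone.parameters(), lr=1e-4),
    torch.optim.Adam(
        list(encoder.parameters()) + list(structure_decoder.parameters()), lr=1e-4
    ),
    torch.optim.Adam(bbox_decoder.parameters(), lr=1e-4),
]
```
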
## 5.2. Generalization

TableFormer is evaluated on three major publicly available

We also share our baseline results on the challenging

## 5.3. Datasets and Metrics

The Tree-Edit-Distance-Based Similarity (TEDS) metric

<!-- formula-not-decoded -->

where

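The formula itself is not decoded above; for reference, the standard TEDS definition compares the two tables as trees:

```latex
% Standard TEDS definition: T_a and T_b are the tree representations of the two
% tables, EditDist is the tree edit distance, and |T| is the node count of T.
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}
```
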
## 5.4. Quantitative Analysis

Structure.

Table 2:

FT: Model was trained on PubTabNet then fine-tuned.

Cell Detection.

our

Table 3:

Cell Content.

Table 4:

- a. Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells

Japanese language (previously unseen by TableFormer):

Example table from FinTabNet:

<!-- image -->

<!-- image -->

Figure 5:

<!-- image -->

Figure 6:

<!-- image -->

## 5.5. Qualitative Analysis

We showcase

<!-- image -->

## 6. Future Work & Conclusion

In this paper, we presented TableFormer, an end-to-end

## References

- [1]
- [2]
- [3]
- [4]
- [5]
- [6]
- [7]
- [8]
- [9]
- [10]
- [11]
- [12]
- [13]
- [14]
- [15] End-to-end object detection with transformers.
- [16]
- [17]
- [18]
- [19]
- [20]
- [21]
- [22]
- [23]
- [24]
- [25]
- [26]
- [27]
- [28]
- [29]
- [30]
- [31]
- [32]
- [33]
- [34]
- [35]
- [36]
- [37] Computer Vision and Pattern Recognition
- [38] and evaluation.

## TableFormer: Table Structure Understanding with Transformers

## 1. Details on the datasets

ances in regard to their size,

## 1.1. Data preparation

As a first step of our data preparation process, we have

We have developed a technique that tries

Figure 7 illustrates the distribution of the tables across

## 1.2. Synthetic datasets

Aiming to train and evaluate our models in a broader

## 2.

The process of generating a synthetic dataset can be decomposed into the following steps (a minimal sketch follows the list):

- 1.
- 2.
- 3. Generate content: Based on the dataset
- 4.
- 5.

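Since only the label of step 3 survives above, the following is a loose sketch of what such a pipeline could look like (structure first, then content, then rendering); every function, parameter, and value range here is a hypothetical placeholder.

```python
import random

# Loose sketch of a synthetic table generator: pick a grid size, fill cells
# with generated content, and serialize the result as HTML.
def generate_synthetic_table(max_rows: int = 10, max_cols: int = 6) -> str:
    rows = random.randint(2, max_rows)
    cols = random.randint(2, max_cols)
    body = []
    for r in range(rows):
        cells = []
        for c in range(cols):
            # Step "Generate content": header text for row 0, data otherwise.
            text = f"col {c}" if r == 0 else f"val {r}.{c}"
            cells.append(f"<td>{text}</td>")
        body.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(body) + "</table>"

print(generate_synthetic_table())
```
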
Although TableFormer can predict the table structure and

Figure 7:

<!-- image -->

However, it is possible to mitigate those limitations by

Here is a step-by-step description of the prediction post-processing (a sketch of the cell-matching idea follows the list):

- 1. Get the minimal grid dimensions - number of rows and columns
- 2.
- 3.
- 3.a.
- 4.

where

- 5. median cell size for all table cells.
- 6.
- 7.
- 8.
- 9.
- 9a.
- 9b.
- 9c.
- 9d. Intersect the orphan's bounding box with the column
- 9e. If the table cell under the identified row and column

<!-- formula-not-decoded -->

orphan cell.

- 9f.

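The surviving fragments above (minimal grid dimensions, intersecting an orphan's bounding box with a column, matching predictions to PDF cells) suggest an intersection-based assignment of PDF text cells to the predicted grid. The sketch below only illustrates that idea; the actual rules, thresholds, and helper names used in the paper are not recoverable here.

```python
# Sketch of intersection-based matching between PDF text cells and predicted
# cell bounding boxes (boxes as (x0, y0, x1, y1)). The overlap criterion and
# threshold are assumptions for illustration.
def overlap_area(a, b):
    """Area of the intersection of two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def match_pdf_cells(pdf_cells, predicted_boxes, min_overlap=0.5):
    """Assign each PDF cell to the predicted box that covers most of it."""
    matches = {}
    for i, pdf_box in enumerate(pdf_cells):
        area = (pdf_box[2] - pdf_box[0]) * (pdf_box[3] - pdf_box[1])
        best_j, best_frac = None, 0.0
        for j, pred_box in enumerate(predicted_boxes):
            frac = overlap_area(pdf_box, pred_box) / area if area > 0 else 0.0
            if frac > best_frac:
                best_j, best_frac = j, frac
        if best_j is not None and best_frac >= min_overlap:
            matches[i] = best_j  # PDF cell i goes into predicted cell best_j
        # otherwise the PDF cell is left as an "orphan" for the later steps (9a-9f)
    return matches
```
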
Additional images with examples of TableFormer predictions are shown below.

Figure 9:

<!-- image -->

Figure 10:

<!-- image -->

<!-- image -->

<!-- image -->

Figure 13:

<!-- image -->

Figure 14:

Figure 15:

<!-- image -->

<!-- image -->

<!-- image -->

<!-- image -->

Figure 16:

<!-- image -->

<!-- image -->

Figure 17:

<!-- image -->