## TableFormer: Table Structure Understanding with Transformers

## Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar

## Abstract

Tables organize valuable content in a concise and compact representation.

## 1. Introduction

The occurrence of tables

- a. Picture of a table:

<!-- image -->

- b. Red-annotation of bounding boxes,
- c. Structure predicted by TableFormer:

<!-- image -->

Figure 1:

<!-- image -->

Recently,

The first problem is called table-location and has been considered as a solved problem, given enough ground-truth data.

The second problem is called table-structure decomposition.

In this paper, we want to address these weaknesses and

To meet the design criteria listed above, we developed a

The paper is structured as follows.

its results & performance in Sec. 5. As a conclusion, we describe future work in Sec. 6.

## 2. Previous work and State of the Art

Identifying the structure of a table has been an outstanding problem

Before the rising

Image-to-Text networks

tag-decoder which is constrained to the table-tags.

In

Graph

Hybrid Deep Learning-Rule-Based approach

## 3. Datasets

We rely on large-scale datasets such as PubTabNet [37],

Figure 2:

<!-- image -->

balance in the previous datasets.

The PubTabNet dataset contains 509k tables delivered as

Due to the heterogeneity across the dataset formats, it

amount of such tables, and kept only those ranging

The availability of the bounding boxes for all table cells

As illustrated in Fig. 2, the table distributions from

Motivated by those observations, we aimed at generating

In this regard, we have prepared four synthetic datasets,

Table

one adopts a colorful appearance with high contrast and the

Tab. 1 summarizes the various attributes of the datasets.

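The filtering described above, unifying heterogeneous dataset formats and keeping only tables within a certain size range, can be pictured as a small preprocessing pass. The sketch below is illustrative only; the record keys and the row/column thresholds are assumptions, since the exact ranges are not recoverable from this text.

```python
# Sketch of a size filter over a unified table record format. The record keys
# and the (min, max) bounds are hypothetical placeholders.
def filter_tables(tables, min_rows=2, max_rows=20, min_cols=2, max_cols=10):
    """Keep only tables whose row/column counts fall inside the given ranges."""
    kept = []
    for table in tables:
        if (min_rows <= table["num_rows"] <= max_rows
                and min_cols <= table["num_cols"] <= max_cols):
            kept.append(table)
    return kept
```
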
## 4. The TableFormer model

Given the image of a table, TableFormer is able to predict

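The prediction target, as far as it can be read from the surrounding fragments (a structure tag sequence from the Structure Decoder and a bounding box per cell from the Cell BBox Decoder), can be illustrated with a small example. The tag vocabulary below is an assumption, not taken from the text.

```python
# Illustrative only: a 2x2 table (one header row, one data row) serialized as a
# flat sequence of structure tokens, the kind of target a tag decoder can emit
# autoregressively. The exact tag vocabulary is an assumption.
structure_tokens = [
    "<table>",
    "<thead>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</thead>",
    "<tbody>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</tbody>",
    "</table>",
]
```
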
## 4.1. Model architecture

We now describe in detail the proposed method, which

CNN Backbone Network.

Figure 3:

<!-- image -->

Figure 4:

<!-- image -->

performing classification, and adding an adaptive pooling layer

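Sec. 5.1 names ResNet-18 as the backbone, and the fragment above mentions dropping the classification parts and adding an adaptive pooling layer. A minimal sketch of that kind of backbone follows; the pooled 28x28 size and the choice of average pooling are assumptions.

```python
import torch
import torchvision

# Sketch of a CNN backbone for table images: take a ResNet-18, drop the
# classification head, and pool the final feature map to a fixed spatial size
# so the transformer sees a constant-length sequence. The 28x28 output size
# and 512 channels are assumptions for illustration.
class TableBackbone(torch.nn.Module):
    def __init__(self, pooled_size: int = 28):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)
        # Keep everything up to the last conv stage; discard avgpool and fc.
        self.features = torch.nn.Sequential(*list(resnet.children())[:-2])
        self.pool = torch.nn.AdaptiveAvgPool2d((pooled_size, pooled_size))

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.pool(self.features(images))   # (B, 512, 28, 28)
        return feats.flatten(2).permute(0, 2, 1)   # (B, 28*28, 512)
```
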
Structure Decoder.

The

Cell BBox Decoder.

The encoding generated by the

attention encoding is then multiplied with the encoded image to

The output

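From the fragments above, the Cell BBox Decoder appears to re-weight the encoded image with an attention encoding and regress box coordinates from the result. The sketch below only illustrates that idea; the tensor shapes, head sizes, and the empty-cell classifier are assumptions.

```python
import torch

# Sketch: combine decoder cross-attention weights with the encoded image to
# get one feature vector per predicted cell, then regress a bbox for it.
# Shapes: enc_image (B, N, C) with N image positions; attn (B, T, N) with one
# attention distribution per decoded cell token. All sizes are illustrative.
class CellBBoxHead(torch.nn.Module):
    def __init__(self, channels: int = 512):
        super().__init__()
        self.bbox = torch.nn.Sequential(
            torch.nn.Linear(channels, channels), torch.nn.ReLU(),
            torch.nn.Linear(channels, 4), torch.nn.Sigmoid(),  # (cx, cy, w, h) in [0, 1]
        )
        self.empty = torch.nn.Linear(channels, 2)  # empty vs. non-empty cell (assumption)

    def forward(self, enc_image: torch.Tensor, attn: torch.Tensor):
        cell_enc = attn @ enc_image   # (B, T, C): attention-weighted image features
        return self.bbox(cell_enc), self.empty(cell_enc)
```
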
Loss Functions.

The loss used to train TableFormer can be defined as

<!-- formula-not-decoded -->

where

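The formula itself is not decoded above. As a hedged sketch of the general shape such a training objective takes, a weighted combination of a structure-tag term and a bounding-box term (the exact terms and weights used by TableFormer are not recoverable from this text), one could write:

```latex
% Sketch only: cross-entropy over the predicted structure tags plus a box term
% mixing l1 and generalized IoU, combined with a weight \lambda.
\mathcal{L} = \lambda\, \mathcal{L}_{\mathrm{CE}}(\text{tags})
  + (1-\lambda)\bigl(\mathcal{L}_{\ell_1}(\text{bbox}) + \mathcal{L}_{\mathrm{gIoU}}(\text{bbox})\bigr),
  \qquad \lambda \in [0,1]
```
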
## 5. Experimental Results

## 5.1. Implementation Details

TableFormer uses ResNet-18 as the CNN Backbone Network.

<!-- formula-not-decoded -->

Although input constraints are also used by other methods, runtime performance and lower memory footprint of TableFormer

The Transformer Encoder consists of two 'Transformer Encoder Layers'.

TableFormer is trained with 3 Adam optimizers.

TableFormer is implemented with PyTorch and Torchvision.

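A minimal sketch of the setup described above, assuming 512-dimensional features and one Adam optimizer per sub-module; the head count, decoder depth, learning rate, and the exact split of parameters across the three optimizers are assumptions.

```python
import torch
from torch import nn

# Sketch of the implementation details above: a two-layer transformer encoder
# over 512-dimensional backbone features, and three separate Adam optimizers.
# Feature size, head count, decoder depth, and learning rate are illustrative
# assumptions, not values taken from this text.
d_model = 512

backbone = nn.Conv2d(3, d_model, kernel_size=7, stride=2)  # stand-in for the CNN backbone
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=2,
)
structure_decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=4,
)
bbox_decoder = nn.Linear(d_model, 4)  # stand-in for the cell bbox decoder

optimizers = [
    torch.optim.Adam(backbone.parameters(), lr=1e-4),
    torch.optim.Adam(
        list(encoder.parameters()) + list(structure_decoder.parameters()), lr=1e-4
    ),
    torch.optim.Adam(bbox_decoder.parameters(), lr=1e-4),
]
```
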
## 5.2. Generalization

TableFormer is evaluated on three major publicly available

We also share our baseline results on the challenging

## 5.3. Datasets and Metrics

The Tree-Edit-Distance-Based Similarity (TEDS) metric

<!-- formula-not-decoded -->

where

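The formula itself is not decoded above; for reference, the standard TEDS definition compares the two tables as trees:

```latex
% Standard TEDS definition: T_a and T_b are the tree representations of the two
% tables, EditDist is the tree edit distance, and |T| is the node count of T.
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}
```
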
## 5.4. Quantitative Analysis

Structure.

Table 2:

FT: Model was trained on PubTabNet then fine-tuned.

Cell Detection.

our

Table 3:

Cell Content.

Table 4:

- a. Red - PDF cells, Green - predicted bounding boxes, Blue - post-processed predictions matched to PDF cells

Japanese language (previously unseen by TableFormer):

Example table from FinTabNet:

<!-- image -->

<!-- image -->

Figure 5:

<!-- image -->

Figure 6:

<!-- image -->

## 5.5. Qualitative Analysis

We showcase

<!-- image -->

## 6. Future Work & Conclusion

In this paper, we presented TableFormer, an end-to-end

## References

- [1]
- [2]
- [3]
- [4]
- [5]
- [6]
- [7]
- [8]
- [9]
- [10]
- [11]
- [12]
- [13]
- [14]
- [15] End-to-end object detection with transformers.
- [16]
- [17]
- [18]
- [19]
- [20]
- [21]
- [22]
- [23]
- [24]
- [25]
- [26]
- [27]
- [28]
- [29]
- [30]
- [31]
- [32]
- [33]
- [34]
- [35]
- [36]
- [37] Computer Vision and Pattern Recognition
- [38] and evaluation.

## TableFormer: Table Structure Understanding with Transformers

## 1. Details on the datasets

ances in regard to their size,

## 1.1. Data preparation

As a first step of our data preparation process, we have

We have developed a technique that tries

Figure 7 illustrates the distribution of the tables across

## 1.2. Synthetic datasets

Aiming to train and evaluate our models in a broader

## 2.

The process of generating a synthetic dataset can be decomposed into the following steps (a minimal sketch follows the list):

- 1.
- 2.
- 3. Generate content: Based on the dataset
- 4.
- 5.

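Since only the label of step 3 survives above, the following is a loose sketch of what such a pipeline could look like (structure first, then content, then rendering); every function, parameter, and value range here is a hypothetical placeholder.

```python
import random

# Loose sketch of a synthetic table generator: pick a grid size, fill cells
# with generated content, and serialize the result as HTML.
def generate_synthetic_table(max_rows: int = 10, max_cols: int = 6) -> str:
    rows = random.randint(2, max_rows)
    cols = random.randint(2, max_cols)
    body = []
    for r in range(rows):
        cells = []
        for c in range(cols):
            # Step "Generate content": header text for row 0, data otherwise.
            text = f"col {c}" if r == 0 else f"val {r}.{c}"
            cells.append(f"<td>{text}</td>")
        body.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(body) + "</table>"

print(generate_synthetic_table())
```
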
Although TableFormer can predict the table structure and

Figure 7:

<!-- image -->

However, it is possible to mitigate those limitations by

Here is a step-by-step description of the prediction post-processing (a sketch of the cell-matching idea follows the list):

- 1. Get the minimal grid dimensions - number of rows and columns
- 2.
- 3.
- 3.a.
- 4.

where

- 5. median cell size for all table cells.
- 6.
- 7.
- 8.
- 9.
- 9a.
- 9b.
- 9c.
- 9d. Intersect the orphan's bounding box with the column
- 9e. If the table cell under the identified row and column

<!-- formula-not-decoded -->

orphan cell.

- 9f.

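The surviving fragments above (minimal grid dimensions, intersecting an orphan's bounding box with a column, matching predictions to PDF cells) suggest an intersection-based assignment of PDF text cells to the predicted grid. The sketch below only illustrates that idea; the actual rules, thresholds, and helper names used in the paper are not recoverable here.

```python
# Sketch of intersection-based matching between PDF text cells and predicted
# cell bounding boxes (boxes as (x0, y0, x1, y1)). The overlap criterion and
# threshold are assumptions for illustration.
def overlap_area(a, b):
    """Area of the intersection of two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def match_pdf_cells(pdf_cells, predicted_boxes, min_overlap=0.5):
    """Assign each PDF cell to the predicted box that covers most of it."""
    matches = {}
    for i, pdf_box in enumerate(pdf_cells):
        area = (pdf_box[2] - pdf_box[0]) * (pdf_box[3] - pdf_box[1])
        best_j, best_frac = None, 0.0
        for j, pred_box in enumerate(predicted_boxes):
            frac = overlap_area(pdf_box, pred_box) / area if area > 0 else 0.0
            if frac > best_frac:
                best_j, best_frac = j, frac
        if best_j is not None and best_frac >= min_overlap:
            matches[i] = best_j  # PDF cell i goes into predicted cell best_j
        # otherwise the PDF cell is left as an "orphan" for the later steps (9a-9f)
    return matches
```
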
Additional images with examples of TableFormer predictions are shown below.

Figure 9:

<!-- image -->

Figure 10:

<!-- image -->

<!-- image -->

<!-- image -->

Figure 13:

<!-- image -->

Figure 14:

Figure 15:

<!-- image -->

<!-- image -->

<!-- image -->

<!-- image -->

Figure 16:

<!-- image -->

<!-- image -->

Figure 17:

<!-- image -->