feat: Use new TableFormer model weights and default to accurate model version (#1100)

* feat: New tableformer model weights [WIP]

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Updated TF version

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated tests after merging with main; switched to the accurate TableFormer model by default

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Christoph Auer authored on 2025-03-11 10:53:49 +01:00 · committed by GitHub
parent 5e30381c0d · commit eb97357b05
43 changed files with 213 additions and 229 deletions
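
With this change the accurate TableFormer variant is used unless a mode is set explicitly. Below is a minimal sketch of selecting the mode through docling's pipeline options (`PdfPipelineOptions` / `TableFormerMode`); the input file name is a placeholder.

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode
from docling.document_converter import DocumentConverter, PdfFormatOption

# Enable table-structure recovery; ACCURATE is the new default after this
# commit, FAST remains available for lower-latency runs.
pipeline_options = PdfPipelineOptions(do_table_structure=True)
pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)

# "report_with_tables.pdf" is a placeholder input document.
result = converter.convert("report_with_tables.pdf")
print(result.document.export_to_markdown())
```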


@@ -126,14 +126,13 @@ We have chosen the PubTabNet data set to perform HPO, since it includes a highly
Table 1. HPO performed in OTSL and HTML representation on the same transformer-based TableFormer [9] architecture, trained only on PubTabNet [22]. Effects of reducing the # of layers in encoder and decoder stages of the model show that smaller models trained on OTSL perform better, especially in recognizing complex table structures, and maintain a much higher mAP score than the HTML counterpart.
-| # | # | Language | TEDs | TEDs | TEDs | mAP | Inference |
-|------------|------------|------------|-------------|-------------|-------------|-------------|-------------|
-| enc-layers | dec-layers | Language | simple | complex | all | (0.75) | time (secs) |
-| 6 | 6 | OTSL HTML | 0.965 0.969 | 0.934 0.927 | 0.955 0.955 | 0.88 0.857 | 2.73 5.39 |
-| 4 | 4 | OTSL HTML | 0.938 0.952 | 0.904 | 0.927 | 0.853 | 1.97 |
-| 2 | 4 | OTSL | 0.923 0.945 | 0.909 0.897 | 0.938 | 0.843 | 3.77 |
-| | | HTML | | 0.901 | 0.915 0.931 | 0.859 0.834 | 1.91 3.81 |
-| 4 | 2 | OTSL HTML | 0.952 0.944 | 0.92 0.903 | 0.942 0.931 | 0.857 0.824 | 1.22 2 |
+| # enc-layers | # dec-layers | Language | TEDs | TEDs | TEDs | mAP | Inference |
+|----------------|----------------|------------|-------------|-------------|-------------|-------------|-------------|
+| # enc-layers | # dec-layers | Language | simple | complex | all | (0.75) | time (secs) |
+| 6 | 6 | OTSL HTML | 0.965 0.969 | 0.934 0.927 | 0.955 0.955 | 0.88 0.857 | 2.73 5.39 |
+| 4 | 4 | OTSL HTML | 0.938 0.952 | 0.904 0.909 | 0.927 0.938 | 0.853 0.843 | 1.97 3.77 |
+| 2 | 4 | OTSL HTML | 0.923 0.945 | 0.897 0.901 | 0.915 0.931 | 0.859 0.834 | 1.91 3.81 |
+| 4 | 2 | OTSL HTML | 0.952 0.944 | 0.92 0.903 | 0.942 0.931 | 0.857 0.824 | 1.22 2 |
## 5.2 Quantitative Results
@@ -143,15 +142,12 @@ Additionally, the results show that OTSL has an advantage over HTML when applied
Table 2. TSR and cell detection results compared between OTSL and HTML on the PubTabNet [22], FinTabNet [21] and PubTables-1M [14] data sets using TableFormer [9] (with enc=6, dec=6, heads=8).
-| | Language | TEDs | TEDs | TEDs | mAP(0.75) | Inference time (secs) |
-|--------------|------------|--------|---------|--------|-------------|-------------------------|
-| | Language | simple | complex | all | mAP(0.75) | Inference time (secs) |
-| PubTabNet | OTSL | 0.965 | 0.934 | 0.955 | 0.88 | 2.73 |
-| PubTabNet | HTML | 0.969 | 0.927 | 0.955 | 0.857 | 5.39 |
-| FinTabNet | OTSL | 0.955 | 0.961 | 0.959 | 0.862 | 1.85 |
-| FinTabNet | HTML | 0.917 | 0.922 | 0.92 | 0.722 | 3.26 |
-| PubTables-1M | OTSL | 0.987 | 0.964 | 0.977 | 0.896 | 1.79 |
-| PubTables-1M | HTML | 0.983 | 0.944 | 0.966 | 0.889 | 3.26 |
+| Data set | Language | TEDs | TEDs | TEDs | mAP(0.75) | Inference time (secs) |
+|--------------|------------|-------------|-------------|-------------|-------------|-------------------------|
+| Data set | Language | simple | complex | all | mAP(0.75) | Inference time (secs) |
+| PubTabNet | OTSL HTML | 0.965 0.969 | 0.934 0.927 | 0.955 0.955 | 0.88 0.857 | 2.73 5.39 |
+| FinTabNet | OTSL HTML | 0.955 0.917 | 0.961 0.922 | 0.959 0.92 | 0.862 0.722 | 1.85 3.26 |
+| PubTables-1M | OTSL HTML | 0.987 0.983 | 0.964 0.944 | 0.977 0.966 | 0.896 0.889 | 1.79 3.26 |
## 5.3 Qualitative Results