mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-09 05:08:14 +00:00
feat: Use new TableFormer model weights and default to accurate model version (#1100)
* feat: New tableformer model weights [WIP]

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Updated TF version

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated tests, after merging with Main, Switched to Accurate TF model by default

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
@@ -130,14 +130,13 @@ We have chosen the PubTabNet data set to perform HPO, since it includes a highly
 
 Table 1. HPO performed in OTSL and HTML representation on the same transformer-based TableFormer [9] architecture, trained only on PubTabNet [22]. Effects of reducing the # of layers in encoder and decoder stages of the model show that smaller models trained on OTSL perform better, especially in recognizing complex table structures, and maintain a much higher mAP score than the HTML counterpart.
 
-| # | # | Language | TEDs | TEDs | TEDs | mAP | Inference |
-|------------|------------|------------|-------------|-------------|-------------|-------------|-------------|
-| enc-layers | dec-layers | Language | simple | complex | all | (0.75) | time (secs) |
-| 6 | 6 | OTSL HTML | 0.965 0.969 | 0.934 0.927 | 0.955 0.955 | 0.88 0.857 | 2.73 5.39 |
-| 4 | 4 | OTSL HTML | 0.938 0.952 | 0.904 | 0.927 | 0.853 | 1.97 |
-| 2 | 4 | OTSL | 0.923 0.945 | 0.909 0.897 | 0.938 | 0.843 | 3.77 |
-| | | HTML | | 0.901 | 0.915 0.931 | 0.859 0.834 | 1.91 3.81 |
-| 4 | 2 | OTSL HTML | 0.952 0.944 | 0.92 0.903 | 0.942 0.931 | 0.857 0.824 | 1.22 2 |
+| # enc-layers | # dec-layers | Language | TEDs | TEDs | TEDs | mAP | Inference |
+|----------------|----------------|------------|-------------|-------------|-------------|-------------|-------------|
+| # enc-layers | # dec-layers | Language | simple | complex | all | (0.75) | time (secs) |
+| 6 | 6 | OTSL HTML | 0.965 0.969 | 0.934 0.927 | 0.955 0.955 | 0.88 0.857 | 2.73 5.39 |
+| 4 | 4 | OTSL HTML | 0.938 0.952 | 0.904 0.909 | 0.927 0.938 | 0.853 0.843 | 1.97 3.77 |
+| 2 | 4 | OTSL HTML | 0.923 0.945 | 0.897 0.901 | 0.915 0.931 | 0.859 0.834 | 1.91 3.81 |
+| 4 | 2 | OTSL HTML | 0.952 0.944 | 0.92 0.903 | 0.942 0.931 | 0.857 0.824 | 1.22 2 |
 
 ## 5.2 Quantitative Results
 
@@ -147,15 +146,12 @@ Additionally, the results show that OTSL has an advantage over HTML when applied
 
 Table 2. TSR and cell detection results compared between OTSL and HTML on the PubTabNet [22], FinTabNet [21] and PubTables-1M [14] data sets using TableFormer [9] (with enc=6, dec=6, heads=8).
 
-| | Language | TEDs | TEDs | TEDs | mAP(0.75) | Inference time (secs) |
-|--------------|------------|--------|---------|--------|-------------|-------------------------|
-| | Language | simple | complex | all | mAP(0.75) | Inference time (secs) |
-| PubTabNet | OTSL | 0.965 | 0.934 | 0.955 | 0.88 | 2.73 |
-| PubTabNet | HTML | 0.969 | 0.927 | 0.955 | 0.857 | 5.39 |
-| FinTabNet | OTSL | 0.955 | 0.961 | 0.959 | 0.862 | 1.85 |
-| FinTabNet | HTML | 0.917 | 0.922 | 0.92 | 0.722 | 3.26 |
-| PubTables-1M | OTSL | 0.987 | 0.964 | 0.977 | 0.896 | 1.79 |
-| PubTables-1M | HTML | 0.983 | 0.944 | 0.966 | 0.889 | 3.26 |
+| Data set | Language | TEDs | TEDs | TEDs | mAP(0.75) | Inference time (secs) |
+|--------------|------------|-------------|-------------|-------------|-------------|-------------------------|
+| Data set | Language | simple | complex | all | mAP(0.75) | Inference time (secs) |
+| PubTabNet | OTSL HTML | 0.965 0.969 | 0.934 0.927 | 0.955 0.955 | 0.88 0.857 | 2.73 5.39 |
+| FinTabNet | OTSL HTML | 0.955 0.917 | 0.961 0.922 | 0.959 0.92 | 0.862 0.722 | 1.85 3.26 |
+| PubTables-1M | OTSL HTML | 0.987 0.983 | 0.964 0.944 | 0.977 0.966 | 0.896 0.889 | 1.79 3.26 |
 
 ## 5.3 Qualitative Results
 