mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-10 05:38:17 +00:00
add modified test results
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
149
tests/data/groundtruth/docling_v2/2203.01017v2.md
vendored
@@ -14,14 +14,128 @@ The occurrence of tables in documents is ubiquitous. They often summarise quanti

 ## a. Picture of a table:

+| | | 1 | Observer 1 | Observer 1 | Total observer 2 |
+|-------|--------|-----------|--------------|--------------|---------------------|
+| 3 | benign | malignant | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
+| <sup> | | | | | |
+
 7

-| 0 | 1 2 1 | 1 2 1 | 1 2 1 |
-|------|---------|---------|---------|
-| 3 4 | 5 | 6 | 7 |
-| 9 13 | 10 | 11 | 12 |
-| 8 2 | 14 | 15 | 16 |
-| 17 | 18 | 19 | 20 |
+| 0 | 1 | 2 | 1 | |
+|-----|-----|-----|-----|----|
+| 3 | 4 | 5 | 6 | 7 |
+| 8 | 9 | 10 | 11 | 12 |
+| 2 | 13 | 14 | 15 | 16 |
+| 17 | 18 | 19 | 20 | |

 Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'.

@@ -92,14 +206,14 @@ In this regard, we have prepared four synthetic datasets, each one containing 15

 Table 1: Both 'Combined-Tabnet' and 'CombinedTabnet' are variations of the following: (*) The CombinedTabnet dataset is the processed combination of PubTabNet and Fintabnet. (**) The combined dataset is the processed combination of PubTabNet, Fintabnet and TableBank.

-| | Tags | Bbox | Size | Format |
+| | Tags | Bbox | Size | Format |
 |--------------------|--------|--------|--------|----------|
-| PubTabNet | 3 | 3 | 509k | PNG |
-| FinTabNet | 3 | 3 | 112k | PDF |
-| TableBank | 3 | 7 | 145k | JPEG |
-| Combined-Tabnet(*) | 3 | 3 | 400k | PNG |
-| Combined(**) | 3 | 3 | 500k | PNG |
-| SynthTabNet | 3 | 3 | 600k | PNG |
+| PubTabNet | ✓ | ✓ | 509k | PNG |
+| FinTabNet | ✓ | ✓ | 112k | PDF |
+| TableBank | ✓ | ✗ | 145k | JPEG |
+| Combined-Tabnet(*) | ✓ | ✓ | 400k | PNG |
+| Combined(**) | ✓ | ✓ | 500k | PNG |
+| SynthTabNet | ✓ | ✓ | 600k | PNG |

 one adopts a colorful appearance with high contrast and the last one contains tables with sparse content. Lastly, we have combined all synthetic datasets into one big unified synthetic dataset of 600k examples.

@@ -364,6 +478,15 @@ Figure 9: Example of a table with big empty distance between cells.

 <!-- image -->

+| | | | | | |
+|-----------|---------|-------|-------------|-------------|-------------|
+| | | ANOVA | ANOVA | ANOVA | ANOVA |
+| 1 | Sum Sq | Df | F Value | Pr (>F) | |
+| P | 5745.2 | 1 | 266.75 | 4.64 × 10−9 | 2.76 × 10−6 |
+| 2 | 2191.39 | 2 | 30.87 | 1.07 × 10−6 | 3.07 × 10−6 |
+| P | 2 | 61.48 | 1.07 × 10−6 | 2 | |
+| Residuals | 236.91 | 3 | 3 | 2 | 2 |
+
 Figure 10: Example of a complex table with empty cells.

 <!-- image -->