Mirror of https://github.com/DS4SD/docling.git
add modified test results
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
107 tests/data/groundtruth/docling_v2/2206.01062.md (vendored)
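
For context, the groundtruth file changed here is docling's markdown export of the DocLayNet paper PDF (arXiv 2206.01062). Below is a minimal sketch of regenerating such a file and diffing it against the committed groundtruth, assuming docling's documented `DocumentConverter` API; the PDF path is illustrative and this is not the repository's actual test harness.

```python
import difflib
from pathlib import Path

from docling.document_converter import DocumentConverter

# Convert the source PDF and export the markdown that the groundtruth
# file is meant to pin down.
result = DocumentConverter().convert("2206.01062.pdf")  # illustrative path
generated = result.document.export_to_markdown()

# Diff against the committed groundtruth to see what changed.
groundtruth = Path("tests/data/groundtruth/docling_v2/2206.01062.md").read_text()
for line in difflib.unified_diff(
    groundtruth.splitlines(), generated.splitlines(), lineterm=""
):
    print(line)
```
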
@@ -105,21 +105,20 @@ The annotation campaign was carried out in four phases. In phase one, we identif
Table 1: DocLayNet dataset overview. Along with the frequency of each class label, we present the relative occurrence (as % of row 'Total') in the train, test and validation sets. The inter-annotator agreement is computed as the mAP@0.5-0.95 metric between pairwise annotations from the triple-annotated pages, from which we obtain accuracy ranges.

| | | % of Total | % of Total | % of Total | triple inter-annotator mAP @0.5-0.95 (%) | triple inter-annotator mAP @0.5-0.95 (%) | triple inter-annotator mAP @0.5-0.95 (%) | triple inter-annotator mAP @0.5-0.95 (%) | triple inter-annotator mAP @0.5-0.95 (%) | triple inter-annotator mAP @0.5-0.95 (%) | triple inter-annotator mAP @0.5-0.95 (%) |
|----------------|---------|--------------|--------------|--------------|--------------------------------------------|--------------------------------------------|--------------------------------------------|--------------------------------------------|--------------------------------------------|--------------------------------------------|--------------------------------------------|
| class label | Count | Train | Test | Val | All | Fin | Man | Sci | Law | Pat | Ten |
| Caption | 22524 | 2.04 | 1.77 | 2.32 | 84-89 | 40-61 | 86-92 | 94-99 | 95-99 | 69-78 | n/a |
| Footnote | 6318 | 0.60 | 0.31 | 0.58 | 83-91 | n/a | 100 | 62-88 | 85-94 | n/a | 82-97 |
| Formula | 25027 | 2.25 | 1.90 | 2.96 | 83-85 | n/a | n/a | 84-87 | 86-96 | n/a | n/a |
| List-item | 185660 | 17.19 | 13.34 | 15.82 | 87-88 | 74-83 | 90-92 | 97-97 | 81-85 | 75-88 | 93-95 |
| Page-footer | 70878 | 6.51 | 5.58 | 6.00 | 93-94 | 88-90 | 95-96 | 100 | 92-97 | 100 | 96-98 |
| Page-header | 58022 | 5.10 | 6.70 | 5.06 | 85-89 | 66-76 | 90-94 | 98-100 | 91-92 | 97-99 | 81-86 |
| Picture | 45976 | 4.21 | 2.78 | 5.31 | 69-71 | 56-59 | 82-86 | 69-82 | 80-95 | 66-71 | 59-76 |
| Section-header | 142884 | 12.60 | 15.77 | 12.85 | 83-84 | 76-81 | 90-92 | 94-95 | 87-94 | 69-73 | 78-86 |
| Table | 34733 | 3.20 | 2.27 | 3.60 | 77-81 | 75-80 | 83-86 | 98-99 | 58-80 | 79-84 | 70-85 |
| Text | 510377 | 45.82 | 49.28 | 45.00 | 84-86 | 81-86 | 88-93 | 89-93 | 87-92 | 71-79 | 87-95 |
| Title | 5071 | 0.47 | 0.30 | 0.50 | 60-72 | 24-63 | 50-63 | 94-100 | 82-96 | 68-79 | 24-56 |
| Total | 1107470 | 941123 | 99816 | 66531 | 82-83 | 71-74 | 79-81 | 89-94 | 86-91 | 71-76 | 68-85 |

| class label | Count | % of Total | % of Total | % of Total | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) |
|----------------|---------|--------------|--------------|--------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|
| class label | Caption | 22524 | 2.04 | 1.77 | 2.32 | 84-89 | 40-61 | 86-92 | 94-99 | 95-99 | 69-78 | n/a |
| Footnote | 6318 | 0.6 | 0.31 | 0.58 | 83-91 | n/a | 100 | 62-88 | 85-94 | n/a | 82-97 | |
| Formula | 25027 | 2.25 | 1.9 | 2.96 | 83-85 | n/a | n/a | 84-87 | 86-96 | n/a | n/a | |
| List-item | 185660 | 17.19 | 13.34 | 15.82 | 87-88 | 74-83 | 90-92 | 97-97 | 81-85 | 75-88 | 93-95 | |
| Page-footer | 70878 | 6.51 | 5.58 | 6 | 93-94 | 88-90 | 95-96 | 100 | 92-97 | 100 | 96-98 | |
| Page-header | 58022 | 5.1 | 6.7 | 5.06 | 85-89 | 66-76 | 90-94 | 98-100 | 91-92 | 97-99 | 81-86 | |
| Picture | 45976 | 4.21 | 2.78 | 5.31 | 69-71 | 56-59 | 82-86 | 69-82 | 80-95 | 66-71 | 59-76 | |
| Section-header | 142884 | 12.6 | 15.77 | 12.85 | 83-84 | 76-81 | 90-92 | 94-95 | 87-94 | 69-73 | 78-86 | |
| Table | 34733 | 3.2 | 2.27 | 3.6 | 77-81 | 75-80 | 83-86 | 98-99 | 58-80 | 79-84 | 70-85 | |
| Text | 510377 | 45.82 | 49.28 | 45 | 84-86 | 81-86 | 88-93 | 89-93 | 87-92 | 71-79 | 87-95 | |
| Title | 5071 | 0.47 | 0.3 | 0.5 | 60-72 | 24-63 | 50-63 | 94-100 | 82-96 | 68-79 | 24-56 | |
| Total | 1107470 | 941123 | 99816 | 66531 | 82-83 | 71-74 | 79-81 | 89-94 | 86-91 | 71-76 | 68-85 | |
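
The inter-annotator ranges in Table 1 are pairwise mAP@0.5-0.95 values over the triple-annotated pages. A faithful mAP needs confidence scores and a precision-recall sweep; the sketch below is only a simplified, hypothetical proxy that greedily matches one annotator's boxes against another's and averages the matched fraction over the ten IoU thresholds 0.50-0.95. All names are illustrative, not DocLayNet's actual evaluation code.

```python
import numpy as np

def iou(a, b):
    # Boxes as (x0, y0, x1, y1); returns intersection-over-union.
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def pairwise_agreement(boxes_a, boxes_b):
    # Greedy one-to-one matching of two annotators' boxes (same class,
    # same page) at each IoU threshold in 0.50..0.95, averaged --
    # a simplified stand-in for mAP@0.5-0.95.
    scores = []
    for thr in np.arange(0.50, 0.96, 0.05):
        unmatched_b = list(boxes_b)
        hits = 0
        for a in boxes_a:
            best = max(unmatched_b, key=lambda b: iou(a, b), default=None)
            if best is not None and iou(a, best) >= thr:
                hits += 1
                unmatched_b.remove(best)
        denom = max(len(boxes_a), len(boxes_b))
        scores.append(hits / denom if denom else 1.0)
    return float(np.mean(scores))
```
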
Figure 3: Corpus Conversion Service annotation user interface. The PDF page is shown in the background, with overlaid text-cells (in darker shades). The annotation boxes can be drawn by dragging a rectangle over each segment with the respective label from the palette on the right.
@@ -164,21 +163,20 @@ Phase 4: Production annotation. The previously selected 80K pages were annotated
Table 2: Prediction performance (mAP@0.5-0.95) of object detection networks on DocLayNet test set. The MRCNN (Mask R-CNN) and FRCNN (Faster R-CNN) models with ResNet-50 or ResNet-101 backbone were trained based on the network architectures from the detectron2 model zoo (Mask R-CNN R50, R101-FPN 3x, Faster R-CNN R101-FPN 3x), with default configurations. The YOLO implementation utilized was YOLOv5x6 [13]. All models were initialised using pre-trained weights from the COCO 2017 dataset.

| | human | MRCNN | MRCNN | FRCNN | YOLO |
| | human | MRCNN | MRCNN | FRCNN | YOLO |
|----------------|---------|---------|---------|---------|--------|
| | | R50 | R101 | R101 | v5x6 |
| Caption | 84-89 | 68.4 | 71.5 | 70.1 | 77.7 |
| Footnote | 83-91 | 70.9 | 71.8 | 73.7 | 77.2 |
| Formula | 83-85 | 60.1 | 63.4 | 63.5 | 66.2 |
| List-item | 87-88 | 81.2 | 80.8 | 81.0 | 86.2 |
| Page-footer | 93-94 | 61.6 | 59.3 | 58.9 | 61.1 |
| Page-header | 85-89 | 71.9 | 70.0 | 72.0 | 67.9 |
| Picture | 69-71 | 71.7 | 72.7 | 72.0 | 77.1 |
| Section-header | 83-84 | 67.6 | 69.3 | 68.4 | 74.6 |
| Table | 77-81 | 82.2 | 82.9 | 82.2 | 86.3 |
| Text | 84-86 | 84.6 | 85.8 | 85.4 | 88.1 |
| Title | 60-72 | 76.7 | 80.4 | 79.9 | 82.7 |
| All | 82-83 | 72.4 | 73.5 | 73.4 | 76.8 |

| Caption | 84-89 | 68.4 | 71.5 | 70.1 | 77.7 |
| Footnote | 83-91 | 70.9 | 71.8 | 73.7 | 77.2 |
| Formula | 83-85 | 60.1 | 63.4 | 63.5 | 66.2 |
| List-item | 87-88 | 81.2 | 80.8 | 81 | 86.2 |
| Page-footer | 93-94 | 61.6 | 59.3 | 58.9 | 61.1 |
| Page-header | 85-89 | 71.9 | 70 | 72 | 67.9 |
| Picture | 69-71 | 71.7 | 72.7 | 72 | 77.1 |
| Section-header | 83-84 | 67.6 | 69.3 | 68.4 | 74.6 |
| Table | 77-81 | 82.2 | 82.9 | 82.2 | 86.3 |
| Text | 84-86 | 84.6 | 85.8 | 85.4 | 88.1 |
| Title | 60-72 | 76.7 | 80.4 | 79.9 | 82.7 |
| All | 82-83 | 72.4 | 73.5 | 73.4 | 76.8 |
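
Table 2's caption states that the MRCNN and FRCNN baselines were trained from detectron2 model-zoo configurations with COCO-pretrained weights. A minimal sketch of such a setup, assuming detectron2's standard config API; the dataset names and registration paths are illustrative, not the paper's exact training script.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances

# Register DocLayNet's COCO-format splits (paths are illustrative).
register_coco_instances("doclaynet_train", {}, "COCO/train.json", "PNG")
register_coco_instances("doclaynet_val", {}, "COCO/val.json", "PNG")

# Mask R-CNN R50-FPN 3x from the model zoo, initialised on COCO 2017
# weights, with the box head resized to DocLayNet's 11 class labels.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 11
cfg.DATASETS.TRAIN = ("doclaynet_train",)
cfg.DATASETS.TEST = ("doclaynet_val",)
```
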
to avoid this at any cost in order to have clear, unbiased baseline numbers for human document-layout annotation. Third, we introduced the feature of snapping boxes around text segments to obtain a pixel-accurate annotation and again reduce time and effort. The CCS annotation tool automatically shrinks every user-drawn box to the minimum bounding-box around the enclosed text-cells for all purely text-based segments, which excludes only Table and Picture. For the latter, we instructed annotation staff to minimise inclusion of surrounding whitespace while including all graphical lines. A downside of snapping boxes to enclosed text cells is that some wrongly parsed PDF pages cannot be annotated correctly and need to be skipped. Fourth, we established a way to flag pages as rejected for cases where no valid annotation according to the label guidelines could be achieved. Example cases for this would be PDF pages that render incorrectly or contain layouts that are impossible to capture with non-overlapping rectangles. Such rejected pages are not contained in the final dataset. With all these measures in place, experienced annotation staff managed to annotate a single page in a typical timeframe of 20s to 60s, depending on its complexity.
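
The box-snapping behaviour described above, shrinking a user-drawn rectangle to the minimal bounding box of the enclosed text-cells, reduces to a min/max over the cells the rectangle covers. A hypothetical sketch (the CCS tool's actual implementation is not shown in the source):

```python
def snap_box(drawn, cells):
    # drawn: user rectangle (x0, y0, x1, y1); cells: parsed PDF text-cell
    # boxes. Shrink to the tight bounding box of all fully enclosed cells;
    # if the rectangle encloses none (e.g. a wrongly parsed page), return
    # it unchanged so the annotator can skip or reject the page.
    inside = [c for c in cells
              if c[0] >= drawn[0] and c[1] >= drawn[1]
              and c[2] <= drawn[2] and c[3] <= drawn[3]]
    if not inside:
        return drawn
    return (min(c[0] for c in inside), min(c[1] for c in inside),
            max(c[2] for c in inside), max(c[3] for c in inside))
```
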
@@ -225,21 +223,20 @@ The choice and number of labels can have a significant effect on the overall mod
Table 4: Performance of a Mask R-CNN R50 network with document-wise and page-wise split for different label sets. Naive page-wise split will result in ∼10% point improvement.

| Class-count Split | 11 | 11 | 5 | 5 |
|---------------------|------|------|-----|------|
| | Doc | Page | Doc | Page |
| Caption | 68 | 83 | | |
| Footnote | 71 | 84 | | |
| Formula | 60 | 66 | | |
| List-item | 81 | 88 | 82 | 88 |
| Page-footer | 62 | 89 | | |
| Page-header | 72 | 90 | | |
| Picture | 72 | 82 | 72 | 82 |
| Section-header | 68 | 83 | 69 | 83 |
| Table | 82 | 89 | 82 | 90 |
| Text | 85 | 91 | 84 | 90 |
| Title | 77 | 81 | | |
| All | 72 | 84 | 78 | 87 |

| Class-count Split | 11Doc Page | 5Doc Page | Page |
|---------------------|--------------|-------------|--------|
| Caption | 68 | 83 | |
| Footnote | 71 | 84 | |
| Formula | 60 | 66 | |
| List-item | 81 | 88 | 82 |
| Page-footer | 62 | 89 | |
| Page-header | 72 | 90 | |
| Picture | 72 | 82 | 82 |
| Section-header | 68 | 83 | 83 |
| Table | 82 | 89 | 82 |
| Text | 85 | 91 | 84 |
| Title | 77 | 81 | |
| All | 72 | 84 | 78 |

lists in PubLayNet (grouped list-items) versus DocLayNet (separate list-items), the label set of size 4 is the closest to PubLayNet, in the assumption that the List is down-mapped to Text in PubLayNet. The results in Table 3 show that the prediction accuracy on the remaining class labels does not change significantly when other classes are merged into them. The overall macro-average improves by around 5%, in particular when Page-footer and Page-header are excluded.
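
Down-mapping labels, as described above for the reduced class sets and for PubLayNet's List, is a plain relabelling pass over the annotations before training or scoring. A minimal sketch with a hypothetical mapping table; the exact mapping behind Table 3 is given in the paper and not reproduced here.

```python
# Hypothetical example of merging labels into a smaller set.
DOWN_MAP = {
    "Caption": "Text",
    "Footnote": "Text",
    "List-item": "Text",  # mirrors PubLayNet's List -> Text down-mapping
}

def remap(annotations, mapping=DOWN_MAP,
          exclude=("Page-header", "Page-footer")):
    # Drop excluded classes, rename mapped ones, keep the rest unchanged.
    out = []
    for ann in annotations:
        label = ann["label"]
        if label in exclude:
            continue
        out.append({**ann, "label": mapping.get(label, label)})
    return out
```
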
@@ -255,22 +252,12 @@ KDD '22, August 14-18, 2022, Washington, DC, USA Birgit Pfitzmann, Christoph Aue
Table 5: Prediction Performance (mAP@0.5-0.95) of a Mask R-CNN R50 network across the PubLayNet, DocBank & DocLayNet data-sets. By evaluating on common label classes of each dataset, we observe that the DocLayNet-trained model has much less pronounced variations in performance across all datasets.

| | | Testing on | Testing on | Testing on |
|-----------------|------------|--------------|--------------|--------------|
| Training on | labels | PLN | DB | DLN |
| PubLayNet (PLN) | Figure | 96 | 43 | 23 |
| PubLayNet (PLN) | Sec-header | 87 | - | 32 |
| | Table | 95 | 24 | 49 |
| | Text | 96 | - | 42 |
| | total | 93 | 34 | 30 |
| DocBank (DB) | Figure | 77 | 71 | 31 |
| DocBank (DB) | Table | 19 | 65 | 22 |
| DocBank (DB) | total | 48 | 68 | 27 |
| DocLayNet (DLN) | Figure | 67 | 51 | 72 |
| DocLayNet (DLN) | Sec-header | 53 | - | 68 |
| | Table | 87 | 43 | 82 |
| | Text | 77 | - | 84 |
| | total | 59 | 47 | 78 |

| Training on | labels | Testing on PLN | DB | DLN |
|-----------------|------------------------------------|------------------|-------------------|----------------|
| PubLayNet (PLN) | Figure Sec-header Table Text total | 96 87 95 96 | 43 - 32 | 23 49 24 42 |
| DocBank (DB) | Figure Table total | 77 19 48 | 71 65 68 27 | 31 22 27 72 |
| DocLayNet (DLN) | Figure Sec-header Table Text total | 67 53 87 77 | 51 - 68 43 84 | 72 78 82 84 |
| DocLayNet (DLN) | total | 59 | 47 | 78 |

Section-header, Table and Text. Before training, we either mapped or excluded DocLayNet's other labels as specified in Table 3, and also PubLayNet's List to Text. Note that the different clustering of lists (by list-element vs. whole list objects) naturally decreases the mAP score for Text.