mirror of https://github.com/DS4SD/docling.git (synced 2025-12-10 05:38:17 +00:00)
add modified test results
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
19904
tests/data/groundtruth/docling_v2/2203.01017v2.json
vendored
File diff suppressed because it is too large
149
tests/data/groundtruth/docling_v2/2203.01017v2.md
vendored
@@ -14,14 +14,128 @@ The occurrence of tables in documents is ubiquitous. They often summarise quanti
## a. Picture of a table:
| | | 1 | Observer 1 | Observer 1 | Total observer 2 |
|
||||
|-------|--------|-----------|--------------|--------------|---------------------|
|
||||
| 3 | benign | malignant | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | <sup> | <sup> | <sup> | <sup> | <sup> |
|
||||
| <sup> | | | | | |
7
| 0 | 1 2 1 | 1 2 1 | 1 2 1 |
|------|---------|---------|---------|
| 3 4 | 5 | 6 | 7 |
| 9 13 | 10 | 11 | 12 |
| 8 2 | 14 | 15 | 16 |
| 17 | 18 | 19 | 20 |
| 0 | 1 | 2 | 1 | |
|-----|-----|-----|-----|----|
| 3 | 4 | 5 | 6 | 7 |
| 8 | 9 | 10 | 11 | 12 |
| 2 | 13 | 14 | 15 | 16 |
| 17 | 18 | 19 | 20 | |
Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'.
@@ -92,14 +206,14 @@ In this regard, we have prepared four synthetic datasets, each one containing 15
Table 1: Both 'Combined-Tabnet' and 'CombinedTabnet' are variations of the following: (*) The CombinedTabnet dataset is the processed combination of PubTabNet and Fintabnet. (**) The combined dataset is the processed combination of PubTabNet, Fintabnet and TableBank.
| | Tags | Bbox | Size | Format |
| | Tags | Bbox | Size | Format |
|--------------------|--------|--------|--------|----------|
| PubTabNet | 3 | 3 | 509k | PNG |
| FinTabNet | 3 | 3 | 112k | PDF |
| TableBank | 3 | 7 | 145k | JPEG |
| Combined-Tabnet(*) | 3 | 3 | 400k | PNG |
| Combined(**) | 3 | 3 | 500k | PNG |
| SynthTabNet | 3 | 3 | 600k | PNG |
| PubTabNet | ✓ | ✓ | 509k | PNG |
| FinTabNet | ✓ | ✓ | 112k | PDF |
| TableBank | ✓ | ✗ | 145k | JPEG |
| Combined-Tabnet(*) | ✓ | ✓ | 400k | PNG |
| Combined(**) | ✓ | ✓ | 500k | PNG |
| SynthTabNet | ✓ | ✓ | 600k | PNG |
one adopts a colorful appearance with high contrast and the last one contains tables with sparse content. Lastly, we have combined all synthetic datasets into one big unified synthetic dataset of 600k examples.
@@ -364,6 +478,15 @@ Figure 9: Example of a table with big empty distance between cells.
<!-- image -->
| | | | | | |
|-----------|---------|-------|-------------|-------------|-------------|
| | | ANOVA | ANOVA | ANOVA | ANOVA |
| 1 | Sum Sq | Df | F Value | Pr (>F) | |
| P | 5745.2 | 1 | 266.75 | 4.64 × 10−9 | 2.76 × 10−6 |
| 2 | 2191.39 | 2 | 30.87 | 1.07 × 10−6 | 3.07 × 10−6 |
| P | 2 | 61.48 | 1.07 × 10−6 | 2 | |
| Residuals | 236.91 | 3 | 3 | 2 | 2 |
Figure 10: Example of a complex table with empty cells.
<!-- image -->
11001
tests/data/groundtruth/docling_v2/2206.01062.json
vendored
File diff suppressed because it is too large
107
tests/data/groundtruth/docling_v2/2206.01062.md
vendored
@@ -105,21 +105,20 @@ The annotation campaign was carried out in four phases. In phase one, we identif
Table 1: DocLayNet dataset overview. Along with the frequency of each class label, we present the relative occurrence (as % of row 'Total') in the train, test and validation sets. The inter-annotator agreement is computed as the mAP@0.5-0.95 metric between pairwise annotations from the triple-annotated pages, from which we obtain accuracy ranges.
| | | % of Total | % of Total | % of Total | triple inter-annotator mAP @0.5-0.95 (%) | triple inter-annotator mAP @0.5-0.95 (%) | triple inter-annotator mAP @0.5-0.95 (%) | triple inter-annotator mAP @0.5-0.95 (%) | triple inter-annotator mAP @0.5-0.95 (%) | triple inter-annotator mAP @0.5-0.95 (%) | triple inter-annotator mAP @0.5-0.95 (%) |
|----------------|---------|--------------|--------------|--------------|--------------------------------------------|--------------------------------------------|--------------------------------------------|--------------------------------------------|--------------------------------------------|--------------------------------------------|--------------------------------------------|
| class label | Count | Train | Test | Val | All | Fin | Man | Sci | Law | Pat | Ten |
| Caption | 22524 | 2.04 | 1.77 | 2.32 | 84-89 | 40-61 | 86-92 | 94-99 | 95-99 | 69-78 | n/a |
| Footnote | 6318 | 0.60 | 0.31 | 0.58 | 83-91 | n/a | 100 | 62-88 | 85-94 | n/a | 82-97 |
| Formula | 25027 | 2.25 | 1.90 | 2.96 | 83-85 | n/a | n/a | 84-87 | 86-96 | n/a | n/a |
| List-item | 185660 | 17.19 | 13.34 | 15.82 | 87-88 | 74-83 | 90-92 | 97-97 | 81-85 | 75-88 | 93-95 |
| Page-footer | 70878 | 6.51 | 5.58 | 6.00 | 93-94 | 88-90 | 95-96 | 100 | 92-97 | 100 | 96-98 |
| Page-header | 58022 | 5.10 | 6.70 | 5.06 | 85-89 | 66-76 | 90-94 | 98-100 | 91-92 | 97-99 | 81-86 |
| Picture | 45976 | 4.21 | 2.78 | 5.31 | 69-71 | 56-59 | 82-86 | 69-82 | 80-95 | 66-71 | 59-76 |
| Section-header | 142884 | 12.60 | 15.77 | 12.85 | 83-84 | 76-81 | 90-92 | 94-95 | 87-94 | 69-73 | 78-86 |
| Table | 34733 | 3.20 | 2.27 | 3.60 | 77-81 | 75-80 | 83-86 | 98-99 | 58-80 | 79-84 | 70-85 |
| Text | 510377 | 45.82 | 49.28 | 45.00 | 84-86 | 81-86 | 88-93 | 89-93 | 87-92 | 71-79 | 87-95 |
| Title | 5071 | 0.47 | 0.30 | 0.50 | 60-72 | 24-63 | 50-63 | 94-100 | 82-96 | 68-79 | 24-56 |
| Total | 1107470 | 941123 | 99816 | 66531 | 82-83 | 71-74 | 79-81 | 89-94 | 86-91 | 71-76 | 68-85 |
| class label | Count | % of Total | % of Total | % of Total | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) | triple inter-annotator mAP @ 0.5-0.95 (%) |
|----------------|---------|--------------|--------------|--------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|
| class label | Caption | 22524 | 2.04 | 1.77 | 2.32 | 84-89 | 40-61 | 86-92 | 94-99 | 95-99 | 69-78 | n/a |
| Footnote | 6318 | 0.6 | 0.31 | 0.58 | 83-91 | n/a | 100 | 62-88 | 85-94 | n/a | 82-97 | |
| Formula | 25027 | 2.25 | 1.9 | 2.96 | 83-85 | n/a | n/a | 84-87 | 86-96 | n/a | n/a | |
| List-item | 185660 | 17.19 | 13.34 | 15.82 | 87-88 | 74-83 | 90-92 | 97-97 | 81-85 | 75-88 | 93-95 | |
| Page-footer | 70878 | 6.51 | 5.58 | 6 | 93-94 | 88-90 | 95-96 | 100 | 92-97 | 100 | 96-98 | |
| Page-header | 58022 | 5.1 | 6.7 | 5.06 | 85-89 | 66-76 | 90-94 | 98-100 | 91-92 | 97-99 | 81-86 | |
| Picture | 45976 | 4.21 | 2.78 | 5.31 | 69-71 | 56-59 | 82-86 | 69-82 | 80-95 | 66-71 | 59-76 | |
| Section-header | 142884 | 12.6 | 15.77 | 12.85 | 83-84 | 76-81 | 90-92 | 94-95 | 87-94 | 69-73 | 78-86 | |
| Table | 34733 | 3.2 | 2.27 | 3.6 | 77-81 | 75-80 | 83-86 | 98-99 | 58-80 | 79-84 | 70-85 | |
| Text | 510377 | 45.82 | 49.28 | 45 | 84-86 | 81-86 | 88-93 | 89-93 | 87-92 | 71-79 | 87-95 | |
| Title | 5071 | 0.47 | 0.3 | 0.5 | 60-72 | 24-63 | 50-63 | 94-100 | 82-96 | 68-79 | 24-56 | |
| Total | 1107470 | 941123 | 99816 | 66531 | 82-83 | 71-74 | 79-81 | 89-94 | 86-91 | 71-76 | 68-85 | |
Figure 3: Corpus Conversion Service annotation user interface. The PDF page is shown in the background, with overlaid text-cells (in darker shades). The annotation boxes can be drawn by dragging a rectangle over each segment with the respective label from the palette on the right.
@@ -164,21 +163,20 @@ Phase 4: Production annotation. The previously selected 80K pages were annotated
Table 2: Prediction performance (mAP@0.5-0.95) of object detection networks on DocLayNet test set. The MRCNN (Mask R-CNN) and FRCNN (Faster R-CNN) models with ResNet-50 or ResNet-101 backbone were trained based on the network architectures from the detectron2 model zoo (Mask R-CNN R50, R101-FPN 3x, Faster R-CNN R101-FPN 3x), with default configurations. The YOLO implementation utilized was YOLOv5x6 [13]. All models were initialised using pre-trained weights from the COCO 2017 dataset.
| | human | MRCNN | MRCNN | FRCNN | YOLO |
| | human | MRCNN | MRCNN | FRCNN | YOLO |
|----------------|---------|---------|---------|---------|--------|
| | | R50 | R101 | R101 | v5x6 |
| Caption | 84-89 | 68.4 | 71.5 | 70.1 | 77.7 |
| Footnote | 83-91 | 70.9 | 71.8 | 73.7 | 77.2 |
| Formula | 83-85 | 60.1 | 63.4 | 63.5 | 66.2 |
| List-item | 87-88 | 81.2 | 80.8 | 81.0 | 86.2 |
| Page-footer | 93-94 | 61.6 | 59.3 | 58.9 | 61.1 |
| Page-header | 85-89 | 71.9 | 70.0 | 72.0 | 67.9 |
| Picture | 69-71 | 71.7 | 72.7 | 72.0 | 77.1 |
| Section-header | 83-84 | 67.6 | 69.3 | 68.4 | 74.6 |
| Table | 77-81 | 82.2 | 82.9 | 82.2 | 86.3 |
| Text | 84-86 | 84.6 | 85.8 | 85.4 | 88.1 |
| Title | 60-72 | 76.7 | 80.4 | 79.9 | 82.7 |
| All | 82-83 | 72.4 | 73.5 | 73.4 | 76.8 |
| Caption | 84-89 | 68.4 | 71.5 | 70.1 | 77.7 |
| Footnote | 83-91 | 70.9 | 71.8 | 73.7 | 77.2 |
| Formula | 83-85 | 60.1 | 63.4 | 63.5 | 66.2 |
| List-item | 87-88 | 81.2 | 80.8 | 81 | 86.2 |
| Page-footer | 93-94 | 61.6 | 59.3 | 58.9 | 61.1 |
| Page-header | 85-89 | 71.9 | 70 | 72 | 67.9 |
| Picture | 69-71 | 71.7 | 72.7 | 72 | 77.1 |
| Section-header | 83-84 | 67.6 | 69.3 | 68.4 | 74.6 |
| Table | 77-81 | 82.2 | 82.9 | 82.2 | 86.3 |
| Text | 84-86 | 84.6 | 85.8 | 85.4 | 88.1 |
| Title | 60-72 | 76.7 | 80.4 | 79.9 | 82.7 |
| All | 82-83 | 72.4 | 73.5 | 73.4 | 76.8 |
to avoid this at any cost in order to have clear, unbiased baseline numbers for human document-layout annotation. Third, we introduced the feature of snapping boxes around text segments to obtain a pixel-accurate annotation and again reduce time and effort. The CCS annotation tool automatically shrinks every user-drawn box to the minimum bounding-box around the enclosed text-cells for all purely text-based segments, which excludes only Table and Picture . For the latter, we instructed annotation staff to minimise inclusion of surrounding whitespace while including all graphical lines. A downside of snapping boxes to enclosed text cells is that some wrongly parsed PDF pages cannot be annotated correctly and need to be skipped. Fourth, we established a way to flag pages as rejected for cases where no valid annotation according to the label guidelines could be achieved. Example cases for this would be PDF pages that render incorrectly or contain layouts that are impossible to capture with non-overlapping rectangles. Such rejected pages are not contained in the final dataset. With all these measures in place, experienced annotation staff managed to annotate a single page in a typical timeframe of 20s to 60s, depending on its complexity.
@@ -225,21 +223,20 @@ The choice and number of labels can have a significant effect on the overall mod
Table 4: Performance of a Mask R-CNN R50 network with document-wise and page-wise split for different label sets. Naive page-wise split will result in /tildelow 10% point improvement.
| Class-count Split | 11 | 11 | 5 | 5 |
|---------------------|------|------|-----|------|
| | Doc | Page | Doc | Page |
| Caption | 68 | 83 | | |
| Footnote | 71 | 84 | | |
| Formula | 60 | 66 | | |
| List-item | 81 | 88 | 82 | 88 |
| Page-footer | 62 | 89 | | |
| Page-header | 72 | 90 | | |
| Picture | 72 | 82 | 72 | 82 |
| Section-header | 68 | 83 | 69 | 83 |
| Table | 82 | 89 | 82 | 90 |
| Text | 85 | 91 | 84 | 90 |
| Title | 77 | 81 | | |
| All | 72 | 84 | 78 | 87 |
| Class-count Split | 11Doc Page | 5Doc Page | Page |
|---------------------|--------------|-------------|--------|
| Caption | 68 | 83 | |
| Footnote | 71 | 84 | |
| Formula | 60 | 66 | |
| List-item | 81 | 88 | 82 |
| Page-footer | 62 | 89 | |
| Page-header | 72 | 90 | |
| Picture | 72 | 82 | 82 |
| Section-header | 68 | 83 | 83 |
| Table | 82 | 89 | 82 |
| Text | 85 | 91 | 84 |
| Title | 77 | 81 | |
| All | 72 | 84 | 78 |
lists in PubLayNet (grouped list-items) versus DocLayNet (separate list-items), the label set of size 4 is the closest to PubLayNet, in the assumption that the List is down-mapped to Text in PubLayNet. The results in Table 3 show that the prediction accuracy on the remaining class labels does not change significantly when other classes are merged into them. The overall macro-average improves by around 5%, in particular when Page-footer and Page-header are excluded.
@@ -255,22 +252,12 @@ KDD '22, August 14-18, 2022, Washington, DC, USA Birgit Pfitzmann, Christoph Aue
Table 5: Prediction Performance (mAP@0.5-0.95) of a Mask R-CNN R50 network across the PubLayNet, DocBank & DocLayNet data-sets. By evaluating on common label classes of each dataset, we observe that the DocLayNet-trained model has much less pronounced variations in performance across all datasets.
| | | Testing on | Testing on | Testing on |
|-----------------|------------|--------------|--------------|--------------|
| Training on | labels | PLN | DB | DLN |
| PubLayNet (PLN) | Figure | 96 | 43 | 23 |
| PubLayNet (PLN) | Sec-header | 87 | - | 32 |
| | Table | 95 | 24 | 49 |
| | Text | 96 | - | 42 |
| | total | 93 | 34 | 30 |
| DocBank (DB) | Figure | 77 | 71 | 31 |
| DocBank (DB) | Table | 19 | 65 | 22 |
| DocBank (DB) | total | 48 | 68 | 27 |
| DocLayNet (DLN) | Figure | 67 | 51 | 72 |
| DocLayNet (DLN) | Sec-header | 53 | - | 68 |
| | Table | 87 | 43 | 82 |
| | Text | 77 | - | 84 |
| | total | 59 | 47 | 78 |
| Training on | labels | Testing on PLN | DB | DLN |
|-----------------|------------------------------------|------------------|-------------------|----------------|
| PubLayNet (PLN) | Figure Sec-header Table Text total | 96 87 95 96 | 43 - 32 | 23 49 24 42 |
| DocBank (DB) | Figure Table total | 77 19 48 | 71 65 68 27 | 31 22 27 72 |
| DocLayNet (DLN) | Figure Sec-header Table Text total | 67 53 87 77 | 51 - 68 43 84 | 72 78 82 84 |
| DocLayNet (DLN) | total | 59 | 47 | 78 |
Section-header , Table and Text . Before training, we either mapped or excluded DocLayNet's other labels as specified in table 3, and also PubLayNet's List to Text . Note that the different clustering of lists (by list-element vs. whole list objects) naturally decreases the mAP score for Text .
1091
tests/data/groundtruth/docling_v2/2305.03393v1-pg9.json
vendored
File diff suppressed because it is too large
@@ -6,13 +6,12 @@ We have chosen the PubTabNet data set to perform HPO, since it includes a highly
Table 1. HPO performed in OTSL and HTML representation on the same transformer-based TableFormer [9] architecture, trained only on PubTabNet [22]. Effects of reducing the # of layers in encoder and decoder stages of the model show that smaller models trained on OTSL perform better, especially in recognizing complex table structures, and maintain a much higher mAP score than the HTML counterpart.
| # enc-layers | # dec-layers | Language | TEDs | TEDs | TEDs | mAP (0.75) | Inference time (secs) |
|----------------|----------------|------------|-------------|-------------|-------------|--------------|-------------------------|
| # enc-layers | # dec-layers | Language | simple | complex | all | mAP (0.75) | Inference time (secs) |
| 6 | 6 | OTSL HTML | 0.965 0.969 | 0.934 0.927 | 0.955 0.955 | 0.88 0.857 | 2.73 5.39 |
| 4 | 4 | OTSL HTML | 0.938 0.952 | 0.904 0.909 | 0.927 0.938 | 0.853 0.843 | 1.97 3.77 |
| 2 | 4 | OTSL HTML | 0.923 0.945 | 0.897 0.901 | 0.915 0.931 | 0.859 0.834 | 1.91 3.81 |
| 4 | 2 | OTSL HTML | 0.952 0.944 | 0.92 0.903 | 0.942 0.931 | 0.857 0.824 | 1.22 2 |
| #enc-layers | #dec-layers | Language | TEDs | TEDs | TEDs | mAP(0.75) | Inferencetime (secs) |
|---------------|---------------|------------|-------------|-------------|-------------|-------------|------------------------|
| 6 | 6 | OTSL HTML | 0.965 0.969 | 0.934 0.927 | 0.955 0.955 | 0.88 0.857 | 2.73 5.39 |
| 4 | 4 | OTSL HTML | 0.938 0.952 | 0.904 0.909 | 0.927 0.938 | 0.853 0.843 | 1.97 3.77 |
| 2 | 4 | OTSL HTML | 0.923 0.945 | 0.897 0.901 | 0.915 0.931 | 0.859 0.834 | 1.91 3.81 |
| 4 | 2 | OTSL HTML | 0.952 0.944 | 0.92 0.903 | 0.942 0.931 | 0.857 0.824 | 1.22 2 |
## 5.2 Quantitative Results
1589
tests/data/groundtruth/docling_v2/2305.03393v1.json
vendored
File diff suppressed because it is too large
@@ -126,13 +126,12 @@ We have chosen the PubTabNet data set to perform HPO, since it includes a highly
Table 1. HPO performed in OTSL and HTML representation on the same transformer-based TableFormer [9] architecture, trained only on PubTabNet [22]. Effects of reducing the # of layers in encoder and decoder stages of the model show that smaller models trained on OTSL perform better, especially in recognizing complex table structures, and maintain a much higher mAP score than the HTML counterpart.
| # enc-layers | # dec-layers | Language | TEDs | TEDs | TEDs | mAP (0.75) | Inference time (secs) |
|----------------|----------------|------------|-------------|-------------|-------------|--------------|-------------------------|
| # enc-layers | # dec-layers | Language | simple | complex | all | mAP (0.75) | Inference time (secs) |
| 6 | 6 | OTSL HTML | 0.965 0.969 | 0.934 0.927 | 0.955 0.955 | 0.88 0.857 | 2.73 5.39 |
| 4 | 4 | OTSL HTML | 0.938 0.952 | 0.904 0.909 | 0.927 0.938 | 0.853 0.843 | 1.97 3.77 |
| 2 | 4 | OTSL HTML | 0.923 0.945 | 0.897 0.901 | 0.915 0.931 | 0.859 0.834 | 1.91 3.81 |
| 4 | 2 | OTSL HTML | 0.952 0.944 | 0.92 0.903 | 0.942 0.931 | 0.857 0.824 | 1.22 2 |
| #enc-layers | #dec-layers | Language | TEDs | TEDs | TEDs | mAP(0.75) | Inferencetime (secs) |
|---------------|---------------|------------|-------------|-------------|-------------|-------------|------------------------|
| 6 | 6 | OTSL HTML | 0.965 0.969 | 0.934 0.927 | 0.955 0.955 | 0.88 0.857 | 2.73 5.39 |
| 4 | 4 | OTSL HTML | 0.938 0.952 | 0.904 0.909 | 0.927 0.938 | 0.853 0.843 | 1.97 3.77 |
| 2 | 4 | OTSL HTML | 0.923 0.945 | 0.897 0.901 | 0.915 0.931 | 0.859 0.834 | 1.91 3.81 |
| 4 | 2 | OTSL HTML | 0.952 0.944 | 0.92 0.903 | 0.942 0.931 | 0.857 0.824 | 1.22 2 |
## 5.2 Quantitative Results
@@ -142,12 +141,12 @@ Additionally, the results show that OTSL has an advantage over HTML when applied
Table 2. TSR and cell detection results compared between OTSL and HTML on the PubTabNet [22], FinTabNet [21] and PubTables-1M [14] data sets using TableFormer [9] (with enc=6, dec=6, heads=8).
| Data set | Language | TEDs | TEDs | TEDs | mAP(0.75) | Inference time (secs) |
|--------------|------------|-------------|-------------|-------------|-------------|-------------------------|
| Data set | Language | simple | complex | all | mAP(0.75) | Inference time (secs) |
| PubTabNet | OTSL HTML | 0.965 0.969 | 0.934 0.927 | 0.955 0.955 | 0.88 0.857 | 2.73 5.39 |
| FinTabNet | OTSL HTML | 0.955 0.917 | 0.961 0.922 | 0.959 0.92 | 0.862 0.722 | 1.85 3.26 |
| PubTables-1M | OTSL HTML | 0.987 0.983 | 0.964 0.944 | 0.977 0.966 | 0.896 0.889 | 1.79 3.26 |
| Data set | Language | TEDs | TEDs | TEDs | mAP(0.75) | Inferencetime (secs) |
|--------------|------------|-------------|-------------|-------------|-------------|------------------------|
| Data set | Language | simple | complex | all | mAP(0.75) | Inferencetime (secs) |
| PubTabNet | OTSL HTML | 0.965 0.969 | 0.934 0.927 | 0.955 0.955 | 0.88 0.857 | 2.73 5.39 |
| FinTabNet | OTSL HTML | 0.955 0.917 | 0.961 0.922 | 0.959 0.92 | 0.862 0.722 | 1.85 3.26 |
| PubTables-1M | OTSL HTML | 0.987 0.983 | 0.964 0.944 | 0.977 0.966 | 0.896 0.889 | 1.79 3.26 |
## 5.3 Qualitative Results
4674
tests/data/groundtruth/docling_v2/redp5110_sampled.json
vendored
File diff suppressed because it is too large
@@ -8,49 +8,50 @@ Front cover
## Contents
| Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | . vii |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
| Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | viii |
| DB2 for i Center of Excellence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | . ix |
| Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | . xi |
| Authors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | . xi |
| Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | xiii |
| Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | xiii |
| Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | xiv |
| Chapter 1. Securing and protecting IBM DB2 data . . . . . . . . . . . . . . . . . . . . . . . | . 1 |
| 1.1 Security fundamentals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | . 2 |
| 1.2 Current state of IBM i security. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | . 2 |
| 1.3 DB2 for i security controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | . 3 |
| 1.3.1 Existing row and column control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | . 4 |
| 1.3.2 New controls: Row and Column Access Control. . . . . . . . . . . . . . . . . . . . . | . 5 |
| Chapter 2. Roles and separation of duties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | . 7 |
| 2.1 Roles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | . 8 |
| 2.1.1 DDM and DRDA application server access: QIBM_DB_DDMDRDA . . . . . | . 8 |
| 2.1.2 Toolbox application server access: QIBM_DB_ZDA. . . . . . . . . . . . . . . . . . | . 8 |
| 2.1.3 Database Administrator function: QIBM_DB_SQLADM . . . . . . . . . . . . . . . | . 9 |
| 2.1.4 Database Information function: QIBM_DB_SYSMON . . . . . . . . . . . . . . . . | . 9 |
| 2.1.5 Security Administrator function: QIBM_DB_SECADM . . . . . . . . . . . . . . . . | . 9 |
| 2.1.6 Change Function Usage CL command. . . . . . . . . . . . . . . . . . . . . . . . . . . . | 10 |
| 2.1.7 Verifying function usage IDs for RCAC with the FUNCTION_USAGE view | 10 |
| 2.2 Separation of duties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 10 |
| Chapter 3. Row and Column Access Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 13 |
| 3.1 Explanation of RCAC and the concept of access control . . . . . . . . . . . . . . . . . . | 14 |
| 3.1.1 Row permission and column mask definitions . . . . . . . . . . . . . . . . . . . . . . 3.1.2 Enabling and activating RCAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 14 |
|
||||
| 3.2 Special registers and built-in global variables . . . . . . . . . . . . . . . . . . . . . . . . . . . | 16 18 |
|
||||
| 3.2.1 Special registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 18 |
|
||||
| 3.2.2 Built-in global variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 19 |
|
||||
| 3.3 VERIFY_GROUP_FOR_USER function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 20 |
|
||||
| 3.4 Establishing and controlling accessibility by using the RCAC rule text. . . . . . . . | 21 |
|
||||
| 3.5 SELECT, INSERT, and UPDATE behavior with RCAC . . . . . . . . . . . . . . . . . . . | 22 |
|
||||
| Human resources example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 22 |
|
||||
| 3.6 3.6.1 Assigning the QIBM_DB_SECADM function ID to the consultants. . . . . . . | 23 |
|
||||
| 3.6.2 Creating group profiles for the users and their roles. . . . . . . . . . . . . . . . . . | 23 |
|
||||
| 3.6.3 Demonstrating data access without RCAC. . . . . . . . . . . . . . . . . . . . . . . . . | 24 |
|
||||
| 3.6.4 Defining and creating row permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 25 |
|
||||
| masks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 26 |
|
||||
| 3.6.5 Defining and creating column | 28 |
|
||||
| 3.6.6 Activating RCAC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6.7 Demonstrating data access with RCAC . . . . . . . . . . . . . . . . . . . . . . . . . . . | 29 |
|
||||
| 3.6.8 Demonstrating data access with a view and RCAC . . . . . . . . . . . . . . . . . . | 32 |
|
||||
| Notices | viii |
|----------------------------------------------------------------------------|--------|
| Trademarks | viii |
| DB2 for i Center of Excellence | ix |
| Preface | xii |
| Authors | xiii |
| Now you can become a published author, too! | xiii |
| Comments welcome | xiii |
| Stay connected to IBM Redbooks | xiv |
| Chapter 1. Securing and protecting IBM DB2 data | 1 |
| 1.1 Security fundamentals | 2 |
| 1.2 Current state of IBM i security | 2 |
| 1.3 DB2 for i security controls | 3 |
| 1.3.1 Existing row and column control | 4 |
| 1.3.2 New controls: Row and Column Access Control | 5 |
| Chapter 2. Roles and separation of duties | 7 |
| 2.1 Roles | 8 |
| 2.1.1 DDM and DRDA application server access: QIBM_DB_DDMDRDA | 8 |
| 2.1.2 Toolbox application server access: QIBM_DB_ZDA. | 8 |
| 2.1.3 Database Administrator function: QIBM_DB_SQLADM | 9 |
| 2.1.4 Database Information function: QIBM_DB_SYSMON | 9 |
| 2.1.5 Security Administrator function: QIBM_DB_SECADM | 9 |
| 2.1.6 Change Function Usage CL command | 10 |
| 2.1.7 Verifying function usage IDs for RCAC with the FUNCTION_USAGE view | 10 |
| 2.2 Separation of duties | 10 |
| Chapter 3. Row and Column Access Control | 13 |
| 3.1 Explanation of RCAC and the concept of access control | 14 |
| 3.1.1 Row permission and column mask definitions | 14 |
| 3.1.2 Enabling and activating RCAC | 16 |
| 3.2 Special registers and built-in global variables | 18 |
| 3.2.1 Special registers | 18 |
| 3.2.2 Built-in global variables | 19 |
| 3.3 VERIFY_GROUP_FOR_USER function | 20 |
| 3.4 Establishing and controlling accessibility by using the RCAC rule text | 21 |
| 3.5 SELECT, INSERT, and UPDATE behavior with RCAC | 22 |
| 3.6 Human resources example | 22 |
| 3.6.1 Assigning the QIBM_DB_SECADM function ID to the consultants | 23 |
| 3.6.2 Creating group profiles for the users and their roles | 23 |
| 3.6.3 Demonstrating data access without RCAC | 24 |
| 3.6.4 Defining and creating row permissions | 25 |
| 3.6.5 Defining and creating column masks | 26 |
| 3.6.6 Activating RCAC | 28 |
| 3.7 Demonstrating data access with RCAC | 29 |
| 3.8 Demonstrating data access with a view and RCAC | 32 |

DB2 for i Center of Excellence

@@ -189,21 +190,22 @@ The FUNCTION\_USAGE view contains function usage configuration details. Table 2-

Table 2-1 FUNCTION\_USAGE view

| Column name | Data type | Description |
|---------------|-------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|
| FUNCTION_ID | VARCHAR(30) | ID of the function. |
| USER_NAME | VARCHAR(10) | Name of the user profile that has a usage setting for this function. |
| USAGE | VARCHAR(7) | Usage setting: /SM590000 ALLOWED: The user profile is allowed to use the function. /SM590000 DENIED: The user profile is not allowed to use the function. |
| USER_TYPE | VARCHAR(5) | Type of user profile: /SM590000 USER: The user profile is a user. /SM590000 GROUP: The user profile is a group. |
| Column name | Data type | Description |
|---------------|-------------|----------------------------------------------------------------------|
| FUNCTION_ID | VARCHAR(30) | ID of the function. |
| USER_NAME | VARCHAR(10) | Name of the user profile that has a usage setting for this function. |
| USAGE | VARCHAR(7) | Usage setting: |
| USER_TYPE | VARCHAR(5) | Type of user profile: |

To discover who has authorization to define and manage RCAC, you can use the query that is shown in Example 2-1.

Example 2-1 Query to determine who has authority to define and manage RCAC

| SELECT | function_id, user_name, usage, user_type |
|------------|--------------------------------------------------------|
| FROM ORDER | function_usage function_id='QIBM_DB_SECADM' user_name; |
| WHERE | |
| SELECT | function_id, user_name, usage, user_type |
|----------|--------------------------------------------|
| FROM | function_usage |
| WHERE | function_id='QIBM_DB_SECADM' |
| ORDER BY | user_name; |

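
The query in Example 2-1 can be tried out away from an IBM i system with a small stand-in. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module rather than DB2 for i, it models the `FUNCTION_USAGE` view as an ordinary table with the four columns from Table 2-1, and every row of sample data (the profile names `SECGRP`, `ALICE`, `BOB`, and `MALLORY`) is invented.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the FUNCTION_USAGE view.
# Column names and types follow Table 2-1; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE function_usage (
        function_id VARCHAR(30),
        user_name   VARCHAR(10),
        usage       VARCHAR(7),
        user_type   VARCHAR(5)
    )
    """
)
conn.executemany(
    "INSERT INTO function_usage VALUES (?, ?, ?, ?)",
    [
        ("QIBM_DB_SECADM", "SECGRP", "ALLOWED", "GROUP"),
        ("QIBM_DB_SECADM", "ALICE", "ALLOWED", "USER"),
        ("QIBM_DB_SQLADM", "BOB", "ALLOWED", "USER"),
        ("QIBM_DB_SECADM", "MALLORY", "DENIED", "USER"),
    ],
)

# The query from Example 2-1, unchanged apart from whitespace.
QUERY = """
    SELECT   function_id, user_name, usage, user_type
    FROM     function_usage
    WHERE    function_id = 'QIBM_DB_SECADM'
    ORDER BY user_name
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Only the profiles with a usage setting for `QIBM_DB_SECADM` come back, sorted by `user_name`. The `usage` column still has to be read to tell an `ALLOWED` profile from a `DENIED` one.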
## 2.2 Separation of duties

@@ -223,20 +225,19 @@ Table 2-2 shows a comparison of the different function usage IDs and *JOBCTL aut

Table 2-2 Comparison of the different function usage IDs and *JOBCTL authority

| User action | *JOBCTL | QIBM_DB_SECADM | QIBM_DB_SQLADM | QIBM_DB_SYSMON | No Authority |
|-----------------------------------------------------------------------------|-----------|------------------|------------------|------------------|----------------|
| SET CURRENT DEGREE (SQL statement) | X | | X | | |
| CHGQRYA command targeting a different user's job | X | | X | | |
| STRDBMON or ENDDBMON commands targeting a different user's job | X | | X | | |
| STRDBMON or ENDDBMON commands targeting a job that matches the current user | X | | X | X | X |
| QUSRJOBI() API format 900 or System i Navigator's SQL Details for Job | X | | X | X | |
| Visual Explain within Run SQL scripts | X | | X | X | X |
| Visual Explain outside of Run SQL scripts | X | | X | | |
| ANALYZE PLAN CACHE procedure | X | | X | | |
| DUMP PLAN CACHE procedure | X | | X | | |
| MODIFY PLAN CACHE procedure | X | | X | | |
| MODIFY PLAN CACHE PROPERTIES procedure (currently does not check authority) | X | | X | | |
| CHANGE PLAN CACHE SIZE procedure (currently does not check authority) | X | | X | | |
| User action | SET CURRENT DEGREE (SQL statement) | X | QIBCMDB_SECADM | QIBCMDB_SQLADM | QIBCMDB_SYSMON | No Authority |
|-----------------------------------------------------------------------------|--------------------------------------|-----|------------------|------------------|------------------|----------------|
| CHGQRYA command targeting a different user's job | X | | X | | | |
| STRDBMON or ENDBDMON commands targeting a different user's job | X | | X | | | |
| STRDBMON or ENDBDMON commands targeting a job that matches the current user | X | | X | X | X | X |
| QUSRJOBI() API format 900 or System i Navigator's SQL Details for Job | X | | X | X | | |
| Visual Explain within Run SQL scripts | X | | X | X | X | X |
| Visual Explain outside of Run SQL scripts | X | | X | | | |
| ANALYZE PLAN CACHE procedure | X | | X | | | |
| DUMP PLAN CACHE procedure | X | | X | | | |
| MODIFY PLAN CACHE procedure | X | | X | | | |
| MODIFY PLAN CACHE PROPERTIES procedure (currently does not check authority) | X | | X | | | |
| CHANGE PLAN CACHE SIZE procedure (currently does not check authority) | X | | X | | | |

The SQL CREATE PERMISSION statement that is shown in Figure 3-1 is used to define and initially enable or disable the row access rules.