feat: Updated Layout processing with forms and key-value areas (#530)

* Upgraded Layout Postprocessing, sending old code back to ERZ

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Implement hierarchical cluster layout processing

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Pass nested cluster processing through full pipeline

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Pass nested clusters through GLM as payload

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Move to_docling_document from ds-glm to this repo

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Clean up imports again

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* feat(Accelerator): Introduce options to control the num_threads and device from API, envvars, CLI.
- Introduce the AcceleratorOptions, AcceleratorDevice and use them to set the device where the models run.
- Introduce the accelerator_utils with a function to decide the device and resolve the AUTO setting.
- Refactor how the docling-ibm-models are called to match the new init signature of models.
- Translate the accelerator options to the specific inputs for third-party models.
- Extend the docling CLI with parameters to set the num_threads and device.
- Add new unit tests.
- Add a new example showing how to use the accelerator options.
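
The AUTO resolution mentioned above could look roughly like the sketch below. This is only an illustration under stated assumptions: `Device` and `resolve_device` are hypothetical names and are not the `AcceleratorDevice` or `accelerator_utils` code from this commit. The preference order (CUDA, then MPS, then CPU) and the fallback to CPU when a requested device is unavailable are also assumptions.

```python
from enum import Enum


class Device(str, Enum):
    """Hypothetical stand-in for a device setting; not docling's actual enum."""

    AUTO = "auto"
    CPU = "cpu"
    CUDA = "cuda"
    MPS = "mps"


def resolve_device(requested: Device, cuda_available: bool, mps_available: bool) -> Device:
    """Turn a requested device into a concrete one.

    Availability is passed in as plain booleans so the sketch stays
    self-contained; a real implementation would query the ML framework.
    """
    if requested is Device.AUTO:
        # Assumed preference order: CUDA, then MPS, then CPU.
        if cuda_available:
            return Device.CUDA
        if mps_available:
            return Device.MPS
        return Device.CPU
    # Assumed behaviour: fall back to CPU if the explicit request cannot be met.
    if requested is Device.CUDA and not cuda_available:
        return Device.CPU
    if requested is Device.MPS and not mps_available:
        return Device.CPU
    return requested
```

Passing the availability flags in explicitly keeps the decision logic testable without any accelerator hardware present.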

* fix: Improve the pydantic objects in the pipeline_options and imports.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: TableStructureModel: Refactor the artifacts path to use the new structure for fast/accurate model

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Updated test ground-truth

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updated test ground-truth (again), bugfix for empty layout

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: Do proper check to set the device in EasyOCR, RapidOCR.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: Correct the way to set GPU for EasyOCR, RapidOCR

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix: OCR AcceleratorDevice

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* Merge pull request #556 from DS4SD/cau/layout-processing-improvement

feat: layout processing improvements and bugfixes

* Update lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update HF model ref, reset test generate

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Repin to release package versions

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Many layout processing improvements, add document index type

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update pinnings to docling-core

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update test GT

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix table box snapping

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes for cluster pre-ordering

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Introduce OCR confidence, propagate to orphan in post-processing

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix form and key value area groups

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Adjust confidence in EasyOCR

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Roll back CLI changes from main

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update test GT

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update docling-core pinning

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Annoying fixes for historical Python versions

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updated test GT for legacy

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Comment cleanup

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
Christoph Auer
2024-12-17 17:32:24 +01:00
committed by GitHub
parent 00dec7a2f3
commit 60dc852f16
56 changed files with 1659 additions and 1718 deletions


@@ -20,28 +20,18 @@ Accurate document layout analysis is a key requirement for highquality PDF docum
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
KDD '22, August 14-18, 2022, Washington, DC, USA © 2022 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-9385-0/22/08. https://doi.org/10.1145/3534678.3539043
USING THE VERTICAL TUBE

MODELS AY11230/11234

1. The vertical tube can be used for instructional viewing or to photograph the image with a digital camera or a micro TV unit.
2. Loosen the retention screw, then rotate the adjustment ring to change the length of the vertical tube.
3. Make sure that both the images in

OPERATION (cont.)

SELECTING OBJECTIVE MAGNIFICATION

1. There are two objectives. The lower magnification objective has a greater depth of field and view.
2. In order to observe the specimen easily use the lower magnification objective first. Then, by rotating the case, the magnification can be changed.

CHANGING THE INTERPUPILLARY DISTANCE

1. The distance between the observer's pupils is the interpupillary distance.
2. To adjust the interpupillary distance rotate the prism caps until both eyes coincide with the image in the eyepiece.

FOCUSING

1. Remove the lens protective cover.
2. Place the specimen on the working stage.
3. Focus the specimen with the left eye first while turning the focus knob until the image appears clear and sharp.
4. Rotate the right eyepiece ring until the images in each eyepiece coincide and are sharp and clear.

CHANGING THE BULB

1. Disconnect the power cord.
2. When the bulb is cool, remove the oblique illuminator cap and remove the halogen bulb with cap.
3. Replace with a new halogen bulb.
4. Open the window in the base plate and replace the halogen lamp or fluorescent lamp of transmitted illuminator.

FOCUSING

1. Turn the focusing knob away or toward you until a clear image is viewed.
2. If the image is unclear, adjust the height of the elevator up or down, then turn the focusing knob again.

ZOOM MAGNIFICATION

1. Turn the zoom magnification knob to the desired magnification and field of view.
2. In most situations, it is recommended that you focus at the lowest magnification, then move to a higher magnification and re-focus as necessary.
3. If the image is not clear to both eyes at the same time, the diopter ring may need adjustment.

DIOPTER RING ADJUSTMENT

1. To adjust the eyepiece for viewing with or without eyeglasses and for differences in acuity between the right and left eyes, follow the following steps:
   a. Observe an image through the left eyepiece and bring a specific point into focus using the focus knob.
   b. By turning the diopter ring adjustment for the left eyepiece, bring the same point into sharp focus.
   c. Then bring the same point into focus through the right eyepiece by turning the right diopter ring.
   d. With more than one viewer, each viewer should note their own diopter ring position for the left and right eyepieces, then before viewing set the diopter ring adjustments to that setting.

CHANGING THE BULB

1. Disconnect the power cord from the electrical outlet.
2. When the bulb is cool, remove the oblique illuminator cap and remove the halogen bulb with cap.
3. Replace with a new halogen bulb.
4. Open the window in the base plate and replace the halogen lamp or fluorescent lamp of transmitted illuminator.

Model AY11230 Model AY11234
© 2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9385-0/22/08.
https://doi.org/10.1145/3534678.3539043
Figure 1: Four examples of complex page layouts across different document categories
<!-- image -->
<!-- image -->
<!-- image -->
Circling Minimums There was a change to the TERPS criteria in that affects circling area dimension by expanding the areas to provide improved obstacle protection. To indicate that the new criteria had been applied to a given procedure, a is placed on the circling line of minimums. The new circling tables and explanatory information is located in the Legend of the TPP. The approaches using standard circling approach areas can be identified by the absence of the on the circling line of minima.
Apply Expanded Circling Approach Maneuvering Airspace Radius Table
Apply Standard Circling Approach Maneuvering Radius Table AIRPORT SKETCH The airport sketch is a depiction of the airport with emphasis on runway pattern and related information, positioned in either the lower left or lower right corner of the chart to aid pilot recognition of the airport from the air and to provide some information to aid on ground navigation of the airport. The runways are drawn to scale and oriented to true north. Runway dimensions (length and width) are shown for all active runways. Runway(s) are depicted based on what type and construction of the runway. Hard Surface Other Than Hard Surface Metal Surface Closed Runway Under Construction Stopways, Taxiways, Parking Areas Displaced Threshold Closed Pavement Water Runway Taxiways and aprons are shaded grey. Other runway features that may be shown are runway numbers, runway dimensions, runway slope, arresting gear, and displaced threshold. Other information concerning lighting, final approach bearings, airport beacon, obstacles, control tower, NAVAIDs, helipads may also be shown. Airport Elevation and Touchdown Zone Elevation The airport elevation is shown enclosed within a box in the upper left corner of the sketch box and the touchdown zone elevation (TDZE) is shown in the upper right corner of the sketch box. The airport elevation is the highest point of an airport's usable runways measured in feet from mean sea level. The TDZE is the highest elevation in the first feet of the landing surface. Circling only approaches will not show a TDZE. FAA Chart Users' Guide - Terminal Procedures Publication (TPP) - Terms
## KEYWORDS
PDF document conversion, layout segmentation, object-detection, data set, Machine Learning
@@ -158,6 +148,8 @@ Figure 4: Examples of plausible annotation alternatives for the same page. Crite
<!-- image -->
were carried out over a timeframe of 12 weeks, after which 8 of the 40 initially allocated annotators did not pass the bar.
Phase 4: Production annotation. The previously selected 80K pages were annotated with the defined 11 class labels by 32 annotators. This production phase took around three months to complete. All annotations were created online through CCS, which visualises the programmatic PDF text-cells as an overlay on the page. The page annotations are obtained by drawing rectangular bounding-boxes, as shown in Figure 3. With regard to the annotation practices, we implemented a few constraints and capabilities on the tooling level. First, we only allow non-overlapping, vertically oriented, rectangular boxes. For the large majority of documents, this constraint was sufficient and it speeds up the annotation considerably in comparison with arbitrary segmentation shapes. Second, annotator staff were not able to see each other's annotations. This was enforced by design to avoid any bias in the annotation, which could skew the numbers of the inter-annotator agreement (see Table 1). We wanted