fix: restrict click version and update lock file (#1582)

* fix click dependency and update lock file

  Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update test GT

  Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-12-09 13:18:24 +00:00 · 2025-05-13 10:40:08 +02:00
parent 0d0fa6cbe3
commit 8baa85a49d
18 changed files with 1322 additions and 515 deletions
--- a/tests/data/groundtruth/docling_v1/2203.01017v2.json
+++ b/tests/data/groundtruth/docling_v1/2203.01017v2.json
@@ -365,6 +365,29 @@
      "type": "figure",
      "$ref": "#/figures/2"
    },
+    {
+      "prov": [
+        {
+          "bbox": [
+            308.862,
+            232.72709999999995,
+            545.11517,
+            277.49963
+          ],
+          "page": 1,
+          "span": [
+            0,
+            220
+          ],
+          "__ref_s3_data": null
+        }
+      ],
+      "text": "Figure 1: Picture of a table with subtle, complex features such as (1) multi-column headers, (2) cell with multi-row text and (3) cells with no content. Image from PubTabNet evaluation set, filename: 'PMC2944238 004 02'.",
+      "type": "caption",
+      "payload": null,
+      "name": "Caption",
+      "font": null
+    },
    {
      "name": "Table",
      "type": "table",
@@ -904,6 +927,29 @@
      "type": "figure",
      "$ref": "#/figures/3"
    },
+    {
+      "prov": [
+        {
+          "bbox": [
+            308.862,
+            503.3020900000001,
+            545.11511,
+            524.16364
+          ],
+          "page": 3,
+          "span": [
+            0,
+            104
+          ],
+          "__ref_s3_data": null
+        }
+      ],
+      "text": "Figure 2: Distribution of the tables across different table dimensions in PubTabNet + FinTabNet datasets",
+      "type": "caption",
+      "payload": null,
+      "name": "Caption",
+      "font": null
+    },
    {
      "prov": [
        {
@@ -1282,11 +1328,57 @@
      "type": "figure",
      "$ref": "#/figures/4"
    },
+    {
+      "prov": [
+        {
+          "bbox": [
+            50.111992,
+            567.03308,
+            545.10846,
+            588.01422
+          ],
+          "page": 5,
+          "span": [
+            0,
+            212
+          ],
+          "__ref_s3_data": null
+        }
+      ],
+      "text": "Figure 3: TableFormer takes in an image of the PDF and creates bounding box and HTML structure predictions that are synchronized. The bounding boxes grabs the content from the PDF and inserts it in the structure.",
+      "type": "caption",
+      "payload": null,
+      "name": "Caption",
+      "font": null
+    },
    {
      "name": "Picture",
      "type": "figure",
      "$ref": "#/figures/5"
    },
+    {
+      "prov": [
+        {
+          "bbox": [
+            50.112,
+            111.72906,
+            286.36597,
+            264.2171900000001
+          ],
+          "page": 5,
+          "span": [
+            0,
+            745
+          ],
+          "__ref_s3_data": null
+        }
+      ],
+      "text": "Figure 4: Given an input image of a table, the Encoder produces fixed-length features that represent the input image. The features are then passed to both the Structure Decoder and Cell BBox Decoder . During training, the Structure Decoder receives 'tokenized tags' of the HTML code that represent the table structure. Afterwards, a transformer encoder and decoder architecture is employed to produce features that are received by a linear layer, and the Cell BBox Decoder. The linear layer is applied to the features to predict the tags. Simultaneously, the Cell BBox Decoder selects features referring to the data cells (' < td > ', ' < ') and passes them through an attention network, an MLP, and a linear layer to predict the bounding boxes.",
+      "type": "caption",
+      "payload": null,
+      "name": "Caption",
+      "font": null
+    },
    {
      "prov": [
        {
@@ -2214,6 +2306,29 @@
      "type": "figure",
      "$ref": "#/figures/7"
    },
+    {
+      "prov": [
+        {
+          "bbox": [
+            53.811783000000005,
+            575.89355,
+            385.93451,
+            583.76672
+          ],
+          "page": 8,
+          "span": [
+            0,
+            79
+          ],
+          "__ref_s3_data": null
+        }
+      ],
+      "text": "b. Structure predicted by TableFormer, with superimposed matched PDF cell text:",
+      "type": "caption",
+      "payload": null,
+      "name": "Caption",
+      "font": null
+    },
    {
      "name": "Table",
      "type": "table",
@@ -2252,11 +2367,57 @@
      "type": "figure",
      "$ref": "#/figures/8"
    },
+    {
+      "prov": [
+        {
+          "bbox": [
+            62.595001,
+            324.36508,
+            532.63049,
+            333.27164
+          ],
+          "page": 8,
+          "span": [
+            0,
+            112
+          ],
+          "__ref_s3_data": null
+        }
+      ],
+      "text": "Figure 6: An example of TableFormer predictions (bounding boxes and structure) from generated SynthTabNet table.",
+      "type": "caption",
+      "payload": null,
+      "name": "Caption",
+      "font": null
+    },
    {
      "name": "Picture",
      "type": "figure",
      "$ref": "#/figures/9"
    },
+    {
+      "prov": [
+        {
+          "bbox": [
+            50.112,
+            426.35013,
+            545.11377,
+            471.12265
+          ],
+          "page": 8,
+          "span": [
+            0,
+            397
+          ],
+          "__ref_s3_data": null
+        }
+      ],
+      "text": "Figure 5: One of the benefits of TableFormer is that it is language agnostic, as an example, the left part of the illustration demonstrates TableFormer predictions on previously unseen language (Japanese). Additionally, we see that TableFormer is robust to variability in style and content, right side of the illustration shows the example of the TableFormer prediction from the FinTabNet dataset.",
+      "type": "caption",
+      "payload": null,
+      "name": "Caption",
+      "font": null
+    },
    {
      "name": "Picture",
      "type": "figure",
@@ -3707,6 +3868,29 @@
      "type": "figure",
      "$ref": "#/figures/11"
    },
+    {
+      "prov": [
+        {
+          "bbox": [
+            50.112,
+            605.63605,
+            545.11371,
+            626.49762
+          ],
+          "page": 12,
+          "span": [
+            0,
+            245
+          ],
+          "__ref_s3_data": null
+        }
+      ],
+      "text": "Figure 7: Distribution of the tables across different dimensions per dataset. Simple vs complex tables per dataset and split, strict vs non strict html structures per dataset and table complexity, missing bboxes per dataset and table complexity.",
+      "type": "caption",
+      "payload": null,
+      "name": "Caption",
+      "font": null
+    },
    {
      "prov": [
        {
@@ -4517,6 +4701,29 @@
      "type": "figure",
      "$ref": "#/figures/16"
    },
+    {
+      "prov": [
+        {
+          "bbox": [
+            315.79001,
+            411.40909,
+            538.18524,
+            420.31564
+          ],
+          "page": 14,
+          "span": [
+            0,
+            55
+          ],
+          "__ref_s3_data": null
+        }
+      ],
+      "text": "Figure 13: Table predictions example on colorful table.",
+      "type": "caption",
+      "payload": null,
+      "name": "Caption",
+      "font": null
+    },
    {
      "name": "Table",
      "type": "table",
@@ -4675,6 +4882,29 @@
      "name": "Picture",
      "type": "figure",
      "$ref": "#/figures/23"
+    },
+    {
+      "prov": [
+        {
+          "bbox": [
+            50.112,
+            262.80108999999993,
+            545.11383,
+            283.66263
+          ],
+          "page": 16,
+          "span": [
+            0,
+            153
+          ],
+          "__ref_s3_data": null
+        }
+      ],
+      "text": "Figure 17: Example of long table. End-to-end example from initial PDF cells to prediction of bounding boxes, post processing and prediction of structure.",
+      "type": "caption",
+      "payload": null,
+      "name": "Caption",
+      "font": null
    }
  ],
  "figures": [