fix: restrict click version and update lock file (#1582)

* fix click dependency and update lock file

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update test GT

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>

Author: Michele Dolfi
Date: 2025-05-13 10:40:08 +02:00
Committed by: GitHub
Parent: 0d0fa6cbe3
Commit: 8baa85a49d
18 changed files with 1322 additions and 515 deletions

@@ -340,6 +340,29 @@
"type": "figure",
"$ref": "#/figures/0"
},
{
"prov": [
{
"bbox": [
134.765,
591.77942,
480.59189,
665.66583
],
"page": 2,
"span": [
0,
574
],
"__ref_s3_data": null
}
],
"text": "Fig. 1. Comparison between HTML and OTSL table structure representation: (A) table-example with complex row and column headers, including a 2D empty span, (B) minimal graphical representation of table structure using rectangular layout, (C) HTML representation, (D) OTSL representation. This example demonstrates many of the key-features of OTSL, namely its reduced vocabulary size (12 versus 5 in this case), its reduced sequence length (55 versus 30) and a enhanced internal structure (variable token sequence length per row in HTML versus a fixed length of rows in OTSL).",
"type": "caption",
"payload": null,
"name": "Caption",
"font": null
},
{
"prov": [
{
@@ -644,6 +667,29 @@
"type": "figure",
"$ref": "#/figures/1"
},
{
"prov": [
{
"bbox": [
145.60701,
562.78821,
469.75223000000005,
570.92072
],
"page": 5,
"span": [
0,
73
],
"__ref_s3_data": null
}
],
"text": "Fig. 2. Frequency of tokens in HTML and OTSL as they appear in PubTabNet.",
"type": "caption",
"payload": null,
"name": "Caption",
"font": null
},
{
"prov": [
{
@@ -1017,6 +1063,29 @@
"type": "figure",
"$ref": "#/figures/2"
},
{
"prov": [
{
"bbox": [
134.765,
636.15033,
480.5874,
666.2008100000002
],
"page": 7,
"span": [
0,
207
],
"__ref_s3_data": null
}
],
"text": "Fig. 3. OTSL description of table structure: A - table example; B - graphical representation of table structure; C - mapping structure on a grid; D - OTSL structure encoding; E - explanation on cell encoding",
"type": "caption",
"payload": null,
"name": "Caption",
"font": null
},
{
"prov": [
{
@@ -1390,6 +1459,29 @@
"type": "figure",
"$ref": "#/figures/3"
},
{
"prov": [
{
"bbox": [
134.76501,
288.26035,
480.59082,
307.35187
],
"page": 8,
"span": [
0,
104
],
"__ref_s3_data": null
}
],
"text": "Fig. 4. Architecture sketch of the TableFormer model, which is a representative for the Im2Seq approach.",
"type": "caption",
"payload": null,
"name": "Caption",
"font": null
},
{
"prov": [
{
@@ -1658,6 +1750,29 @@
"type": "figure",
"$ref": "#/figures/4"
},
{
"prov": [
{
"bbox": [
134.765,
352.28284,
480.59106,
394.40988
],
"page": 10,
"span": [
0,
270
],
"__ref_s3_data": null
}
],
"text": "Fig. 5. The OTSL model produces more accurate bounding boxes with less overlap (E) than the HTML model (D), when predicting the structure of a sparse table (A), at twice the inference speed because of shorter sequence length (B),(C). \"PMC2807444_006_00.png\" PubTabNet. \u03bc",
"type": "caption",
"payload": null,
"name": "Caption",
"font": null
},
{
"prov": [
{
@@ -1709,6 +1824,29 @@
"type": "figure",
"$ref": "#/figures/5"
},
{
"prov": [
{
"bbox": [
134.765,
614.23236,
480.58838000000003,
666.2008100000002
],
"page": 11,
"span": [
0,
390
],
"__ref_s3_data": null
}
],
"text": "Fig. 6. Visualization of predicted structure and detected bounding boxes on a complex table with many rows. The OTSL model (B) captured repeating pattern of horizontally merged cells from the GT (A), unlike the HTML model (C). The HTML model also didn't complete the HTML sequence correctly and displayed a lot more of drift and overlap of bounding boxes. \"PMC5406406_003_01.png\" PubTabNet.",
"type": "caption",
"payload": null,
"name": "Caption",
"font": null
},
{
"prov": [
{