fix: added extraction of byte-images in excel (#804)

* fix(msexcel): ignore Mypy checking for _find_images_in_sheet function

Signed-off-by: Jiun An Tsai <andrew@247365-Macbook.local>

* fixed some issues

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* pinned pillow in pyproject

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Jiun An Tsai <andrew@247365-Macbook.local>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Jiun An Tsai <andrew@247365-Macbook.local>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Peter W. J. Staar
2025-01-24 18:48:02 +01:00
committed by GitHub
parent 16a218d871
commit a458e298ca
8 changed files with 90 additions and 47 deletions
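
For orientation, the idea behind this fix (pulling image bytes out of an Excel workbook) can be sketched with the standard library alone. This is not the code from this commit: the function name `list_embedded_images` and the `IMAGE_SUFFIXES` set are hypothetical, and the sketch relies only on the fact that an `.xlsx` file is a ZIP package whose embedded pictures are stored under `xl/media/`. It does not map a picture to the sheet that displays it, which would require following the drawing relationships inside the package.

```python
from __future__ import annotations

import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical allow-list of image extensions; extend as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".emf", ".wmf"}

# Folder inside the .xlsx ZIP package that holds embedded media.
MEDIA_DIR = PurePosixPath("xl/media")


def list_embedded_images(xlsx_bytes: bytes) -> dict[str, bytes]:
    """Return {archive path: raw bytes} for every image directly under xl/media/."""
    images: dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            # Keep only files that sit directly in xl/media and look like images.
            if path.parent == MEDIA_DIR and path.suffix.lower() in IMAGE_SUFFIXES:
                images[name] = archive.read(name)
    return images
```

Given a workbook on disk, `list_embedded_images(Path("book.xlsx").read_bytes())` would return the raw bytes of each picture, which could then be handed to an image library for decoding.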


@@ -7,4 +7,5 @@ item-0 at level 0: unspecified: group _root_
 item-6 at level 2: table with [5x3]
 item-7 at level 1: section: group sheet: Sheet3
 item-8 at level 2: table with [7x3]
-item-9 at level 2: table with [7x3]
+item-9 at level 2: table with [7x3]
+item-10 at level 2: picture

File diff suppressed because one or more lines are too long


@@ -48,4 +48,6 @@
 | 3 | 4 | 5 |
 | 3 | 6 | 7 |
 | 8 | 9 | 9 |
-| 10 | 9 | 9 |
+| 10 | 9 | 9 |
+
+<!-- image -->

Binary file not shown.


@@ -53,7 +53,7 @@ def test_e2e_xlsx_conversions():
     converter = get_converter()
     for xlsx_path in xlsx_paths:
-        # print(f"converting {xlsx_path}")
+        print(f"converting {xlsx_path}")
         gt_path = (
             xlsx_path.parent.parent / "groundtruth" / "docling_v2" / xlsx_path.name