fix: Fixing images in the input Word files (#330)

* Fixing image identification in the input Word files

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Populating extracted image data into docling picture items for the MS Word (docx) backend

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated tests

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Removed base64 dependency in msword_backend

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
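The image-identification step these bullets describe can be illustrated with a minimal, stdlib-only sketch. It assumes the standard OOXML package layout (`word/document.xml` plus `word/_rels/document.xml.rels`), in which each picture is an `<a:blip r:embed="rIdN"/>` element whose relationship ID resolves to a part under `word/media/`. It does not reproduce the msword_backend code, and the function name `extract_docx_images` is hypothetical. The image data is kept as raw bytes with no base64 step, in the spirit of the last bullet.

```python
import zipfile
import xml.etree.ElementTree as ET
from posixpath import join, normpath

# Namespaces used by WordprocessingML drawings and OPC relationship parts.
NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
    "rel": "http://schemas.openxmlformats.org/package/2006/relationships",
}
BLIP = "{%s}blip" % NS["a"]
R_EMBED = "{%s}embed" % NS["r"]


def extract_docx_images(source):
    """Return (part_name, raw_bytes) for every embedded image, in document order.

    `source` is a path or a binary file object holding a .docx package.
    (Hypothetical helper for illustration, not part of docling.)
    """
    with zipfile.ZipFile(source) as zf:
        # 1. Map relationship IDs (rId...) of the main document part to targets.
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
        targets = {
            rel.get("Id"): rel.get("Target")
            for rel in rels.findall("rel:Relationship", NS)
            if rel.get("TargetMode") != "External"  # skip externally linked files
        }
        # 2. Each embedded picture references its data via <a:blip r:embed="rIdN"/>.
        body = ET.fromstring(zf.read("word/document.xml"))
        images = []
        for blip in body.iter(BLIP):
            target = targets.get(blip.get(R_EMBED))
            if target is None:
                continue  # no embedded relationship for this blip
            # 3. Targets are either absolute ("/word/media/x.png") or relative to word/.
            if target.startswith("/"):
                part_name = target.lstrip("/")
            else:
                part_name = normpath(join("word", target))
            # 4. Keep the raw bytes; no base64 round-trip is needed.
            images.append((part_name, zf.read(part_name)))
        return images
```

The returned bytes can be handed directly to an image library, for example Pillow via `PIL.Image.open(io.BytesIO(data))`, before being attached to a picture item.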

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Maxim Lysak
2024-11-14 13:33:34 +01:00
committed by GitHub
parent bf2a85f1d4
commit 8533039b0c
4 changed files with 107 additions and 78 deletions


@@ -2,7 +2,7 @@ item-0 at level 0: unspecified: group _root_
item-1 at level 1: paragraph: Summer activities
item-2 at level 1: title: Swimming in the lake
item-3 at level 2: paragraph: Duck
-item-4 at level 2: paragraph:
+item-4 at level 2: picture
item-5 at level 2: paragraph: Figure 1: This is a cute duckling
item-6 at level 2: section_header: Lets swim!
item-7 at level 3: paragraph: To get started with swimming, fi ... down in a water and try not to drown:



@@ -4,6 +4,8 @@ Summer activities
Duck
<!-- image -->
Figure 1: This is a cute duckling
## Lets swim!