Christoph Auer
cf78d5b7b9
feat: Add content_layer property to items to address body, furniture and other roles ( #735 )
...
* feat: Pass predicted page-headers and page-footers through to DoclingDocument furniture
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* chore: Update all test GT
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* fix: update all test cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* fix: update all test cases again
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Update lock
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Update lock to final docling-core
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-02-10 12:07:49 +01:00
Maxim Lysak
8533039b0c
fix: Fixing images in the input Word files ( #330 )
...
* Fixing images identification in the input Word files
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Populating extracted image data into docling picture for wordx backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Updated tests
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* removed base64 dependency in msword_backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
---------
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com >
2024-11-14 13:33:34 +01:00
Peter W. J. Staar
f542460af3
fix: fix duplicate title and heading + add e2e tests for html and docx ( #186 )
...
* add real e2e tests for html and docx
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* updated the output of itxt
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* reformatted the text
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* fixed the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* fixed the tests (2)
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* fixed the examples (1)
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* fixed the output of the test
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* updated the tests, moved the ground-truth
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* moved the ground-truth data
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* fixed the html tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* restructure title fix (#187 )
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com >
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com >
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com >
2024-10-30 13:14:56 +01:00