Nikhil Verma
3f03709885
fix: Improve numbered list detection for msword docs (#2100)
* Improve numbered list detection for msword docs
This fixes the list detection in MSWord docs by properly tracking and counting
the list entries. It fixes
https://github.com/docling-project/docling/issues/2090
* DCO Remediation Commit for Nikhil Verma <nikhilgotmail@gmail.com>
I, Nikhil Verma <nikhilgotmail@gmail.com>, hereby add my Signed-off-by to this commit: 509da6658e
Signed-off-by: Nikhil Verma <nikhilgotmail@gmail.com>
---------
Signed-off-by: Nikhil Verma <nikhilgotmail@gmail.com>
2025-08-22 10:38:34 +02:00

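The 3f03709885 message above says numbered-list detection was fixed "by properly tracking and counting the list entries". The sketch below illustrates that idea in general terms and is not the actual docling msword backend code. It assumes each list paragraph exposes a numbering id and an indent level, as WordprocessingML records them in `<w:numPr>` (`w:numId`, `w:ilvl`). The class `ListCounter` and its method `next_number` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class ListCounter:
    """Hypothetical sketch: per-list, per-level counters for Word numbered lists.

    Assumes each list paragraph exposes a numbering id (num_id) and an indent
    level (level), as WordprocessingML stores them in <w:numPr>. This is an
    illustration, not docling's msword backend implementation.
    """

    def __init__(self) -> None:
        # counters[num_id][level] -> items seen so far at that level of that list
        self.counters = defaultdict(dict)

    def next_number(self, num_id: int, level: int) -> int:
        levels = self.counters[num_id]
        # A new item at `level` restarts numbering of every deeper level,
        # mirroring Word's default restart behaviour for nested levels.
        for deeper in [lv for lv in levels if lv > level]:
            del levels[deeper]
        levels[level] = levels.get(level, 0) + 1
        return levels[level]


if __name__ == "__main__":
    counter = ListCounter()
    # (num_id, level) pairs for consecutive list paragraphs
    items = [(1, 0), (1, 0), (1, 1), (1, 1), (1, 0), (1, 1), (2, 0)]
    print([counter.next_number(n, lvl) for n, lvl in items])  # [1, 2, 1, 2, 3, 1, 1]
```

Keying the counters by numbering id keeps two separate lists from sharing a count. Clearing deeper levels whenever a shallower item appears is what makes "1, 2, (a, b), 3, (a)" come out right.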
Maxim Lysak
8533039b0c
fix: Fixing images in the input Word files (#330)
* Fixing image identification in the input Word files
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Populating extracted image data into docling picture for wordx backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated tests
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed base64 dependency in msword_backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
---------
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2024-11-14 13:33:34 +01:00

Peter W. J. Staar
f542460af3
fix: fix duplicate title and heading + add e2e tests for html and docx (#186)
* add real e2e tests for html and docx
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the output of itxt
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the text
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the tests (2)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the examples (1)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the output of the test
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the tests, moved the ground-truth
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* moved the ground-truth data
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the html tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* restructure title fix (#187)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-30 13:14:56 +01:00