feat: updated the backend for new docling-parse (#2187)

* updated the backend and pyproject.toml

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the version and test files

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the lock

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* forgot to add 1 updated test-file

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the lock

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Peter W. J. Staar
2025-09-05 10:42:31 +02:00
committed by GitHub
parent 2c3f6faf3d
commit b3d7542061
7 changed files with 826 additions and 851 deletions

@@ -47,8 +47,12 @@ class DoclingParseV4PageBackend(PdfPageBackend):
seg_page = self._dp_doc.get_page(
    self._page_no + 1,
    keep_chars=True,
    keep_lines=True,
    keep_bitmaps=True,
    create_words=self._create_words,
    create_textlines=self._create_textlines,
    enforce_same_font=True,
)
# In Docling, all TextCell instances are expected with top-left origin.
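
The comment above says cells are expected in top-left origin. A minimal sketch of what such a flip involves, assuming the parser reports boxes in PDF-style bottom-left coordinates; `Box` and `to_top_left` are hypothetical names used only for illustration and are not Docling or docling-parse APIs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box. Hypothetical stand-in, not a Docling type."""

    left: float
    top: float
    right: float
    bottom: float


def to_top_left(box: Box, page_height: float) -> Box:
    """Convert a bottom-left-origin box to top-left-origin coordinates.

    With a bottom-left origin, y grows upward (top > bottom). With a
    top-left origin, y grows downward, so each y maps to page_height - y.
    The x coordinates are unaffected.
    """
    return Box(
        left=box.left,
        top=page_height - box.top,
        right=box.right,
        bottom=page_height - box.bottom,
    )


# Example: a text line near the top of a 792-point-high page.
flipped = to_top_left(Box(left=10, top=700, right=110, bottom=680), page_height=792)
```

After the flip, `top` is smaller than `bottom`, which is the ordering a top-left coordinate system implies.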