feat: Upgrade docling-parse PDF backend and interface to use page-by-page parsing (#44)

* Use docling-parse page-by-page
  Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Propagate document_hash to PDF backends, use docling-parse 1.0.0
  Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Upgrade lockfile
  Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* repin after more packages on pypi
  Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-12-11 14:18:30 +00:00 · 2024-08-22 13:49:37 +02:00
parent f7c50c8b0e
commit a8c6b29a67
8 changed files with 73 additions and 51 deletions
--- a/docling/document_converter.py
+++ b/docling/document_converter.py
@@ -141,6 +141,8 @@ class DocumentConverter:
        start_doc_time = time.time()
        converted_doc = ConvertedDocument(input=in_doc)

+        _log.info(f"Processing document {in_doc.file.name}")
+
        if not in_doc.valid:
            converted_doc.status = ConversionStatus.FAILURE
            return converted_doc