feat: Upgrade docling-parse PDF backend and interface to use page-by-page parsing (#44)

* Use docling-parse page-by-page

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Propagate document_hash to PDF backends, use docling-parse 1.0.0

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Upgrade lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* repin after more packages on pypi

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
This commit is contained in:
Christoph Auer
2024-08-22 13:49:37 +02:00
committed by GitHub
parent f7c50c8b0e
commit a8c6b29a67
8 changed files with 73 additions and 51 deletions

View File

@@ -32,7 +32,7 @@ pydantic-settings = "^2.3.0"
huggingface_hub = ">=0.23,<1"
requests = "^2.32.3"
easyocr = "^1.7"
docling-parse = "^0.2.0"
docling-parse = "^1.0.0"
certifi = ">=2024.7.4"
rtree = "^1.3.0"
scipy = "^1.14.1"