docling/docling
Christoph Auer aed772ab33
feat: Threaded PDF pipeline (#1951)
* Initial async pdf pipeline

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* UpstreamAwareQueue

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Refactoring into async pipeline primitives and graph

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleanups and safety improvements

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Better threaded PDF pipeline

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Pin docling-ibm-models

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove unused args

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add test

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Revise pipeline

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Unload doc backend

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Revert "Unload doc backend"

This reverts commit 01066f0b6e.

* Remove redundant method

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update threaded test

Signed-off-by: Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>

* Stop accumulating docs in test run

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix: don't starve on docs with > max_queue_size pages

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix: don't starve on docs with > max_queue_size pages

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* DCO Remediation Commit for Christoph Auer <cau@zurich.ibm.com>

I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: fa71cde950
I, Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>, hereby add my Signed-off-by to this commit: d66da87d96

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix: python3.9 compat

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Option to enable threadpool with doc_batch_concurrency setting

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Clean up unused code

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix settings defaults expectations

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Use released docling-ibm-models

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove ignores for typing/linting

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>
2025-07-26 11:49:37 +02:00
| Name | Last commit | Date |
| --- | --- | --- |
| backend | fix(markdown): ensure correct parsing of nested lists (#1995) | 2025-07-25 15:17:57 +02:00 |
| chunking | feat: expose new hybrid chunker, update docs (#384) | 2024-12-09 08:28:29 +01:00 |
| cli | feat: Support audio input (#1763) | 2025-06-23 14:47:26 +02:00 |
| datamodel | feat: Threaded PDF pipeline (#1951) | 2025-07-26 11:49:37 +02:00 |
| models | feat: Threaded PDF pipeline (#1951) | 2025-07-26 11:49:37 +02:00 |
| pipeline | feat: Threaded PDF pipeline (#1951) | 2025-07-26 11:49:37 +02:00 |
| utils | fix: Keep formula clusters also when empty (#1970) | 2025-07-22 17:02:12 +02:00 |
| __init__.py | Initial commit | 2024-07-15 09:42:42 +02:00 |
| document_converter.py | feat: Threaded PDF pipeline (#1951) | 2025-07-26 11:49:37 +02:00 |
| exceptions.py | feat: Introduce the enable_remote_services option to allow remote connections while processing (#941) | 2025-02-12 15:18:01 +01:00 |
| py.typed | fix: Add py.typed marker file (#531) | 2024-12-06 13:42:14 +01:00 |