copilot-swe-agent[bot]
aa75dd13d3
test: mark timeout test as manual due to model requirement
The test requires pre-downloaded models from HuggingFace.
Added skip marker and comprehensive docstring explaining the test purpose.
Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com>
2025-11-17 09:27:27 +00:00
copilot-swe-agent[bot]
e3aa8cd770
feat: add document_timeout support to StandardPdfPipeline
- Add timeout tracking in _build_document method
- Check elapsed time against document_timeout in processing loop
- Set PARTIAL_SUCCESS status when timeout is exceeded
- Add test for document_timeout behavior
Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com>
2025-11-17 09:23:28 +00:00
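The timeout behavior listed in the commit above (track elapsed time, check it inside the processing loop, report a partial result once the budget is exceeded) can be sketched roughly as follows. This is a minimal illustration, not the code from `StandardPdfPipeline`: the names `process_pages`, `handle_page`, `clock`, and the `Status` enum are hypothetical stand-ins, and only `document_timeout` and `PARTIAL_SUCCESS` come from the commit message.

```python
import time
from enum import Enum
from typing import Callable, Iterable, List, Optional, Tuple


class Status(Enum):
    """Hypothetical stand-in for the pipeline's conversion status."""
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"


def process_pages(
    pages: Iterable[int],
    handle_page: Callable[[int], int],
    document_timeout: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[int], Status]:
    """Process pages in order, stopping early once the time budget is spent.

    Assumption: the budget is checked before each page, so a page that is
    already running is allowed to finish.
    """
    start = clock()
    done: List[int] = []
    for page in pages:
        # Compare elapsed time against the per-document budget, if one is set.
        if document_timeout is not None and clock() - start > document_timeout:
            return done, Status.PARTIAL_SUCCESS
        done.append(handle_page(page))
    return done, Status.SUCCESS
```

Passing the clock as a parameter is a design choice made for this sketch: it lets a test substitute a fake clock instead of sleeping, which avoids any dependency on real processing time.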
Michele Dolfi
268d027c8f
feat: Use threading in the standard pipeline and move old behavior to legacy (#2452)
* rename standard to legacy
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove old standard pipeline
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move threaded to standard
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add backwards compatible threaded pipeline
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Updates for threaded pipeline to lower memory requirements
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* updating deps seems to remove the corrupted double-linked list error
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update pinning
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use main lock
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add more threadsafe blocks
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename batch_timeout_seconds
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-10-31 14:42:11 +01:00
Michele Dolfi
a51275d080
fix(pdf): threadsafe for pypdfium2 backend (#2527)
* add threadsafe test
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* test backend
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* test threaded pipeline
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add test_pypdfium_threaded_pipeline
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add more threadsafe blocks
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix threadsafe in pypdfium backend
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove unnecessary tests
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* restore clean test
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-10-30 17:58:39 +01:00
Christoph Auer
aed772ab33
feat: Threaded PDF pipeline (#1951)
* Initial async pdf pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* UpstreamAwareQueue
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Refactoring into async pipeline primitives and graph
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanups and safety improvements
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Better threaded PDF pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pin docling-ibm-models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove unused args
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Revise pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Unload doc backend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Revert "Unload doc backend"
This reverts commit 01066f0b6e.
* Remove redundant method
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update threaded test
Signed-off-by: Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>
* Stop accumulating docs in test run
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix: don't starve on docs with > max_queue_size pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix: don't starve on docs with > max_queue_size pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* DCO Remediation Commit for Christoph Auer <cau@zurich.ibm.com>
I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: fa71cde950
I, Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>, hereby add my Signed-off-by to this commit: d66da87d96
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix: python3.9 compat
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Option to enable threadpool with doc_batch_concurrency setting
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Clean up unused code
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix settings defaults expectations
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Use released docling-ibm-models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove ignores for typing/linting
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>
2025-07-26 11:49:37 +02:00