* Initial async pdf pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* UpstreamAwareQueue
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Refactoring into async pipeline primitives and graph
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanups and safety improvements
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Better threaded PDF pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pin docling-ibm-models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove unused args
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Revise pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Unload doc backend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Revert "Unload doc backend"
This reverts commit 01066f0b6e.
* Remove redundant method
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update threaded test
Signed-off-by: Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>
* Stop accumulating docs in test run
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix: don't starve on docs with > max_queue_size pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix: don't starve on docs with > max_queue_size pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* DCO Remediation Commit for Christoph Auer <cau@zurich.ibm.com>
I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: fa71cde950
I, Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>, hereby add my Signed-off-by to this commit: d66da87d96
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix: python3.9 compat
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Option to enable threadpool with doc_batch_concurrency setting
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Clean up unused code
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix settings defaults expectations
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Use released docling-ibm-models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove ignores for typing/linting
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>
* Use device_map for transformer models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Relax accelerate min version
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make pipeline cache+init thread-safe
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Establish layout_model spec and example instantations
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Updated naming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Back to uppercase constants
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix deps issue with openai-whipser>numba>llvmlite
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pull v1 changed test GT from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Integrate ListItemMarkerProcessor into document assembly
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update to final version
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update all test cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Upgrade deps
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* scaffolding in place
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* doing scaffolding for audio pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* WIP: got first transcription working
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all working, time to start cleaning up
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* first working ASR pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added openai-whisper as a first transcription model
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updating with asr_options
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* finalised the first working ASR pipeline with Whisper
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* use whisper from the latest git commit
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Update docling/datamodel/pipeline_options.py
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
* Update docling/datamodel/pipeline_options.py
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
* updated comment
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* AudioBackend -> DummyBackend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* file rename
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename to NoOpBackend, add test for ASR pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Support every format in NoOpBackend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add missing audio file and test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Install ffmpeg system dependency for ASR test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>