github-actions[bot]
2aef5cf328
chore: bump version to 2.47.1 [skip ci]
2025-08-23 14:11:33 +00:00
Michele Dolfi
488f6cdd2d
fix: vllm extra only for linux x86_64 (#2126)
vllm extra only for linux x86_64
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-08-23 13:33:15 +02:00
github-actions[bot]
b04e205d1e
chore: bump version to 2.47.0 [skip ci]
2025-08-22 14:15:39 +00:00
Christoph Auer
3c660c0511
feat: batching support for VLMs in transformers backend, add initial VLLM backend (#2094)
* Prepare existing codes for use with new multi-stage VLM pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add multithreaded VLM pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add VLM task interpreters
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add VLM task interpreters
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove prints
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix KeyboardInterrupt behaviour
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add VLLM backend support, optimize process_images
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Tweak defaults
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Implement proper batch inference for HuggingFaceTransformersVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Small fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup hf_transformers_model batching impl
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Adjust example instantiation of multi-stage VLM pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add GoT OCR 2.0
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Factor out changes without multi-stage pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Reset defaults for generation
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add torch.compile, fix temperature setting in gen_kwargs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Expose page_batch_size on CLI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add torch_dtype bfloat16 to SMOLDOCLING and SMOLVLM model spec
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Clip off pad_token
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-08-22 13:17:33 +02:00
github-actions[bot]
555506d8e6
chore: bump version to 2.46.0 [skip ci]
2025-08-20 15:25:07 +00:00
Panos Vagenas
76d2cb76b3
chore: update docling-core lock (#2110)
* chore: pre-check docling-core 2.45.0
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* update -core pinning
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
---------
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-08-20 16:41:48 +02:00
Christoph Auer
5f57ff2a45
perf: Clean up resources with docling-parse v4, no parsed_page output by default (#2105)
* Call PdfDocument.unload_pages from the pipelines where needed, delete parsed_page data unless requested to keep
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* pin docling-parse and update lock
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Reinstate pipeline_options.generate_parsed_page
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-08-20 10:46:31 +02:00
Michele Dolfi
956f82f115
chore: upgrade dependencies in lock file (#2093)
* chore: upgrade lock file
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix(markdown): update binary hash of a markdown backend ground truth file
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Co-authored-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2025-08-19 10:11:44 +02:00
github-actions[bot]
c3a7d1d999
chore: bump version to 2.45.0 [skip ci]
2025-08-18 10:25:51 +00:00
github-actions[bot]
ccfee05847
chore: bump version to 2.44.0 [skip ci]
2025-08-12 09:51:35 +00:00
Michele Dolfi
c5f49dc2db
chore: upgrade locked dependencies (#2024)
lock new deps
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-07-31 16:05:27 +02:00
TwoLeaves
0130e3ae96
fix: support new mlx-vlm module (#2001)
* fix stream_generate import statement
Signed-off-by: TwoLeaves <ohneherren@gmail.com>
* pin new mlx-vlm
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: TwoLeaves <ohneherren@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-07-31 14:13:17 +02:00
github-actions[bot]
aae42b37a8
chore: bump version to 2.43.0 [skip ci]
2025-07-28 09:45:53 +00:00
Christoph Auer
aed772ab33
feat: Threaded PDF pipeline (#1951)
* Initial async pdf pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* UpstreamAwareQueue
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Refactoring into async pipeline primitives and graph
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanups and safety improvements
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Better threaded PDF pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pin docling-ibm-models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove unused args
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Revise pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Unload doc backend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Revert "Unload doc backend"
This reverts commit 01066f0b6e.
* Remove redundant method
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update threaded test
Signed-off-by: Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>
* Stop accumulating docs in test run
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix: don't starve on docs with > max_queue_size pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix: don't starve on docs with > max_queue_size pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* DCO Remediation Commit for Christoph Auer <cau@zurich.ibm.com>
I, Christoph Auer <cau@zurich.ibm.com>, hereby add my Signed-off-by to this commit: fa71cde950
I, Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>, hereby add my Signed-off-by to this commit: d66da87d96
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix: python3.9 compat
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Option to enable threadpool with doc_batch_concurrency setting
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Clean up unused code
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix settings defaults expectations
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Use released docling-ibm-models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove ignores for typing/linting
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-30-253.eu-central-1.compute.internal>
2025-07-26 11:49:37 +02:00
Cesar Berrospi Ramis
aec29a7315
fix(markdown): ensure correct parsing of nested lists (#1995)
* fix(markdown): ensure correct parsing of nested lists
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* chore: update dependencies in uv.lock file
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
---------
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-07-25 15:17:57 +02:00
Christoph Auer
1985841a19
ci: Fixes for test GT (#1992)
Fixes for test GT
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-07-25 12:28:06 +02:00
github-actions[bot]
8227841c1b
chore: bump version to 2.42.2 [skip ci]
2025-07-24 10:21:10 +00:00
github-actions[bot]
ec971bbe68
chore: bump version to 2.42.1 [skip ci]
2025-07-22 16:45:48 +00:00
github-actions[bot]
7561be537a
chore: bump version to 2.42.0 [skip ci]
2025-07-18 15:34:59 +00:00
Christoph Auer
cca05c45ea
fix: Safe pipeline init, use device_map in transformers models (#1917)
* Use device_map for transformer models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Relax accelerate min version
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make pipeline cache+init thread-safe
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-07-18 15:14:36 +02:00
github-actions[bot]
6c4bf9d087
chore: bump version to 2.41.0 [skip ci]
2025-07-10 14:25:05 +00:00
Christoph Auer
2b8616d6d5
feat: Layout model specification and multiple choices (#1910)
* Establish layout_model spec and example instantiations
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Updated naming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Back to uppercase constants
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix deps issue with openai-whisper>numba>llvmlite
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pull v1 changed test GT from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-07-10 06:37:27 +02:00
Panos Vagenas
ec588df971
feat: enable precision control in float serialization (#1914)
* chore: propagate precision control in float serialization
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* parametrize float serialization, propagate core updates
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* update test float precision
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* repin docling-core
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
---------
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-07-09 16:39:17 +02:00
Clément Doumouro
931eb55b88
fix(ocr-utils): unit test and fix the rotate_bounding_box function (#1897)
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
2025-07-08 18:03:29 +02:00
github-actions[bot]
f4a1c06937
chore: bump version to 2.40.0 [skip ci]
2025-07-04 15:31:36 +00:00
Christoph Auer
56a0e104f7
feat: Integrate ListItemMarkerProcessor into document assembly (#1825)
* Integrate ListItemMarkerProcessor into document assembly
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update to final version
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update all test cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Upgrade deps
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-07-01 10:04:58 +02:00
Christoph Auer
bdfee4e2d0
chore: Safer unloading of DPv4 backend (#1867)
fix: Safer unloading of DPv4 backend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-06-30 14:41:21 +02:00
github-actions[bot]
bb99be6c24
chore: bump version to 2.39.0 [skip ci]
2025-06-27 15:37:53 +00:00
Panos Vagenas
0533da1923
feat: leverage new list modeling, capture default markers (#1856)
* chore: update docling-core & regenerate test data
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* update backends to leverage new list modeling
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* repin docling-core
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* ensure availability of latest docling-core API
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
---------
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-06-27 16:37:15 +02:00
github-actions[bot]
ee4781075a
chore: bump version to 2.38.1 [skip ci]
2025-06-25 16:27:46 +00:00
Panos Vagenas
7c5614a37a
fix(markdown): fix single-formatted headings & list items (#1820)
* fix(markdown): fix formatting & inline edge cases (show behavior before change)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* add change and updated test data
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* update lock
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
* improve test case
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
---------
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-06-25 13:05:06 +02:00
github-actions[bot]
1dc63d0aa9
chore: bump version to 2.38.0 [skip ci]
2025-06-23 18:14:24 +00:00
Peter W. J. Staar
1557e7ce3e
feat: Support audio input (#1763)
* scaffolding in place
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* doing scaffolding for audio pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* WIP: got first transcription working
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all working, time to start cleaning up
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* first working ASR pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added openai-whisper as a first transcription model
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updating with asr_options
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* finalised the first working ASR pipeline with Whisper
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* use whisper from the latest git commit
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Update docling/datamodel/pipeline_options.py
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
* Update docling/datamodel/pipeline_options.py
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
* updated comment
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* AudioBackend -> DummyBackend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* file rename
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename to NoOpBackend, add test for ASR pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Support every format in NoOpBackend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add missing audio file and test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Install ffmpeg system dependency for ASR test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-06-23 14:47:26 +02:00
github-actions[bot]
7bae3b6c06
chore: bump version to 2.37.0 [skip ci]
2025-06-16 11:02:54 +00:00
github-actions[bot]
40df0d74ad
chore: bump version to 2.36.1 [skip ci]
2025-06-04 11:43:13 +00:00
Michele Dolfi
8846f1a393
fix: remove typer and click constraints (#1707)
release typer and click constraints
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-04 13:06:23 +02:00
github-actions[bot]
96c54dba91
chore: bump version to 2.36.0 [skip ci]
2025-06-03 13:54:25 +00:00
Michele Dolfi
cdd401847a
feat: simplify dependencies, switch to uv (#1700)
* refactor with uv
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* constraints for onnxruntime
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* more constraints
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-03 15:18:54 +02:00