* add example processing parquet file of images
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* vlm using vllm api
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use openvino and add more docs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add default input file
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* change default to standard for running in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use simple rapidocr without openvino in the CI example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* adding granite-docling preview
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the model specs
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Add Layout+VLM pipeline with prompt injection, ApiVlmModel updates
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update layout injection, move to experimental
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Adjust defaults
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Map Layout+VLM pipeline to GraniteDoclign
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove base_prompt from layout injection prompt
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Reinstate custom prompt
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* add demo_layout file that produces with vs without layout injection
Signed-off-by: Peter El Hachem <peter.el.hachem@ibm.com>
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* feat: wrap vlm_inference around process_images
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* feat: carry input prompt + number of input tokens
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* fix: adapt example to run on local test file
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* fix: example now expects single document
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* feat: add layout example to EXAMPLES_TO_SKIP
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* feat: address comments on git
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* feat: add inference wrapper for hf_transformers + carry input prompt
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* Feat: add track_input_prompt to ApiVlmOptions, and track input prompt as part of api vlm
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* fix: Ensure backward-compatible build_prompt by adding _internal_page ag
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: Ensure backward-compatible build_prompt by adding _internal_page ag
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for demo
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Typing fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Restoring lost changes in vllm_model
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Restoring vlm_pipeline_api_model example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Peter El Hachem <peter.el.hachem@ibm.com>
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: ElHachem02 <peterelhachem02@gmail.com>
* docs(opensearch): update the example notebook RAG with OpenSearch
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs(uspto): remove direct usage of the backend class for conversion
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs: remove direct usage of backends from documentation
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* add mlx-whisper support
* added mlx-whisper example and test. update docling cli to use MLX automatically if present.
* fix pre-commit checks and added proper type safety
* fixed linter issue
* DCO Remediation Commit for Ken Steele <ksteele@gmail.com>
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: a979a680e1dc2fee8461401335cfb5dda8cfdd98
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 9827068382ca946fe1387ed83f747ae509fcf229
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: ebbeb45c7dc266260e1fad6bdb54a7041f8aeed4
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 2f6fd3cf46c8ca0bb98810191578278f1df87aa3
Signed-off-by: Ken Steele <ksteele@gmail.com>
* fix unit tests and code coverage for CI
* DCO Remediation Commit for Ken Steele <ksteele@gmail.com>
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 5e61bf11139a2133978db2c8d306be6289aed732
Signed-off-by: Ken Steele <ksteele@gmail.com>
* fix CI example test - mlx_whisper_example.py defaults to tests/data/audio/sample_10s.mp3 if no args specified.
Signed-off-by: Ken Steele <ksteele@gmail.com>
* refactor: centralize audio file extensions and MIME types in base_models.py
- Move audio file extensions from CLI hardcoded set to FormatToExtensions[InputFormat.AUDIO]
- Add support for additional audio formats: m4a, aac, ogg, flac, mp4, avi, mov
- Update FormatToMimeType mapping to include MIME types for all audio formats
- Update CLI auto-detection to use centralized FormatToExtensions mapping
- Add comprehensive tests for audio file auto-detection and pipeline selection
- Ensure explicit pipeline choices are not overridden by auto-detection
Fixes issue where only .mp3 and .wav files were processed as audio despite
CLI auto-detection working for all formats. The document converter now
properly recognizes all audio formats through MIME type detection.
Addresses review comments:
- Centralizes audio extensions in base_models.py as suggested
- Maintains existing auto-detection behavior while using centralized data
- Adds proper test coverage for the audio detection functionality
All examples and tests pass with the new centralized approach.
All audio formats (mp3, wav, m4a, aac, ogg, flac, mp4, avi, mov) now work correctly.
Signed-off-by: Ken Steele <ksteele@gmail.com>
* feat: address reviewer feedback - improve CLI auto-detection and add explicit model options
Review feedback addressed:
1. Fix CLI auto-detection to only switch to ASR pipeline when ALL files are audio
- Previously switched if ANY file was audio, now requires ALL files to be audio
- Added warning for mixed file types with guidance to use --pipeline asr
2. Add explicit WHISPER_X_MLX and WHISPER_X_NATIVE model options
- Users can now force specific implementations if desired
- Auto-selecting models (WHISPER_BASE, etc.) still choose best for hardware
- Added 12 new explicit model options: _MLX and _NATIVE variants for each size
CLI now supports:
- Auto-selecting: whisper_tiny, whisper_base, etc. (choose best for hardware)
- Explicit MLX: whisper_tiny_mlx, whisper_base_mlx, etc. (force MLX)
- Explicit Native: whisper_tiny_native, whisper_base_native, etc. (force native)
Addresses reviewer comments from @dolfim-ibm
Signed-off-by: Ken Steele <ksteele@gmail.com>
* DCO Remediation Commit for Ken Steele <ksteele@gmail.com>
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: c60e72d2b5
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 94803317a3
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 21905e8ace
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 96c669d155
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 8371c060ea
Signed-off-by: Ken Steele <ksteele@gmail.com>
* test(asr): add coverage for MLX options, pipeline helpers, and VLM prompts
- tests/test_asr_mlx_whisper.py: verify explicit MLX options (framework, repo ids)
- tests/test_asr_pipeline.py: cover _has_text/_determine_status and backend support with proper InputDocument/NoOpBackend wiring
- tests/test_interfaces.py: add BaseVlmPageModel.formulate_prompt tests (RAW/NONE/CHAT, invalid style), with minimal InlineVlmOptions scaffold
Improves reliability of ASR and VLM components by validating configuration paths and helper logic.
Signed-off-by: Ken Steele <ksteele@gmail.com>
* test(asr): broaden coverage for model selection, pipeline flows, and VLM prompts
- tests/test_asr_mlx_whisper.py
- Add MLX/native selector coverage across all Whisper sizes
- Validate repo_id choices under MLX and Native paths
- Cover fallback path when MPS unavailable and mlx_whisper missing
- tests/test_asr_pipeline.py
- Relax silent-audio assertion to accept PARTIAL_SUCCESS or SUCCESS
- Force CPU native path in helper tests to avoid torch in device selection
- Add language handling tests for native/MLX transcribe
- Cover native run success (BytesIO) and failure (exception) branches
- Cover MLX run success/failure branches with mocked transcribe
- Add init path coverage with artifacts_path
- tests/test_interfaces.py
- Add focused VLM prompt tests (NONE/CHAT variants)
Result: all tests passing with significantly improved coverage for ASR model selectors, pipeline execution paths, and VLM prompt formulation.
Signed-off-by: Ken Steele <ksteele@gmail.com>
* simplify ASR model settings (no pipeline detection needed)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* clean up disk space in runners
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Ken Steele <ksteele@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Update custom_convert.py
export_to_document_tokens is deprecated so change it to export_to_doctags
Signed-off-by: Jeremy Chen <github@jeremychen.email>
* Experimental code for repetition detection, VLLM Streaming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update VLLM Streaming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update VLLM inference code, CLI and VLM specs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix generation and decoder args for HF model
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix vllm device args
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Bugfixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove streaming VLLM for the moment
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add repetition StoppingCriteria for GraniteDocling/SmolDocling
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make GenerationStopper base class and port for MLX
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add streaming support and custom GenerationStopper support for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix api_image_request_streaming when GenerationStopper triggers.
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move DocTagsRepetitionStopper to utility unit, update examples
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* feat: add a backend parser for WebVTT files
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs: update README with VTT support
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs: add description to supported formats
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* chore: upgrade docling-core to unescape WebVTT in markdown
Pin the new release of docling-core 2.48.2.
Do not escape HTML reserved characters when exporting WebVTT documents to markdown.
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* test: add missing copyright notice
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* docs: add an example of RAG with OpeanSearch
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* chore: pin latest docling-core and update uv.lock
Pin latest version release of docling-core in pyproject.toml
Update the dependencies in uv.lock file
Run the notebook rag_opensearch.ipynb to pick up changes from docling-core
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* feat: exploring new version
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 5815c8f81b0e5ce400332597b6795e5a97ecf775
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* chore: autoformat
DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 5815c8f81b0e5ce400332597b6795e5a97ecf775
* feat: enable configurable runtime for rapidocr and handle new result better;
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* chore: fix linter
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* chore: use new server model
* chore: change default engine type to onnx
* chore: tests update for new rapidocr
* fix: rebase from main and fix clashes
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 5815c8f81b0e5ce400332597b6795e5a97ecf775
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 02f9db85f562e5cdfda40c52fee55cfd4030d70a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a7bcb205faedb881f94a89b3bbd29cb31ccd54f0
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a39482a98cbcff7a825c8321134732af0c65930a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 63e9d717fa26951566b02761f3fdfc752c31f805
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: ef12a6ec1ea2846a8a8e2e776eeaa59c2a0c4dfe
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 2222d2340387f8d9d66f3ca9d8e21a0945a44e7a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: bc6a1dc507d7f146ec4797a2d3840414f46ac64d
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 56e0d67da7c57d4b5caf8eaef8dff7056c3efd32
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 871ca21271412006c76acf3c19426140efed3d50
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 7b1b77159da729d483a581a86c7309acba1712a7
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a792a714a43e19a91b2b782f54621c1c5efda632
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: d1fed26323ff829b716bc667fe69532839363e45
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 346ec1cad943765f886e5d17fb0a54221124689c
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 4d0bbe5bd6e9f7261b97362ff8823af244267089
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 34a5ad53892a7064a6bf35f890d344d464c78b2f
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 9151959db3ad53535011d1cfdcf9181fdf936bb1
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 8ef5536f2c098826c6c0a05190f8a80614c3f3cb
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 7e18637a35
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 63fb8ff599
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 0cb9444fb8
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 38940d9978
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: b6d461ac42
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: ee55eb3408
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
---------
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* Added option to docling-tools to download arbitrary HuggingFace model
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* Added note in documentation
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* Removed note on custom artifact path usage from HF download option
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* Fixed typo
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
---------
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* Notebook showing example on how to use docling transforms in DPK
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* fix HF Token name
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* use %pip instead of pip install jupyter lab
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* run formatter
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* add example to mkdocs and fix typo
Signed-off-by: Maroun Touma <touma@us.ibm.com>
---------
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* Add ability to preprocess VLM response
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
* Move response decoding to vlm options (requires inheritance to override). Per-page prompt formulation also moved to vlm options to keep api consistent.
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
---------
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>