* add mlx-whisper support
* add mlx-whisper example and test; update docling CLI to use MLX automatically if present
* fix pre-commit checks and added proper type safety
* fixed linter issue
* DCO Remediation Commit for Ken Steele <ksteele@gmail.com>
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: a979a680e1dc2fee8461401335cfb5dda8cfdd98
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 9827068382ca946fe1387ed83f747ae509fcf229
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: ebbeb45c7dc266260e1fad6bdb54a7041f8aeed4
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 2f6fd3cf46c8ca0bb98810191578278f1df87aa3
Signed-off-by: Ken Steele <ksteele@gmail.com>
* fix unit tests and code coverage for CI
* DCO Remediation Commit for Ken Steele <ksteele@gmail.com>
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 5e61bf11139a2133978db2c8d306be6289aed732
Signed-off-by: Ken Steele <ksteele@gmail.com>
* fix CI example test - mlx_whisper_example.py defaults to tests/data/audio/sample_10s.mp3 if no args specified.
Signed-off-by: Ken Steele <ksteele@gmail.com>
* refactor: centralize audio file extensions and MIME types in base_models.py
- Move audio file extensions from CLI hardcoded set to FormatToExtensions[InputFormat.AUDIO]
- Add support for additional audio formats: m4a, aac, ogg, flac, mp4, avi, mov
- Update FormatToMimeType mapping to include MIME types for all audio formats
- Update CLI auto-detection to use centralized FormatToExtensions mapping
- Add comprehensive tests for audio file auto-detection and pipeline selection
- Ensure explicit pipeline choices are not overridden by auto-detection
Fixes issue where only .mp3 and .wav files were processed as audio despite
CLI auto-detection working for all formats. The document converter now
properly recognizes all audio formats through MIME type detection.
Addresses review comments:
- Centralizes audio extensions in base_models.py as suggested
- Maintains existing auto-detection behavior while using centralized data
- Adds proper test coverage for the audio detection functionality
All examples and tests pass with the new centralized approach.
All audio formats (mp3, wav, m4a, aac, ogg, flac, mp4, avi, mov) now work correctly.
Signed-off-by: Ken Steele <ksteele@gmail.com>
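
A minimal sketch of the centralization described in this commit, assuming a simplified shape for the mapping. `InputFormat` and `FormatToExtensions` are named in the commit message, but the definitions below are illustrative stand-ins rather than the actual contents of base_models.py, and `is_audio_path` is a hypothetical helper:

```python
from enum import Enum
from pathlib import Path


class InputFormat(str, Enum):
    # Stand-in for the real enum; only two members are shown here.
    AUDIO = "audio"
    PDF = "pdf"


# One shared mapping instead of a hardcoded set in the CLI.
# The audio extensions are the ones listed in the commit message.
FormatToExtensions = {
    InputFormat.AUDIO: ["wav", "mp3", "m4a", "aac", "ogg", "flac", "mp4", "avi", "mov"],
    InputFormat.PDF: ["pdf"],
}


def is_audio_path(path: str) -> bool:
    """Hypothetical helper: True if the file extension is registered as audio."""
    suffix = Path(path).suffix.lower().lstrip(".")
    return suffix in FormatToExtensions[InputFormat.AUDIO]
```

Any caller that needs to recognize audio files can then consult the same mapping, so adding a format means editing one list.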
* feat: address reviewer feedback - improve CLI auto-detection and add explicit model options
Review feedback addressed:
1. Fix CLI auto-detection to only switch to ASR pipeline when ALL files are audio
   - Previously switched if ANY file was audio, now requires ALL files to be audio
   - Added warning for mixed file types with guidance to use --pipeline asr
2. Add explicit WHISPER_X_MLX and WHISPER_X_NATIVE model options
   - Users can now force specific implementations if desired
   - Auto-selecting models (WHISPER_BASE, etc.) still choose best for hardware
   - Added 12 new explicit model options: _MLX and _NATIVE variants for each size
CLI now supports:
- Auto-selecting: whisper_tiny, whisper_base, etc. (choose best for hardware)
- Explicit MLX: whisper_tiny_mlx, whisper_base_mlx, etc. (force MLX)
- Explicit Native: whisper_tiny_native, whisper_base_native, etc. (force native)
Addresses reviewer comments from @dolfim-ibm
Signed-off-by: Ken Steele <ksteele@gmail.com>
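
The all-files rule from item 1 can be sketched as follows. This is an illustration under stated assumptions, not the CLI's actual code: `choose_pipeline`, `AUDIO_EXTENSIONS`, and the `"standard"` pipeline name are hypothetical, and only the `asr` pipeline name and the `--pipeline asr` hint come from the commit message.

```python
from pathlib import Path
from typing import Optional, Sequence, Tuple

# Assumed extension set, copied from the formats listed in the earlier commit.
AUDIO_EXTENSIONS = {"wav", "mp3", "m4a", "aac", "ogg", "flac", "mp4", "avi", "mov"}


def choose_pipeline(
    paths: Sequence[str], explicit: Optional[str] = None
) -> Tuple[str, Optional[str]]:
    """Return (pipeline, warning). An explicit choice is never overridden."""
    if explicit is not None:
        return explicit, None

    flags = [Path(p).suffix.lower().lstrip(".") in AUDIO_EXTENSIONS for p in paths]

    # Switch to ASR only when every input file is audio.
    if flags and all(flags):
        return "asr", None

    # Mixed inputs keep the default pipeline and produce a hint for the user.
    if any(flags):
        return "standard", "Mixed file types detected; use --pipeline asr to force ASR."

    return "standard", None
```

Returning the warning instead of printing it keeps the decision easy to test in isolation.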
* DCO Remediation Commit for Ken Steele <ksteele@gmail.com>
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: c60e72d2b5
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 94803317a3
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 21905e8ace
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 96c669d155
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 8371c060ea
Signed-off-by: Ken Steele <ksteele@gmail.com>
* test(asr): add coverage for MLX options, pipeline helpers, and VLM prompts
- tests/test_asr_mlx_whisper.py: verify explicit MLX options (framework, repo ids)
- tests/test_asr_pipeline.py: cover _has_text/_determine_status and backend support with proper InputDocument/NoOpBackend wiring
- tests/test_interfaces.py: add BaseVlmPageModel.formulate_prompt tests (RAW/NONE/CHAT, invalid style), with minimal InlineVlmOptions scaffold
Improves reliability of ASR and VLM components by validating configuration paths and helper logic.
Signed-off-by: Ken Steele <ksteele@gmail.com>
* test(asr): broaden coverage for model selection, pipeline flows, and VLM prompts
- tests/test_asr_mlx_whisper.py
  - Add MLX/native selector coverage across all Whisper sizes
  - Validate repo_id choices under MLX and Native paths
  - Cover fallback path when MPS unavailable and mlx_whisper missing
- tests/test_asr_pipeline.py
  - Relax silent-audio assertion to accept PARTIAL_SUCCESS or SUCCESS
  - Force CPU native path in helper tests to avoid torch in device selection
  - Add language handling tests for native/MLX transcribe
  - Cover native run success (BytesIO) and failure (exception) branches
  - Cover MLX run success/failure branches with mocked transcribe
  - Add init path coverage with artifacts_path
- tests/test_interfaces.py
  - Add focused VLM prompt tests (NONE/CHAT variants)
Result: all tests passing with significantly improved coverage for ASR model selectors, pipeline execution paths, and VLM prompt formulation.
Signed-off-by: Ken Steele <ksteele@gmail.com>
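
The selector behaviour these tests exercise can be pictured roughly as below. This is a sketch under assumptions: `resolve_backend` is a hypothetical function, the hardware checks are passed in as booleans rather than probed, and the rule that auto-selection needs both MPS and an installed mlx_whisper is an assumed reading of the fallback case, not confirmed by the commit messages.

```python
def resolve_backend(model_name: str, has_mps: bool, has_mlx_whisper: bool) -> str:
    """Return "mlx" or "native" for a CLI model name such as "whisper_tiny"."""
    # Explicit variants force a backend regardless of the detected hardware.
    # (This sketch does not check that the forced backend is actually usable.)
    if model_name.endswith("_mlx"):
        return "mlx"
    if model_name.endswith("_native"):
        return "native"

    # Auto-selecting names: assumed to pick MLX only when both checks pass,
    # otherwise fall back to the native implementation.
    if has_mps and has_mlx_whisper:
        return "mlx"
    return "native"
```

Passing the capability flags as arguments mirrors how the tests avoid importing torch during device selection.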
* simplify ASR model settings (no pipeline detection needed)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* clean up disk space in runners
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Ken Steele <ksteele@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
* Update tests to use default PDF backend (DPv4)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* OCR tests use DPv1 until rotation bugs are fixed
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add DoclingParseV3 backend implementation
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Use docling-core with docling-parse types
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes and test updates
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix streams
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix streams
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Reset tests
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* update test cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* update test units
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back DoclingParse v1 backend, pipeline options
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update locks
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: update docling-core to 2.22.0
Update dependency library docling-core to latest release 2.22.0
Fix regression tests and ground truth files
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* Ground-truth files updated
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update tests, use TextCell.from_ocr property
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Text fixes, new test data
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename docling backend to v4
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Test all backends, fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Reset all tests to use docling-parse v1 for now
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for DPv4 backend init, better test coverage
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* test_input_doc use default backend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
fix: Support for RTL programmatic documents
fix(parser): detect and handle rotated pages
fix(parser): fix bug causing duplicated text
fix(formula): improve stopping criteria
chore: update lock file
fix: temporarily constrain beautifulsoup
* switch to code formula model v1.0.1 and new test pdf
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* switch to code formula model v1.0.1 and new test pdf
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* cleaned up the data folder in the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* switch to code formula model v1.0.1 and new test pdf
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* added three test-files for right-to-left
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fix black
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* added new gt for test_e2e_conversion
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* added new gt for test_e2e_conversion
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* Add code to expose text direction of cell
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* new test file
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
* update lock
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix mypy reports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix example filepaths
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add test data results
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* pin wheel of latest docling-parse release
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use latest docling-core
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove debugging code
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix path to files in example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Revert unwanted RTL additions
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix test data paths in examples
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Matteo-Omenetti <Matteo.Omenetti1@ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
* add the pytests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* renamed the test folder and added the toplevel test
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the toplevel function test
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* need to start running all tests successfully
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the reference converted documents
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added first test for json and md output
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* ran pre-commit
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* replaced deprecated json function with model_dump_json
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* replaced deprecated json function with model_dump_json
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Fix backend tests
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* commented out the drawing
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* ci: avoid duplicate runs
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
* commented out json verification for now
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added verification of input cells
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformat code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added test to verify the cells in the pages
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added test to verify the cells in the pages (2)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added test to verify the cells in the pages (3)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* run all examples in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* make sure examples return failures
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* raise a failure if examples fail
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix examples
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* run examples after tests
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Add tests and update top_level_tests using only datamodels
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove unnecessary code
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Validate conversion status on e2e test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* package verify utils and add more tests
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* reduce docs in example, since they are already in the tests
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* skip batch_convert
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* pin docling-parse 1.1.2
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* updated the error messages
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* commented out the json verification for now
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* bumped GLM version
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Fix lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pin new docling-parse v1.1.3
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>