* adding granite-docling preview
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the model specs
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Add Layout+VLM pipeline with prompt injection, ApiVlmModel updates
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update layout injection, move to experimental
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Adjust defaults
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Map Layout+VLM pipeline to GraniteDocling
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove base_prompt from layout injection prompt
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Reinstate custom prompt
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* add demo_layout file that produces output with vs. without layout injection
Signed-off-by: Peter El Hachem <peter.el.hachem@ibm.com>
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* feat: wrap vlm_inference around process_images
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* feat: carry input prompt + number of input tokens
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* fix: adapt example to run on local test file
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* fix: example now expects single document
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* feat: add layout example to EXAMPLES_TO_SKIP
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* feat: address review comments
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* feat: add inference wrapper for hf_transformers + carry input prompt
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* feat: add track_input_prompt to ApiVlmOptions and track the input prompt as part of the API VLM model
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
* fix: Ensure backward-compatible build_prompt by adding _internal_page arg
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: Ensure backward-compatible build_prompt by adding _internal_page arg
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for demo
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Typing fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Restoring lost changes in vllm_model
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Restoring vlm_pipeline_api_model example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Peter El Hachem <peter.el.hachem@ibm.com>
Signed-off-by: ElHachem02 <peterelhachem02@gmail.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: ElHachem02 <peterelhachem02@gmail.com>
* add mlx-whisper support
* added mlx-whisper example and test; updated the docling CLI to use MLX automatically if present
* fix pre-commit checks and added proper type safety
* fixed linter issue
* DCO Remediation Commit for Ken Steele <ksteele@gmail.com>
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: a979a680e1dc2fee8461401335cfb5dda8cfdd98
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 9827068382ca946fe1387ed83f747ae509fcf229
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: ebbeb45c7dc266260e1fad6bdb54a7041f8aeed4
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 2f6fd3cf46c8ca0bb98810191578278f1df87aa3
Signed-off-by: Ken Steele <ksteele@gmail.com>
* fix unit tests and code coverage for CI
* DCO Remediation Commit for Ken Steele <ksteele@gmail.com>
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 5e61bf11139a2133978db2c8d306be6289aed732
Signed-off-by: Ken Steele <ksteele@gmail.com>
* fix CI example test - mlx_whisper_example.py defaults to tests/data/audio/sample_10s.mp3 if no args specified.
Signed-off-by: Ken Steele <ksteele@gmail.com>
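For context, a minimal sketch of transcribing that sample file with the mlx-whisper library directly (the ASR path added here wraps this when MLX is available); the model repo id is an assumption, any mlx-community Whisper checkpoint should work.

```python
import mlx_whisper

# Transcribe the bundled 10-second sample with an MLX Whisper checkpoint.
result = mlx_whisper.transcribe(
    "tests/data/audio/sample_10s.mp3",
    path_or_hf_repo="mlx-community/whisper-base-mlx",  # assumed checkpoint id
)
print(result["text"])
```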
* refactor: centralize audio file extensions and MIME types in base_models.py
- Move audio file extensions from CLI hardcoded set to FormatToExtensions[InputFormat.AUDIO]
- Add support for additional audio formats: m4a, aac, ogg, flac, mp4, avi, mov
- Update FormatToMimeType mapping to include MIME types for all audio formats
- Update CLI auto-detection to use centralized FormatToExtensions mapping
- Add comprehensive tests for audio file auto-detection and pipeline selection
- Ensure explicit pipeline choices are not overridden by auto-detection
Fixes issue where only .mp3 and .wav files were processed as audio despite
CLI auto-detection working for all formats. The document converter now
properly recognizes all audio formats through MIME type detection.
Addresses review comments:
- Centralizes audio extensions in base_models.py as suggested
- Maintains existing auto-detection behavior while using centralized data
- Adds proper test coverage for the audio detection functionality
All examples and tests pass with the new centralized approach.
All audio formats (mp3, wav, m4a, aac, ogg, flac, mp4, avi, mov) now work correctly.
Signed-off-by: Ken Steele <ksteele@gmail.com>
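An illustrative sketch of the centralized mapping described above, not docling's actual base_models.py; the exact extension and MIME-type values there may differ.

```python
from enum import Enum


class InputFormat(str, Enum):
    AUDIO = "audio"


# Single source of truth for audio extensions and MIME types (illustrative values).
FormatToExtensions = {
    InputFormat.AUDIO: ["mp3", "wav", "m4a", "aac", "ogg", "flac", "mp4", "avi", "mov"],
}
FormatToMimeType = {
    InputFormat.AUDIO: [
        "audio/mpeg", "audio/wav", "audio/x-m4a", "audio/aac",
        "audio/ogg", "audio/flac", "video/mp4", "video/x-msvideo", "video/quicktime",
    ],
}


def is_audio_file(path: str) -> bool:
    """CLI auto-detection can consult the centralized mapping instead of a hardcoded set."""
    ext = path.rsplit(".", 1)[-1].lower()
    return ext in FormatToExtensions[InputFormat.AUDIO]
```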
* feat: address reviewer feedback - improve CLI auto-detection and add explicit model options
Review feedback addressed:
1. Fix CLI auto-detection to only switch to ASR pipeline when ALL files are audio
- Previously switched if ANY file was audio, now requires ALL files to be audio
- Added warning for mixed file types with guidance to use --pipeline asr
2. Add explicit WHISPER_X_MLX and WHISPER_X_NATIVE model options
- Users can now force specific implementations if desired
- Auto-selecting models (WHISPER_BASE, etc.) still choose best for hardware
- Added 12 new explicit model options: _MLX and _NATIVE variants for each size
CLI now supports:
- Auto-selecting: whisper_tiny, whisper_base, etc. (choose best for hardware)
- Explicit MLX: whisper_tiny_mlx, whisper_base_mlx, etc. (force MLX)
- Explicit Native: whisper_tiny_native, whisper_base_native, etc. (force native)
Addresses reviewer comments from @dolfim-ibm
Signed-off-by: Ken Steele <ksteele@gmail.com>
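A sketch of the auto-detection rule described above, using hypothetical helper and pipeline names: the ASR pipeline is chosen only when every input is audio, an explicit choice is never overridden, and mixed inputs produce a warning.

```python
from pathlib import Path
from typing import List, Optional

AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".aac", ".ogg", ".flac", ".mp4", ".avi", ".mov"}


def select_pipeline(paths: List[Path], explicit: Optional[str] = None) -> str:
    if explicit is not None:
        return explicit  # an explicit --pipeline choice is never overridden
    if paths and all(p.suffix.lower() in AUDIO_EXTS for p in paths):
        return "asr"  # switch only when ALL inputs are audio
    if any(p.suffix.lower() in AUDIO_EXTS for p in paths):
        print("warning: mixed audio and non-audio inputs; use --pipeline asr to force ASR")
    return "standard"
```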
* DCO Remediation Commit for Ken Steele <ksteele@gmail.com>
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: c60e72d2b5
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 94803317a3
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 21905e8ace
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 96c669d155
I, Ken Steele <ksteele@gmail.com>, hereby add my Signed-off-by to this commit: 8371c060ea
Signed-off-by: Ken Steele <ksteele@gmail.com>
* test(asr): add coverage for MLX options, pipeline helpers, and VLM prompts
- tests/test_asr_mlx_whisper.py: verify explicit MLX options (framework, repo ids)
- tests/test_asr_pipeline.py: cover _has_text/_determine_status and backend support with proper InputDocument/NoOpBackend wiring
- tests/test_interfaces.py: add BaseVlmPageModel.formulate_prompt tests (RAW/NONE/CHAT, invalid style), with minimal InlineVlmOptions scaffold
Improves reliability of ASR and VLM components by validating configuration paths and helper logic.
Signed-off-by: Ken Steele <ksteele@gmail.com>
* test(asr): broaden coverage for model selection, pipeline flows, and VLM prompts
- tests/test_asr_mlx_whisper.py
- Add MLX/native selector coverage across all Whisper sizes
- Validate repo_id choices under MLX and Native paths
- Cover fallback path when MPS unavailable and mlx_whisper missing
- tests/test_asr_pipeline.py
- Relax silent-audio assertion to accept PARTIAL_SUCCESS or SUCCESS
- Force CPU native path in helper tests to avoid torch in device selection
- Add language handling tests for native/MLX transcribe
- Cover native run success (BytesIO) and failure (exception) branches
- Cover MLX run success/failure branches with mocked transcribe
- Add init path coverage with artifacts_path
- tests/test_interfaces.py
- Add focused VLM prompt tests (NONE/CHAT variants)
Result: all tests passing with significantly improved coverage for ASR model selectors, pipeline execution paths, and VLM prompt formulation.
Signed-off-by: Ken Steele <ksteele@gmail.com>
* simplify ASR model settings (no pipeline detection needed)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* clean up disk space in runners
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Ken Steele <ksteele@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
* Export of DrawingML figures into docling document
* Adding libreoffice env var and libreoffice to checks image
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* DCO Remediation Commit for Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
I, Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>, hereby add my Signed-off-by to this commit: 9518fffcad
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Enforcing apt get update
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Only display drawingml warning once per document
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* add util to test libreoffice and exclude files from test when not found
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* check libreoffice only once
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
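A hypothetical sketch of such a utility: probe for a LibreOffice binary once, cache the result, and let tests skip DrawingML conversion cases when it is missing.

```python
import shutil
from functools import lru_cache


@lru_cache(maxsize=1)
def libreoffice_available() -> bool:
    """Checked only once per test session; both common binary names are probed."""
    return any(shutil.which(cmd) for cmd in ("libreoffice", "soffice"))
```

Tests can then be guarded with something like `pytest.mark.skipif(not libreoffice_available(), reason="LibreOffice not installed")`.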
* Only initialise converter if needed
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
---------
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
* Experimental code for repetition detection, VLLM Streaming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update VLLM Streaming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update VLLM inference code, CLI and VLM specs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix generation and decoder args for HF model
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix vllm device args
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Bugfixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove streaming VLLM for the moment
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add repetition StoppingCriteria for GraniteDocling/SmolDocling
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make GenerationStopper base class and port for MLX
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add streaming support and custom GenerationStopper support for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix api_image_request_streaming when GenerationStopper triggers.
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move DocTagsRepetitionStopper to utility unit, update examples
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
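A minimal sketch of a repetition-based stopping rule in the spirit of the DocTagsRepetitionStopper mentioned above, written against the Hugging Face transformers StoppingCriteria interface; the window size, repeat threshold, and heuristic are assumptions, not docling's actual implementation.

```python
import torch
from transformers import StoppingCriteria


class RepetitionStoppingCriteria(StoppingCriteria):
    """Stop generation once the trailing token window repeats several times in a row."""

    def __init__(self, window: int = 32, max_repeats: int = 4):
        self.window = window            # length of the trailing n-gram to compare
        self.max_repeats = max_repeats  # stop once it repeats this many times back-to-back

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        seq = input_ids[0].tolist()
        if len(seq) < self.window * self.max_repeats:
            return False
        tail = seq[-self.window:]
        # Count how many consecutive trailing windows equal the last one.
        repeats = sum(
            1
            for i in range(1, self.max_repeats + 1)
            if seq[-i * self.window : len(seq) - (i - 1) * self.window] == tail
        )
        return repeats >= self.max_repeats
```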
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* updated the README
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added minimal_asr_pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Updated README and added ASR example
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Updated docs.index.md
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated CI and mkdocs
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added link to existing audio file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added link to existing audio file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatting
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* feat: adding new vlm-models support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* got microsoft/Phi-4-multimodal-instruct to work
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* working on vlm's
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the VLM part
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all working, now serious refactoring necessary
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the download_model
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the formulate_prompt
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* pixtral 12b runs via MLX and native transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the VlmPredictionToken
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring minimal_vlm_pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the MyPy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added pipeline_model_specializations file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* need to get Phi4 working again ...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* finalising last points for vlms support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the pipeline for Phi4
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* streamlining all code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixing the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the html backend to the VLM pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the static load_from_doctags
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* restore stable imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use AutoModelForVision2Seq for Pixtral and review example (including rename)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove unused value
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* refactor instances of VLM models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* skip compare example in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use lowercase and uppercase only
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename pipeline_vlm_model_spec
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move more argument to options and simplify model init
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add supported_devices
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove not-needed function
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* exclude minimal_vlm
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* missing file
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add message for transformers version
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename to specs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use module import and remove MLX from non-darwin
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove hf_vlm_model and add extra_generation_args
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use single HF VLM model class
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove torch type
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add docs for vision models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
* build: Add ollama sdk dependency
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Add option plumbing for OllamaVlmOptions in pipeline_options
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Full implementation of OllamaVlmModel
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Connect "granite_vision_ollama" pipeline option to CLI
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* Revert "build: Add ollama sdk dependency"
After consideration, we're going to use the generic OpenAI API instead
of the Ollama-specific API to avoid duplicate work.
This reverts commit bc6b366468cdd66b52540aac9c7d8b584ab48ad0.
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* refactor: Move OpenAI API call logic into utils.utils
This will allow reuse of this logic in a generic VLM model
NOTE: There is a subtle change here in the ordering of the text prompt and
the image in the call to the OpenAI API. When run against Ollama, this
ordering makes a big difference: if the prompt comes before the image, the
result is terse and not usable, whereas placing the prompt after the image
works as expected and matches the non-OpenAI chat API.
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
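A minimal sketch of what such a helper might look like: an OpenAI-compatible chat request that places the image part before the text prompt, as the commit describes. The name openai_image_request appears later in this log, but the signature, payload layout, and response handling here are assumptions, not docling's actual utils code.

```python
import base64

import requests


def openai_image_request(url: str, model: str, image_bytes: bytes, prompt: str,
                         timeout: float = 60.0) -> str:
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image first ...
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                    # ... then the text prompt, the ordering that works against Ollama.
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }
    resp = requests.post(url, json=payload, timeout=timeout)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```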
* refactor: Refactor from Ollama SDK to generic OpenAI API
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Linting, formatting, and bug fixes
The one bug fix was in the timeout arg to openai_image_request. Otherwise,
these are all style changes to get MyPy and black passing cleanly.
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* remove model from download enum
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* generalize input args for other API providers
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename and refactor
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* require flag for remote services
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* disable example from CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add examples to docs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
* Skeleton for SmolDocling model and VLM Pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* wip smolDocling inference and vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixes to preserve page image and demo export to html
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Enabled figure support in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fix for table span compute in vlm_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Properly propagating image data per page, together with predicted tags, in the VLM pipeline. This enables correct figure extraction and page numbers in provenance
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Cleaned up logs, added pages to vlm_pipeline, added basic per-page timing measurement in smol_docling models
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added tokens/sec measurement, improved example
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added capability for vlm_pipeline to grab text from preconfigured backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Exposed "force_backend_text" as pipeline parameter
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Flipped keep_backend to True for vlm_pipeline assembly to work
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated vlm pipeline assembly and smol docling model code to support updated doctags
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fixing doctags starting tag, that broke elements on first line during assembly
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Introduced SmolDoclingOptions to configure model parameters (such as the query prompt and artifacts path) via client code; see the example in minimal_smol_docling. Also provisions for other potential all-in-one VLM models.
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
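A sketch of driving the VLM pipeline from client code, in the spirit of the minimal_smol_docling example; the class names follow the current docling API and may differ from the SmolDoclingOptions naming used at the time of this commit.

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline

# Default options select the bundled VLM model spec; fields such as the prompt
# or artifacts path can be overridden here before building the converter.
pipeline_options = VlmPipelineOptions()

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=pipeline_options,
        )
    }
)

result = converter.convert("document.pdf")  # placeholder path to a local PDF
print(result.document.export_to_markdown())
```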
* Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated example of Smol Docling usage
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Update minimal smoldocling example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix repo id
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleaned up unnecessary logging
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* More elegant solution for removing the input prompt
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed minimal_smol_docling example from CI checks
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Removed special html code wrapping when exporting to docling document, cleaned up comments
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Addressing PR comments: added enabled property to SmolDocling and the related VLM pipeline option, plus a few other minor things
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Moved keep_backend = True to vlm pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines)
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added example on how to get original predicted doctags in minimal_smol_docling
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* removing changes from base_pipeline
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Replaced remaining strings to appropriate enums
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Updated poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* re-built poetry.lock
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Generalize and refactor VLM pipeline and models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Rename example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move imports
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Expose control over using flash_attention_2
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix VLM example exclusion in CI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add back device_map and accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make drawing code resilient against bad bboxes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: clean up code and comments
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: more cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* chore: fix leftover .to(device)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fix: add proper table provenance
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
* docs: Introduce example with custom models for RapidOCR
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* chore: Exclude the example with custom RapidOCR models from the examples to run in github actions
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
---------
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
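A sketch of what the custom-models RapidOCR example boils down to; the option field names and model paths below are recalled/placeholder values and may not match the example exactly.

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, RapidOcrOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# Point RapidOCR at locally downloaded ONNX models (placeholder paths).
ocr_options = RapidOcrOptions(
    det_model_path="models/det_infer.onnx",
    rec_model_path="models/rec_infer.onnx",
    cls_model_path="models/cls_infer.onnx",
)
pipeline_options = PdfPipelineOptions(do_ocr=True, ocr_options=ocr_options)

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
result = converter.convert("scanned_document.pdf")  # placeholder input
print(result.document.export_to_markdown())
```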
* test: update results with new docling-core
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update all deps in the lock
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix table in test results
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix version for python3.13
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* latest poetry version in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* activate py3.13 in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update docs about python 3.13
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* test with rapidocr only on python <3.13
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add the pytests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* renamed the test folder and added the toplevel test
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the toplevel function test
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* need to start running all tests successfully
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the reference converted documents
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added first test for json and md output
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* ran pre-commit
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* replaced deprecated json function with model_dump_json
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* replaced deprecated json function with model_dump_json
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
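For reference, the pydantic migration these two commits refer to, shown on an illustrative model: model_dump_json() replaces the deprecated .json() serializer.

```python
from pydantic import BaseModel


class Cell(BaseModel):  # illustrative model, not docling's actual datamodel
    index: int
    text: str


cell = Cell(index=0, text="hello")
json_str = cell.model_dump_json()  # previously: cell.json(), deprecated in pydantic v2
```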
* reformatted code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Fix backend tests
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* commented out the drawing
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* ci: avoid duplicate runs
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
* commented out json verification for now
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added verification of input cells
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformat code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added test to verify the cells in the pages
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added test to verify the cells in the pages (2)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added test to verify the cells in the pages (3)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* run all examples in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* make sure examples return failures
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* raise a failure if examples fail
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix examples
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* run examples after tests
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Add tests and update top_level_tests using only datamodels
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove unnecessary code
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Validate conversion status on e2e test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* package verify utils and add more tests
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* reduce docs in example, since they are already in the tests
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* skip batch_convert
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* pin docling-parse 1.1.2
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* updated the error messages
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* commented out the json verification for now
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* bumped GLM version
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Fix lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Pin new docling-parse v1.1.3
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>