* Experimental code for repetition detection, VLLM Streaming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update VLLM Streaming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update VLLM inference code, CLI and VLM specs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix generation and decoder args for HF model
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix vllm device args
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Bugfixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove streaming VLLM for the moment
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add repetition StoppingCriteria for GraniteDocling/SmolDocling
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make GenerationStopper base class and port for MLX
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add streaming support and custom GenerationStopper support for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix api_image_request_streaming when GenerationStopper triggers.
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move DocTagsRepetitionStopper to utility unit, update examples
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Prepare existing codes for use with new multi-stage VLM pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add multithreaded VLM pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add VLM task interpreters
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add VLM task interpreters
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove prints
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix KeyboardInterrupt behaviour
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add VLLM backend support, optimize process_images
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Tweak defaults
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Implement proper batch inference for HuggingFaceTransformersVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Small fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup hf_transformers_model batching impl
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Adjust example instatiation of multi-stage VLM pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add GoT OCR 2.0
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Factor out changes without multi-stage pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Reset defaults for generation
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add torch.compile, fix temperature setting in gen_kwargs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Expose page_batch_size on CLI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add torch_dtype bfloat16 to SMOLDOCLING and SMOLVLM model spec
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Clip off pad_token
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add ability to preprocess VLM response
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
* Move response decoding to vlm options (requires inheritance to override). Per-page prompt formulation also moved to vlm options to keep api consistent.
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
---------
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
* feat: adding new vlm-models support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* got microsoft/Phi-4-multimodal-instruct to work
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* working on vlm's
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the VLM part
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all working, now serious refacgtoring necessary
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the download_model
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the formulate_prompt
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* pixtral 12b runs via MLX and native transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the VlmPredictionToken
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring minimal_vlm_pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the MyPy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added pipeline_model_specializations file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* need to get Phi4 working again ...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* finalising last points for vlms support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the pipeline for Phi4
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* streamlining all code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixing the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the html backend to the VLM pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the static load_from_doctags
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* restore stable imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use AutoModelForVision2Seq for Pixtral and review example (including rename)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove unused value
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* refactor instances of VLM models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* skip compare example in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use lowercase and uppercase only
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename pipeline_vlm_model_spec
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move more argument to options and simplify model init
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add supported_devices
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove not-needed function
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* exclude minimal_vlm
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* missing file
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add message for transformers version
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename to specs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use module import and remove MLX from non-darwin
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove hf_vlm_model and add extra_generation_args
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use single HF VLM model class
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove torch type
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add docs for vision models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>