feat: batching support for VLMs in transformers backend, add initial VLLM backend (#2094)

mirror of https://github.com/DS4SD/docling.git synced 2025-12-08 20:58:11 +00:00

* Prepare existing code for use with new multi-stage VLM pipeline

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add multithreaded VLM pipeline

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add VLM task interpreters

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add VLM task interpreters

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Remove prints

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix KeyboardInterrupt behaviour

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add VLLM backend support, optimize process_images

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Tweak defaults

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Implement proper batch inference for HuggingFaceTransformersVlmModel

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Small fixes

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleanup hf_transformers_model batching impl

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Adjust example instantiation of multi-stage VLM pipeline

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add GoT OCR 2.0

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Factor out changes without multi-stage pipeline

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Reset defaults for generation

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleanup

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add torch.compile, fix temperature setting in gen_kwargs

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Expose page_batch_size on CLI

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add torch_dtype bfloat16 to SMOLDOCLING and SMOLVLM model spec

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Clip off pad_token

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
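
Two of the steps listed above, batch inference driven by `page_batch_size` and clipping the pad token, can be illustrated with a minimal sketch. This is not the code from this commit: the helper names `batched` and `clip_trailing_pad` are hypothetical, and the sketch assumes padding appears as a run of pad token IDs at the end of each generated sequence.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start : start + batch_size])


def clip_trailing_pad(token_ids: Sequence[int], pad_token_id: int) -> List[int]:
    """Drop the run of pad tokens at the end of one generated sequence.

    Assumption: shorter sequences in a batch are padded on the right.
    """
    end = len(token_ids)
    while end > 0 and token_ids[end - 1] == pad_token_id:
        end -= 1
    return list(token_ids[:end])


if __name__ == "__main__":
    pages = ["p1", "p2", "p3", "p4", "p5"]  # stand-ins for page images
    for batch in batched(pages, batch_size=2):
        print(batch)
    print(clip_trailing_pad([5, 7, 9, 0, 0], pad_token_id=0))
```

Grouping pages this way lets a model process several images per forward pass. Clipping afterwards keeps padding from leaking into the decoded text of the shorter sequences in a batch.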

Author: Christoph Auer
Date: 2025-08-22 13:17:33 +02:00
Committed by: GitHub

parent 3f03709885

commit 3c660c0511

17 changed files with 2837 additions and 319 deletions

									
pyproject.toml (2 lines changed)
												
@@ -93,6 +93,7 @@ vlm = [
   'transformers (>=4.46.0,<5.0.0)',
   'accelerate (>=1.2.1,<2.0.0)',
   'mlx-vlm (>=0.3.0,<1.0.0) ; python_version >= "3.10" and sys_platform == "darwin" and platform_machine == "arm64"',
+  'vllm (>=0.10.0,<1.0.0) ; python_version >= "3.10" and sys_platform == "linux"',
 ]
 rapidocr = [
   'rapidocr-onnxruntime (>=1.4.0,<2.0.0) ; python_version < "3.13"',
@@ -252,6 +253,7 @@ module = [
   "huggingface_hub.*",
   "transformers.*",
   "pylatexenc.*",
+  "vllm.*",
 ]
 ignore_missing_imports = true
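
The environment marker on the `vllm` entry means the package is only installed on Linux with Python 3.10 or newer, so calling code may want to check for it before selecting that backend. The following is a minimal sketch of such a guard using only the standard library. The function names are hypothetical and are not taken from this commit.

```python
import importlib.util
import sys


def vllm_marker_matches() -> bool:
    """Mirror the marker: python_version >= "3.10" and sys_platform == "linux"."""
    return sys.version_info >= (3, 10) and sys.platform == "linux"


def vllm_available() -> bool:
    """True only if the marker matches and the vllm package can be located."""
    if not vllm_marker_matches():
        return False
    # find_spec returns None when a top-level package is not installed.
    return importlib.util.find_spec("vllm") is not None


if __name__ == "__main__":
    if vllm_available():
        print("vllm backend can be used")
    else:
        print("vllm not available; fall back to another backend")
```

Checking the marker first avoids probing for a package that the dependency specification would never have installed on the current platform.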
