docs: More GPU results and improvements in the example docs (#2674)

* add more results and improve the example docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* 5070 windows timing

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add reference for cpu-only

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Michele Dolfi
2025-11-24 15:26:08 +01:00
committed by GitHub
parent 146b4f0535
commit b75c6461f4
3 changed files with 79 additions and 13 deletions


@@ -1,3 +1,32 @@
# %% [markdown]
#
# What this example does
# - Run a conversion with the best-performing GPU setup for VLM models
#
# Requirements
# - Python 3.10+
# - Install Docling: `pip install docling`
# - Install vLLM: `pip install vllm`
#
# How to run
# - `python docs/examples/gpu_vlm_pipeline.py`
#
# This example is part of a set of GPU optimization strategies. Read more in [GPU support](../../usage/gpu/).
#
# ### Start the model with vLLM
#
# ```console
# vllm serve ibm-granite/granite-docling-258M \
# --host 127.0.0.1 --port 8000 \
# --max-num-seqs 512 \
# --max-num-batched-tokens 8192 \
# --enable-chunked-prefill \
# --gpu-memory-utilization 0.9
# ```
#
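# Optionally, confirm that the server is reachable before starting the conversion.
# The cell below is a minimal sketch and not part of the original example. It
# assumes the `vllm serve` command above is running on 127.0.0.1:8000 and queries
# the OpenAI-compatible `/v1/models` endpoint that vLLM exposes. The helpers
# `models_url`, `parse_model_ids`, and `fetch_served_models` are hypothetical and
# belong to neither Docling nor vLLM.
# %%
import json
import urllib.request


def models_url(host: str = "127.0.0.1", port: int = 8000) -> str:
    """Build the URL of the OpenAI-compatible model listing endpoint."""
    return f"http://{host}:{port}/v1/models"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract the model ids from an OpenAI-style `/v1/models` response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_served_models(
    host: str = "127.0.0.1", port: int = 8000, timeout: float = 5.0
) -> list[str]:
    """Return the ids of the models served by a running vLLM server (network call)."""
    with urllib.request.urlopen(models_url(host, port), timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


# Usage (requires the server to be running); with the default served model name,
# the list should contain "ibm-granite/granite-docling-258M":
# print(fetch_served_models())

# %% [markdown]
#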
# ## Example code
# %%
import datetime
import logging
import time