mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-08 12:48:28 +00:00
docs: More GPU results and improvements in the example docs (#2674)
* add more results and improve the example docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* 5070 windows timing

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add reference for cpu-only

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
17
docs/examples/gpu_standard_pipeline.py
vendored
@@ -1,3 +1,20 @@
+# %% [markdown]
+#
+# What this example does
+# - Run a conversion using the best setup for GPU for the standard pipeline
+#
+# Requirements
+# - Python 3.9+
+# - Install Docling: `pip install docling`
+#
+# How to run
+# - `python docs/examples/gpu_standard_pipeline.py`
+#
+# This example is part of a set of GPU optimization strategies. Read more about it in [GPU support](../../usage/gpu/)
+#
+# ## Example code
+# %%
+
 import datetime
 import logging
 import time
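The unchanged imports (`datetime`, `logging`, `time`) suggest that the example times its conversion run. The diff does not show that code, so the following is only a minimal sketch of such a measurement. `measure_throughput`, `convert`, and `fake_convert` are hypothetical names; `convert` stands in for the real Docling call and is assumed to return the number of pages it processed.

```python
import datetime
import logging
import time
from typing import Callable

logging.basicConfig(level=logging.INFO)
_log = logging.getLogger(__name__)


def measure_throughput(convert: Callable[[], int]) -> float:
    """Run `convert` once and return the achieved pages per second.

    `convert` is a hypothetical stand-in for the real conversion call;
    it must return the number of pages it processed.
    """
    start = time.perf_counter()
    num_pages = convert()
    elapsed = time.perf_counter() - start
    rate = num_pages / elapsed
    _log.info(
        "Converted %d pages in %s (%.1f pages/second)",
        num_pages,
        datetime.timedelta(seconds=elapsed),
        rate,
    )
    return rate


def fake_convert() -> int:
    # Placeholder workload (assumption): pretend to convert 192 pages.
    time.sleep(0.05)
    return 192


if __name__ == "__main__":
    measure_throughput(fake_convert)
```

Using `time.perf_counter()` rather than `time.time()` keeps the measurement monotonic, which matters for short runs.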
29
docs/examples/gpu_vlm_pipeline.py
vendored
@@ -1,3 +1,32 @@
+# %% [markdown]
+#
+# What this example does
+# - Run a conversion using the best setup for GPU using VLM models
+#
+# Requirements
+# - Python 3.10+
+# - Install Docling: `pip install docling`
+# - Install vLLM: `pip install vllm`
+#
+# How to run
+# - `python docs/examples/gpu_vlm_pipeline.py`
+#
+# This example is part of a set of GPU optimization strategies. Read more about it in [GPU support](../../usage/gpu/)
+#
+# ### Start models with vllm
+#
+# ```console
+# vllm serve ibm-granite/granite-docling-258M \
+# --host 127.0.0.1 --port 8000 \
+# --max-num-seqs 512 \
+# --max-num-batched-tokens 8192 \
+# --enable-chunked-prefill \
+# --gpu-memory-utilization 0.9
+# ```
+#
+# ## Example code
+# %%
+
 import datetime
 import logging
 import time
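Before running the example it can help to confirm that the `vllm serve` process from the docstring is up. The sketch below is not part of the commit. It assumes the server's OpenAI-compatible `GET /v1/models` endpoint on the host and port used in the command above; `models_url` and `served_models` are illustrative helper names.

```python
import json
import urllib.error
import urllib.request


def models_url(host: str = "127.0.0.1", port: int = 8000) -> str:
    """Build the model-listing URL for the host/port passed to `vllm serve`."""
    return f"http://{host}:{port}/v1/models"


def served_models(url: str, timeout: float = 2.0) -> list[str]:
    """Return the ids of the served models, or [] if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not ready".
        return []
    return [model["id"] for model in payload.get("data", [])]


if __name__ == "__main__":
    models = served_models(models_url())
    print("server ready:" if models else "server not reachable", models)
```

An empty list means the server is not yet accepting requests, so a caller could poll this function with a short sleep until the model id appears.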
46
docs/usage/gpu.md
vendored
@@ -126,18 +126,38 @@ TBA.
 
 ## Performance results
 
-Test data:
-- Number of pages: 192
-- Number of tables: 95
+### Test data
 
-Test infrastructure:
-- Instance type: `g6e.2xlarge`
-- CPU: 8 vCPUs, AMD EPYC 7R13
-- RAM: 64GB
-- GPU: NVIDIA L40S 48GB
-- CUDA Version: 13.0, Driver Version: 580.95.05
+| | PDF doc | [ViDoRe V3 HR](https://huggingface.co/datasets/vidore/vidore_v3_hr) |
+| - | - | - |
+| Num docs | 1 | 14 |
+| Num pages | 192 | 1110 |
+| Num tables | 95 | 258 |
+| Format type | PDF | Parquet of images |
 
-| Pipeline | Page efficiency |
-| - | - |
-| Standard - Inline | 3.1 pages/second |
-| VLM - Inference server (GraniteDocling) | 2.4 pages/second |
+
+### Test infrastructure
+
+| | g6e.2xlarge | RTX 5090 | RTX 5070 |
+| - | - | - | - |
+| Description | AWS instance `g6e.2xlarge` | Linux bare metal machine | Windows 11 bare metal machine |
+| CPU | 8 vCPUs, AMD EPYC 7R13 | 16 vCPU, AMD Ryzen 7 9800 | 16 vCPU, AMD Ryzen 7 9800 |
+| RAM | 64GB | 128GB | 64GB |
+| GPU | NVIDIA L40S 48GB | NVIDIA GeForce RTX 5090 | NVIDIA GeForce RTX 5070 |
+| CUDA Version | 13.0, driver 580.95.05 | 13.0, driver 580.105.08 | 13.0, driver 581.57 |
+
+
+### Results
+
+<table>
+<thead>
+<tr><th rowspan="2">Pipeline</th><th colspan="2">g6e.2xlarge</th><th colspan="2">RTX 5090</th><th colspan="2">RTX 5070</th></tr>
+<tr><th>PDF doc</th><th>ViDoRe V3 HR</th><th>PDF doc</th><th>ViDoRe V3 HR</th><th>PDF doc</th><th>ViDoRe V3 HR</th></tr>
+</thead>
+<tbody>
+<tr><td>Standard - Inline (no OCR)</td><td>3.1 pages/second</td><td>-</td><td>7.9 pages/second<br /><small><em>[cpu-only]* 1.5 pages/second</em></small></td><td>-</td><td>4.2 pages/second<br /><small><em>[cpu-only]* 1.2 pages/second</em></small></td><td>-</td></tr>
+<tr><td>VLM - Inference server (GraniteDocling)</td><td>2.4 pages/second</td><td>-</td><td>3.8 pages/second</td><td>3.6-4.5 pages/second</td><td>-</td><td>-</td></tr>
+</tbody>
+</table>
+
+_* cpu-only timing computed with 16 pytorch threads._
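The throughput figures in the new results table can be converted into wall-clock estimates and GPU-over-CPU speedups. The sketch below uses only the numbers reported for the 192-page PDF document in the "Standard - Inline (no OCR)" row; the names `seconds_for` and `speedup` are illustrative and not part of the commit.

```python
PDF_PAGES = 192  # "Num pages" for the PDF doc in the test-data table

# Standard - Inline (no OCR), PDF doc column:
# machine -> (GPU pages/second, CPU-only pages/second)
STANDARD_PDF = {
    "RTX 5090": (7.9, 1.5),
    "RTX 5070": (4.2, 1.2),
}


def seconds_for(pages: int, pages_per_second: float) -> float:
    """Estimated wall-clock time to process `pages` at the given rate."""
    return pages / pages_per_second


def speedup(gpu_rate: float, cpu_rate: float) -> float:
    """How many times faster the GPU run is than the CPU-only run."""
    return gpu_rate / cpu_rate


if __name__ == "__main__":
    for machine, (gpu_rate, cpu_rate) in STANDARD_PDF.items():
        print(
            f"{machine}: {seconds_for(PDF_PAGES, gpu_rate):.1f} s on GPU, "
            f"{speedup(gpu_rate, cpu_rate):.1f}x over CPU-only"
        )
```

By the same arithmetic, the 3.1 pages/second reported for `g6e.2xlarge` corresponds to roughly 62 seconds for the 192-page document.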