docs: Update installation options with extras and review FAQ (#2548)

* revise install docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add more FAQ

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Michele Dolfi
2025-10-31 13:21:01 +01:00
committed by GitHub
parent 741c44fa45
commit cb100437fa
2 changed files with 107 additions and 59 deletions

47
docs/faq/index.md vendored

@@ -3,6 +3,13 @@
This is a collection of FAQ collected from the user questions on <https://github.com/docling-project/docling/discussions>.
??? question "Is Python 3.14 supported?"
### Is Python 3.14 supported?
Python 3.14 is supported from Docling 2.59.0.
??? question "Is Python 3.13 supported?"
### Is Python 3.13 supported?
@@ -61,14 +68,46 @@ This is a collection of FAQ collected from the user questions on <https://github
Source: Issue [#1694](https://github.com/docling-project/docling/issues/1694).
??? question "I get this error ImportError: libGL.so.1: cannot open shared object file: No such file or directory"
### I get this error ImportError: libGL.so.1: cannot open shared object file: No such file or directory
This error originates from conflicting OpenCV distributions in some Docling third-party dependencies.
`opencv-python` and `opencv-python-headless` both define the same Python package `cv2` and, if installed together,
this often creates conflicts. Moreover, the `opencv-python` package (which is more common) depends on the OpenGL UI
framework, which is usually not included in headless environments like Docker containers or remote VMs.
When you encounter the error above, you have two options.
Solution 1: Force the headless OpenCV (preferred)
```sh
pip uninstall -y opencv-python opencv-python-headless
pip install --no-cache-dir opencv-python-headless
```
Solution 2: Install the libGL system dependency.
=== "Debian-based"
```console
apt-get install libgl1
```
=== "RHEL / Fedora"
```console
dnf install mesa-libGL
```
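As a quick diagnostic for the conflict described above, the following sketch uses the standard-library `importlib.metadata` to list which OpenCV distributions are installed in the current environment. The helper names and the tuple of distribution names are illustrative assumptions and are not part of Docling.

```python
from importlib import metadata

# PyPI distributions that all install the same `cv2` package (assumed list).
OPENCV_DISTRIBUTIONS = (
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
)


def installed_opencv_distributions(names=OPENCV_DISTRIBUTIONS):
    """Return a dict mapping each installed distribution name to its version."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; skip it.
            continue
    return found


def has_opencv_conflict(found):
    """More than one distribution providing `cv2` is a likely conflict."""
    return len(found) > 1


if __name__ == "__main__":
    found = installed_opencv_distributions()
    for name, version in found.items():
        print(f"{name}=={version}")
    if has_opencv_conflict(found):
        print("Multiple OpenCV distributions found; consider Solution 1.")
```

If the script reports more than one distribution, uninstalling all of them and reinstalling only `opencv-python-headless` (Solution 1) is the usual way out.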
??? question "Are text styles (bold, underline, etc) supported?"
### Are text styles (bold, underline, etc) supported?
Currently text styles are not supported in the `DoclingDocument` format. Text styles are supported in the `DoclingDocument` format.
If you are interest in contributing this feature, please open a discussion topic to brainstorm on the design. Currently only the declarative backends (i.e. the ones used for docx, pptx, markdown, html, etc) are able to set
the correct text styles. Support for PDF is not yet possible.
_Note: this is not a simple topic_
??? question "How do I run completely offline?"


@@ -20,37 +20,76 @@ Works on macOS, Linux, and Windows, with support for both x86_64 and arm64 archi
pip install docling --extra-index-url https://download.pytorch.org/whl/cpu
```
??? "Alternative OCR engines" ??? "Installation on macOS Intel (x86_64)"
Docling supports multiple OCR engines for processing scanned documents. The current version provides When installing Docling on macOS with Intel processors, you might encounter errors with PyTorch compatibility.
the following engines. This happens because newer PyTorch versions (2.6.0+) no longer provide wheels for Intel-based Macs.
| Engine | Installation | Usage | If you're using an Intel Mac, install Docling with compatible PyTorch
| ------ | ------------ | ----- | **Note:** PyTorch 2.2.2 requires Python 3.12 or lower. Make sure you're not using Python 3.13+.
| [EasyOCR](https://github.com/JaidedAI/EasyOCR) | Default in Docling or via `pip install easyocr`. | `EasyOcrOptions` |
| Tesseract | System dependency. See description for Tesseract and Tesserocr below. | `TesseractOcrOptions` |
| Tesseract CLI | System dependency. See description below. | `TesseractCliOcrOptions` |
| OcrMac | System dependency. See description below. | `OcrMacOptions` |
| [RapidOCR](https://github.com/RapidAI/RapidOCR) | Extra feature not included in Default Docling installation can be installed via `pip install rapidocr onnxruntime` | `RapidOcrOptions` |
| [OnnxTR](https://github.com/felixdittrich92/OnnxTR) | Can be installed via the plugin system `pip install "docling-ocr-onnxtr[cpu]"`. Please take a look at [docling-OCR-OnnxTR](https://github.com/felixdittrich92/docling-OCR-OnnxTR).| `OnnxtrOcrOptions` |
The Docling `DocumentConverter` allows to choose the OCR engine with the `ocr_options` settings. For example ```bash
# For uv users
uv add torch==2.2.2 torchvision==0.17.2 docling
```python # For pip users
from docling.datamodel.base_models import ConversionStatus, PipelineOptions pip install "docling[mac_intel]"
from docling.datamodel.pipeline_options import PipelineOptions, EasyOcrOptions, TesseractOcrOptions
from docling.document_converter import DocumentConverter
pipeline_options = PipelineOptions() # For Poetry users
pipeline_options.do_ocr = True poetry add docling
pipeline_options.ocr_options = TesseractOcrOptions() # Use Tesseract
doc_converter = DocumentConverter(
pipeline_options=pipeline_options,
)
``` ```
<h3>Tesseract installation</h3> ## Available extras
The `docling` package is designed to offer a working solution for the Docling default options.
Some Docling functionalities require additional third-party packages and are therefore installed only if selected as extras (or installed independently).
The following table summarizes the extras available in the `docling` package. They can be activated with:
`pip install "docling[NAME1,NAME2]"`
| Extra | Description |
| - | - |
| `asr` | Installs dependencies for running the ASR pipeline. |
| `vlm` | Installs dependencies for running the VLM pipeline. |
| `easyocr` | Installs the [EasyOCR](https://github.com/JaidedAI/EasyOCR) OCR engine. |
| `tesserocr` | Installs the Tesseract binding for using it as OCR engine. |
| `ocrmac` | Installs the OcrMac OCR engine. |
| `rapidocr` | Installs the [RapidOCR](https://github.com/RapidAI/RapidOCR) OCR engine with [onnxruntime](https://github.com/microsoft/onnxruntime/) backend. |
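
To illustrate the `pip install "docling[NAME1,NAME2]"` pattern, here is a minimal sketch that assembles the install command from a list of extras. The function name and the `KNOWN_EXTRAS` set are hypothetical; the set is copied from the table above and may not cover every extra the package defines.

```python
# Extras taken from the table above (assumption: the list may be incomplete).
KNOWN_EXTRAS = {"asr", "vlm", "easyocr", "tesserocr", "ocrmac", "rapidocr"}


def pip_install_command(extras):
    """Build the pip command that installs docling with the given extras."""
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    if not extras:
        return "pip install docling"
    # Drop duplicates while keeping the caller's order.
    joined = ",".join(dict.fromkeys(extras))
    # Quote the requirement so shells do not interpret the brackets.
    return f'pip install "docling[{joined}]"'


if __name__ == "__main__":
    print(pip_install_command(["vlm", "rapidocr"]))
```

The quotes around the requirement matter in shells such as zsh, where unquoted square brackets are treated as glob patterns.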
### OCR engines
Docling supports multiple OCR engines for processing scanned documents. The current version provides
the following engines.
| Engine | Installation | Usage |
| ------ | ------------ | ----- |
| [EasyOCR](https://github.com/JaidedAI/EasyOCR) | `easyocr` extra or via `pip install easyocr`. | `EasyOcrOptions` |
| Tesseract | System dependency. See description for Tesseract and Tesserocr below. | `TesseractOcrOptions` |
| Tesseract CLI | System dependency. See description below. | `TesseractCliOcrOptions` |
| OcrMac | System dependency. See description below. | `OcrMacOptions` |
| [RapidOCR](https://github.com/RapidAI/RapidOCR) | `rapidocr` extra or via `pip install rapidocr onnxruntime`. | `RapidOcrOptions` |
| [OnnxTR](https://github.com/felixdittrich92/OnnxTR) | Can be installed via the plugin system `pip install "docling-ocr-onnxtr[cpu]"`. Please take a look at [docling-OCR-OnnxTR](https://github.com/felixdittrich92/docling-OCR-OnnxTR).| `OnnxtrOcrOptions` |
The Docling `DocumentConverter` lets you choose the OCR engine with the `ocr_options` setting. For example:
```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, TesseractOcrOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# OCR settings live on the PDF pipeline options.
pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.ocr_options = TesseractOcrOptions()  # Use Tesseract

# Pipeline options are passed to the converter per input format.
doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options),
    },
)
```
??? "Tesseract installation"
[Tesseract](https://github.com/tesseract-ocr/tesseract) is a popular OCR engine which is available
on most operating systems. For using this engine with Docling, Tesseract must be installed on your
@@ -82,7 +121,7 @@ Works on macOS, Linux, and Windows, with support for both x86_64 and arm64 archi
echo "Set TESSDATA_PREFIX=${TESSDATA_PREFIX}"
```
<h3>Linking to Tesseract</h3> <h4>Linking to Tesseract</h4>
The most efficient usage of the Tesseract library is via linking. Docling is using
the [Tesserocr](https://github.com/sirfz/tesserocr) package for this.
@@ -94,36 +133,6 @@ Works on macOS, Linux, and Windows, with support for both x86_64 and arm64 archi
pip install --no-binary :all: tesserocr
```
<h3>ocrmac installation</h3>
[ocrmac](https://github.com/straussmaximilian/ocrmac) is using
Apple's vision(or livetext) framework as OCR backend.
For using this engine with Docling, ocrmac must be installed on your system.
This only works on macOS systems with newer macOS versions (10.15+).
```console
pip install ocrmac
```
??? "Installation on macOS Intel (x86_64)"
When installing Docling on macOS with Intel processors, you might encounter errors with PyTorch compatibility.
This happens because newer PyTorch versions (2.6.0+) no longer provide wheels for Intel-based Macs.
If you're using an Intel Mac, install Docling with compatible PyTorch
**Note:** PyTorch 2.2.2 requires Python 3.12 or lower. Make sure you're not using Python 3.13+.
```bash
# For uv users
uv add torch==2.2.2 torchvision==0.17.2 docling
# For pip users
pip install "docling[mac_intel]"
# For Poetry users
poetry add docling
```
## Development setup
To develop Docling features, bugfixes etc., install as follows from your local clone's root dir: