mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-08 12:48:28 +00:00
docs: Update installation options with extras and review FAQ (#2548)
* revise install docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add more FAQ

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
47
docs/faq/index.md
vendored
@@ -3,6 +3,13 @@
|
|||||||
This is a collection of FAQ collected from the user questions on <https://github.com/docling-project/docling/discussions>.
|
This is a collection of FAQ collected from the user questions on <https://github.com/docling-project/docling/discussions>.
|
||||||
|
|
||||||
|
|
||||||
|
??? question "Is Python 3.14 supported?"
|
||||||
|
|
||||||
|
### Is Python 3.14 supported?
|
||||||
|
|
||||||
|
Python 3.14 is supported from Docling 2.59.0.
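
A minimal sketch of checking this requirement at runtime, assuming Docling is installed under the distribution name `docling`. The helper names `parse_version` and `supports_python_314` are hypothetical and not part of Docling.

```python
import sys
from importlib import metadata

# Minimum Docling release that supports Python 3.14, as stated in the FAQ answer.
MIN_DOCLING_FOR_PY314 = (2, 59, 0)


def parse_version(text):
    """Turn '2.59.0' into (2, 59, 0); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for chunk in text.split(".")[:3]:
        digits = ""
        for ch in chunk:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_python_314(docling_version):
    """Hypothetical helper: True if this Docling version supports Python 3.14."""
    return parse_version(docling_version) >= MIN_DOCLING_FOR_PY314


if __name__ == "__main__":
    if sys.version_info >= (3, 14):
        try:
            installed = metadata.version("docling")
        except metadata.PackageNotFoundError:
            print("docling is not installed")
        else:
            verdict = "ok" if supports_python_314(installed) else "too old for Python 3.14"
            print(installed, verdict)
```

The comparison uses plain integer tuples so that the sketch does not need the third-party `packaging` library.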
|
||||||
|
|
||||||
|
|
||||||
??? question "Is Python 3.13 supported?"
|
??? question "Is Python 3.13 supported?"
|
||||||
|
|
||||||
### Is Python 3.13 supported?
|
### Is Python 3.13 supported?
|
||||||
@@ -61,14 +68,46 @@ This is a collection of FAQ collected from the user questions on <https://github
|
|||||||
Source: Issue [#1694](https://github.com/docling-project/docling/issues/1694).
|
Source: Issue [#1694](https://github.com/docling-project/docling/issues/1694).
|
||||||
|
|
||||||
|
|
||||||
|
??? question "I get this error ImportError: libGL.so.1: cannot open shared object file: No such file or directory"
|
||||||
|
|
||||||
|
### I get this error ImportError: libGL.so.1: cannot open shared object file: No such file or directory
|
||||||
|
|
||||||
|
This error originates from conflicting OpenCV distributions in some of Docling's third-party dependencies.
`opencv-python` and `opencv-python-headless` both provide the same Python package `cv2` and, if installed together,
this often creates conflicts. Moreover, the more common `opencv-python` package depends on the OpenGL UI
framework, which is usually not included in headless environments such as Docker containers or remote VMs.
|
||||||
|
|
||||||
|
When you encounter the error above, you have two options.
|
||||||
|
|
||||||
|
Solution 1: Force the headless OpenCV (preferred)
|
||||||
|
|
||||||
|
```sh
pip uninstall -y opencv-python opencv-python-headless
pip install --no-cache-dir opencv-python-headless
```
|
||||||
|
|
||||||
|
Solution 2: Install the libGL system dependency.
|
||||||
|
|
||||||
|
=== "Debian-based"
|
||||||
|
|
||||||
|
```console
apt-get install libgl1
```
|
||||||
|
|
||||||
|
=== "RHEL / Fedora"
|
||||||
|
|
||||||
|
```console
dnf install mesa-libGL
```
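
Before choosing between the two solutions, it can help to see which OpenCV distributions are actually installed. The following is a rough sketch using only the standard library; the function names `installed_opencv_distributions` and `diagnose` are hypothetical, and only the two distributions named in this FAQ entry are checked.

```python
from importlib import metadata

# The two distributions named in the FAQ entry; both ship the `cv2` package.
OPENCV_DISTRIBUTIONS = ("opencv-python", "opencv-python-headless")


def installed_opencv_distributions(names=OPENCV_DISTRIBUTIONS):
    """Return {distribution name: version} for each listed distribution that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


def diagnose(found):
    """Classify the result: 'missing', 'headless', 'gui', or 'conflict'."""
    if not found:
        return "missing"
    if len(found) > 1:
        return "conflict"
    name = next(iter(found))
    return "headless" if name.endswith("-headless") else "gui"


if __name__ == "__main__":
    found = installed_opencv_distributions()
    print(found, diagnose(found))
```

A result of `conflict` or `gui` on a headless machine suggests Solution 1, while `gui` on a machine where the system package can be installed also allows Solution 2.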
|
||||||
|
|
||||||
|
|
||||||
??? question "Are text styles (bold, underline, etc) supported?"
|
??? question "Are text styles (bold, underline, etc) supported?"
|
||||||
|
|
||||||
### Are text styles (bold, underline, etc) supported?
|
### Are text styles (bold, underline, etc) supported?
|
||||||
|
|
||||||
Currently text styles are not supported in the `DoclingDocument` format.
|
Text styles are supported in the `DoclingDocument` format.
|
||||||
If you are interest in contributing this feature, please open a discussion topic to brainstorm on the design.
|
Currently only the declarative backends (i.e. the ones used for docx, pptx, markdown, html, etc) are able to set
|
||||||
|
the correct text styles. Support for PDF is not yet possible.
|
||||||
_Note: this is not a simple topic_
|
|
||||||
|
|
||||||
|
|
||||||
??? question "How do I run completely offline?"
|
??? question "How do I run completely offline?"
|
||||||
|
|||||||
79
docs/installation/index.md
vendored
@@ -20,18 +20,57 @@ Works on macOS, Linux, and Windows, with support for both x86_64 and arm64 archi
|
|||||||
pip install docling --extra-index-url https://download.pytorch.org/whl/cpu
|
pip install docling --extra-index-url https://download.pytorch.org/whl/cpu
|
||||||
```
|
```
|
||||||
|
|
||||||
??? "Alternative OCR engines"
|
??? "Installation on macOS Intel (x86_64)"
|
||||||
|
|
||||||
|
When installing Docling on macOS with Intel processors, you might encounter PyTorch compatibility errors.
This happens because newer PyTorch versions (2.6.0+) no longer provide wheels for Intel-based Macs.

If you're using an Intel Mac, install Docling with a compatible PyTorch version.
**Note:** PyTorch 2.2.2 requires Python 3.12 or lower. Make sure you're not using Python 3.13+.
|
||||||
|
|
||||||
|
```bash
# For uv users
uv add torch==2.2.2 torchvision==0.17.2 docling

# For pip users
pip install "docling[mac_intel]"

# For Poetry users
poetry add docling
```
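
As a rough illustration of the decision described above, the sketch below picks an install command from the platform and Python version. The function `recommend_install` is hypothetical and not part of Docling; it assumes that `platform.machine()` reports `x86_64` on Intel Macs, which is also what a Python process running under Rosetta on Apple Silicon reports.

```python
import platform
import sys


def recommend_install(system, machine, python_version):
    """Hypothetical helper: suggest an install command for the given platform."""
    is_intel_mac = system == "Darwin" and machine == "x86_64"
    if not is_intel_mac:
        return "pip install docling"
    # PyTorch 2.2.2 requires Python 3.12 or lower (see the note above).
    if tuple(python_version[:2]) >= (3, 13):
        return "use Python 3.12 or lower first"
    return 'pip install "docling[mac_intel]"'


if __name__ == "__main__":
    print(recommend_install(platform.system(), platform.machine(), sys.version_info))
```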
|
||||||
|
|
||||||
|
## Available extras
|
||||||
|
|
||||||
|
The base `docling` package is designed to provide a working setup for Docling's default options.
Some Docling functionalities require additional third-party packages and are therefore installed only if selected as extras (or installed independently).

The following table summarizes the extras available in the `docling` package. They can be activated with:
`pip install "docling[NAME1,NAME2]"`
|
||||||
|
|
||||||
|
|
||||||
|
| Extra | Description |
| ----- | ----------- |
| `asr` | Installs dependencies for running the ASR pipeline. |
| `vlm` | Installs dependencies for running the VLM pipeline. |
| `easyocr` | Installs the [EasyOCR](https://github.com/JaidedAI/EasyOCR) OCR engine. |
| `tesserocr` | Installs the Tesseract binding, so that Tesseract can be used as the OCR engine. |
| `ocrmac` | Installs the OcrMac OCR engine. |
| `rapidocr` | Installs the [RapidOCR](https://github.com/RapidAI/RapidOCR) OCR engine with the [onnxruntime](https://github.com/microsoft/onnxruntime/) backend. |
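
To see which OCR extras are usable in the current environment, one option is to test whether their modules can be found. This is a sketch only: the mapping from extra name to importable module names is an assumption (the module names are taken to match the package names), and `asr` and `vlm` are omitted because their dependencies are not listed here. The helper names are hypothetical.

```python
from importlib.util import find_spec

# Assumed mapping of OCR extras to the modules they are expected to provide.
EXTRA_MODULES = {
    "easyocr": ["easyocr"],
    "tesserocr": ["tesserocr"],
    "ocrmac": ["ocrmac"],
    "rapidocr": ["rapidocr", "onnxruntime"],
}


def missing_modules(modules):
    """Return the module names that cannot be located, without importing them."""
    return [name for name in modules if find_spec(name) is None]


def extras_status(mapping=EXTRA_MODULES):
    """Map each extra to the list of its modules that are missing."""
    return {extra: missing_modules(mods) for extra, mods in mapping.items()}


if __name__ == "__main__":
    for extra, missing in extras_status().items():
        state = "available" if not missing else "missing: " + ", ".join(missing)
        print(f"{extra}: {state}")
```

An extra reported as missing can be added with `pip install "docling[NAME]"`, as described above.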
|
||||||
|
|
||||||
|
|
||||||
|
### OCR engines
|
||||||
|
|
||||||
|
|
||||||
Docling supports multiple OCR engines for processing scanned documents. The current version provides
|
Docling supports multiple OCR engines for processing scanned documents. The current version provides
|
||||||
the following engines.
|
the following engines.
|
||||||
|
|
||||||
| Engine | Installation | Usage |
|
| Engine | Installation | Usage |
|
||||||
| ------ | ------------ | ----- |
|
| ------ | ------------ | ----- |
|
||||||
| [EasyOCR](https://github.com/JaidedAI/EasyOCR) | Default in Docling or via `pip install easyocr`. | `EasyOcrOptions` |
|
| [EasyOCR](https://github.com/JaidedAI/EasyOCR) | `easyocr` extra or via `pip install easyocr`. | `EasyOcrOptions` |
|
||||||
| Tesseract | System dependency. See description for Tesseract and Tesserocr below. | `TesseractOcrOptions` |
|
| Tesseract | System dependency. See description for Tesseract and Tesserocr below. | `TesseractOcrOptions` |
|
||||||
| Tesseract CLI | System dependency. See description below. | `TesseractCliOcrOptions` |
|
| Tesseract CLI | System dependency. See description below. | `TesseractCliOcrOptions` |
|
||||||
| OcrMac | System dependency. See description below. | `OcrMacOptions` |
|
| OcrMac | System dependency. See description below. | `OcrMacOptions` |
|
||||||
| [RapidOCR](https://github.com/RapidAI/RapidOCR) | Extra feature not included in Default Docling installation can be installed via `pip install rapidocr onnxruntime` | `RapidOcrOptions` |
|
| [RapidOCR](https://github.com/RapidAI/RapidOCR) | `rapidocr` extra or via `pip install rapidocr onnxruntime`. | `RapidOcrOptions` |
|
||||||
| [OnnxTR](https://github.com/felixdittrich92/OnnxTR) | Can be installed via the plugin system `pip install "docling-ocr-onnxtr[cpu]"`. Please take a look at [docling-OCR-OnnxTR](https://github.com/felixdittrich92/docling-OCR-OnnxTR).| `OnnxtrOcrOptions` |
|
| [OnnxTR](https://github.com/felixdittrich92/OnnxTR) | Can be installed via the plugin system `pip install "docling-ocr-onnxtr[cpu]"`. Please take a look at [docling-OCR-OnnxTR](https://github.com/felixdittrich92/docling-OCR-OnnxTR).| `OnnxtrOcrOptions` |
|
||||||
|
|
||||||
The Docling `DocumentConverter` lets you choose the OCR engine with the `ocr_options` setting. For example
|
The Docling `DocumentConverter` lets you choose the OCR engine with the `ocr_options` setting. For example
|
||||||
@@ -50,7 +89,7 @@ Works on macOS, Linux, and Windows, with support for both x86_64 and arm64 archi
|
|||||||
)
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
<h3>Tesseract installation</h3>
|
??? "Tesseract installation"
|
||||||
|
|
||||||
[Tesseract](https://github.com/tesseract-ocr/tesseract) is a popular OCR engine which is available
|
[Tesseract](https://github.com/tesseract-ocr/tesseract) is a popular OCR engine which is available
|
||||||
on most operating systems. For using this engine with Docling, Tesseract must be installed on your
|
on most operating systems. For using this engine with Docling, Tesseract must be installed on your
|
||||||
@@ -82,7 +121,7 @@ Works on macOS, Linux, and Windows, with support for both x86_64 and arm64 archi
|
|||||||
echo "Set TESSDATA_PREFIX=${TESSDATA_PREFIX}"
|
echo "Set TESSDATA_PREFIX=${TESSDATA_PREFIX}"
|
||||||
```
|
```
|
||||||
|
|
||||||
<h3>Linking to Tesseract</h3>
|
<h4>Linking to Tesseract</h4>
|
||||||
The most efficient usage of the Tesseract library is via linking. Docling is using
|
The most efficient usage of the Tesseract library is via linking. Docling is using
|
||||||
the [Tesserocr](https://github.com/sirfz/tesserocr) package for this.
|
the [Tesserocr](https://github.com/sirfz/tesserocr) package for this.
|
||||||
|
|
||||||
@@ -94,36 +133,6 @@ Works on macOS, Linux, and Windows, with support for both x86_64 and arm64 archi
|
|||||||
pip install --no-binary :all: tesserocr
|
pip install --no-binary :all: tesserocr
|
||||||
```
|
```
|
||||||
|
|
||||||
<h3>ocrmac installation</h3>
|
|
||||||
|
|
||||||
[ocrmac](https://github.com/straussmaximilian/ocrmac) is using
|
|
||||||
Apple's vision(or livetext) framework as OCR backend.
|
|
||||||
For using this engine with Docling, ocrmac must be installed on your system.
|
|
||||||
This only works on macOS systems with newer macOS versions (10.15+).
|
|
||||||
|
|
||||||
```console
|
|
||||||
pip install ocrmac
|
|
||||||
```
|
|
||||||
|
|
||||||
??? "Installation on macOS Intel (x86_64)"
|
|
||||||
|
|
||||||
When installing Docling on macOS with Intel processors, you might encounter errors with PyTorch compatibility.
|
|
||||||
This happens because newer PyTorch versions (2.6.0+) no longer provide wheels for Intel-based Macs.
|
|
||||||
|
|
||||||
If you're using an Intel Mac, install Docling with compatible PyTorch
|
|
||||||
**Note:** PyTorch 2.2.2 requires Python 3.12 or lower. Make sure you're not using Python 3.13+.
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# For uv users
|
|
||||||
uv add torch==2.2.2 torchvision==0.17.2 docling
|
|
||||||
|
|
||||||
# For pip users
|
|
||||||
pip install "docling[mac_intel]"
|
|
||||||
|
|
||||||
# For Poetry users
|
|
||||||
poetry add docling
|
|
||||||
```
|
|
||||||
|
|
||||||
## Development setup
|
## Development setup
|
||||||
|
|
||||||
To develop Docling features, bugfixes etc., install as follows from your local clone's root dir:
|
To develop Docling features, bugfixes etc., install as follows from your local clone's root dir:
|
||||||
|
|||||||