mirror of
https://github.com/DS4SD/docling.git
synced 2025-07-27 04:24:45 +00:00
add docs for TESSDATA_PREFIX
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
This commit is contained in:
parent
ea3f720ef5
commit
5bd64779d1
@ -85,23 +85,31 @@ Works on macOS, Linux and Windows environments. Both x86_64 and arm64 architectu
|
||||
[Tesseract](https://github.com/tesseract-ocr/tesseract) is a popular OCR engine which is available
|
||||
on most operating systems. For using this engine with Docling, Tesseract must be installed on your
|
||||
system, using the packaging tool of your choice. Below we provide example commands.
|
||||
After installing Tesseract you are expected to provide the path to its language files using the
|
||||
`TESSDATA_PREFIX` environment variable (note that it must terminate with a slash `/`).
|
||||
|
||||
For macOS, we reccomend using [Homebrew](https://brew.sh/).
|
||||
|
||||
```console
|
||||
brew install tesseract leptonica pkg-config
|
||||
TESSDATA_PREFIX=/opt/homebrew/share/tessdata/
|
||||
echo "Set TESSDATA_PREFIX=${TESSDATA_PREFIX}"
|
||||
```
|
||||
|
||||
For Debian-based systems.
|
||||
|
||||
```console
|
||||
apt-get install tesseract-ocr tesseract-ocr-eng libtesseract-dev libleptonica-dev pkg-config
|
||||
TESSDATA_PREFIX=$(dpkg -L tesseract-ocr-eng | grep tessdata$)
|
||||
echo "Set TESSDATA_PREFIX=${TESSDATA_PREFIX}"
|
||||
```
|
||||
|
||||
For RHEL systems.
|
||||
|
||||
```console
|
||||
dnf install tesseract tesseract-devel tesseract-langpack-eng leptonica-devel
|
||||
TESSDATA_PREFIX=/usr/share/tesseract/tessdata/
|
||||
echo "Set TESSDATA_PREFIX=${TESSDATA_PREFIX}"
|
||||
```
|
||||
|
||||
#### Linking to Tesseract
|
||||
|
Loading…
Reference in New Issue
Block a user