feat: add "auto" language for TesseractOcr (#759)

* Add "auto" language for TesseractOcr

Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>

* Add tesseract-ocr-script-latn installation for the "auto" language

Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>

* Modify "auto" language in TesseractOcr to initialize the script readers lazily

Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>

* Finalize script readers

Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>

* Fix script models prefix for Linux

Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>

---------

Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>
This commit is contained in:
Pavel Denisov
2025-01-23 12:40:50 +01:00
committed by GitHub
parent c49b3526fb
commit 8543c22687
3 changed files with 68 additions and 17 deletions

@@ -10,7 +10,7 @@ jobs:
     steps:
       - uses: actions/checkout@v4
       - name: Install tesseract
-        run: sudo apt-get update && sudo apt-get install -y tesseract-ocr tesseract-ocr-eng tesseract-ocr-fra tesseract-ocr-deu tesseract-ocr-spa libleptonica-dev libtesseract-dev pkg-config
+        run: sudo apt-get update && sudo apt-get install -y tesseract-ocr tesseract-ocr-eng tesseract-ocr-fra tesseract-ocr-deu tesseract-ocr-spa tesseract-ocr-script-latn libleptonica-dev libtesseract-dev pkg-config
       - name: Set TESSDATA_PREFIX
         run: |
           echo "TESSDATA_PREFIX=$(dpkg -L tesseract-ocr-eng | grep tessdata$)" >> "$GITHUB_ENV"
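
The hunk above adds `tesseract-ocr-script-latn` so that a Latin script model is available for the "auto" language. A minimal sketch of how a CI step might check that the model is present follows. It assumes the Debian/Ubuntu layout, in which script models are installed in a `script/` subdirectory of the tessdata directory as `script/Latin.traineddata`; that path is an assumption and is not shown in this diff. `has_latin_script_model` is a hypothetical helper name, not part of the commit.

```shell
# Hypothetical helper: succeed if the given tessdata directory contains the
# Latin script model. The script/Latin.traineddata location is an assumption
# based on the Debian/Ubuntu packaging of tesseract-ocr-script-latn.
has_latin_script_model() {
  tessdata_dir="$1"
  [ -f "${tessdata_dir}/script/Latin.traineddata" ]
}

# Check the directory named by TESSDATA_PREFIX, or ./tessdata if it is unset.
if has_latin_script_model "${TESSDATA_PREFIX:-./tessdata}"; then
  echo "Latin script model found"
else
  echo "Latin script model missing"
fi
```

Running this after the "Set TESSDATA_PREFIX" step would only confirm that the file exists; it does not verify that Tesseract can load the model.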