docling/docling
Pavel Denisov 8543c22687
feat: add "auto" language for TesseractOcr (#759)
* Add "auto" language for TesseractOcr
* Add tesseract-ocr-script-latn installation for the "auto" language
* Modify "auto" language in TesseractOcr to initialize the script readers lazily
* Finalize script readers
* Fix script models prefix for Linux

Signed-off-by: Pavel Denisov <pavel.denisov@iais.fraunhofer.de>
2025-01-23 12:40:50 +01:00
Name                   Last updated               Last commit
backend                2025-01-15 09:52:38 +01:00 refactor: allow the usage of backends in the enrich models and generalize the interface (#742)
chunking               2024-12-09 08:28:29 +01:00 feat: expose new hybrid chunker, update docs (#384)
cli                    2025-01-07 10:15:14 +01:00 feat: added http header support for document converter and cli (#642)
datamodel              2025-01-15 09:52:38 +01:00 refactor: allow the usage of backends in the enrich models and generalize the interface (#742)
models                 2025-01-23 12:40:50 +01:00 feat: add "auto" language for TesseractOcr (#759)
pipeline               2025-01-15 09:52:38 +01:00 refactor: allow the usage of backends in the enrich models and generalize the interface (#742)
utils                  2024-12-17 17:32:24 +01:00 feat: Updated Layout processing with forms and key-value areas (#530)
__init__.py            2024-07-15 09:42:42 +02:00 Initial commit
document_converter.py  2025-01-07 10:15:14 +01:00 feat: added http header support for document converter and cli (#642)
exceptions.py          2024-12-03 12:45:32 +01:00 fix: improve handling of disallowed formats (#429)
py.typed               2024-12-06 13:42:14 +01:00 fix: Add py.typed marker file (#531)