Mirror of https://github.com/DS4SD/docling.git (synced 2025-07-25 19:44:34 +00:00)
* feat: Introduce automatic language detection in tesseract_ocr_cli model. Extend unit tests.
* docs: Add example how to use "auto" language with tesseract OCR engines
* fix: Refactor the TesseractOcrModel and TesseractOcrCliModel to validate if the auto-detected language is installed in the system and if not fall back to a default option without language.

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

File | ||
---|---|---|
batch_convert.py | ||
custom_convert.py | ||
develop_formula_understanding.py | ||
develop_picture_enrichment.py | ||
export_figures.py | ||
export_multimodal.py | ||
export_tables.py | ||
full_page_ocr.py | ||
hybrid_chunking.ipynb | ||
index.md | ||
minimal.py | ||
rag_azuresearch.ipynb | ||
rag_haystack.ipynb | ||
rag_langchain.ipynb | ||
rag_llamaindex.ipynb | ||
rag_weaviate.ipynb | ||
retrieval_qdrant.ipynb | ||
run_md.py | ||
run_with_accelerator.py | ||
run_with_formats.py | ||
tesseract_lang_detection.py | ||
translate.py |