mirror of
https://github.com/DS4SD/docling.git
synced 2025-07-26 03:55:00 +00:00
* fix(ocr): tesseract support for mis-oriented documents
* fix(ocr): update missing test data
* fix(ocr): rotate image to the natural orientation before layout prediction
* fix(ocr): move bounding box rotation util to orientation.py
* fix(ocr): refactor rotation utilities
* chore(ocr): revert layout updates
* chore(ocr): update e2e OCR test data
* fix(ocr): avoid swallowing tesseract errors that cause orientation detection failures
* chore(ocr): revert layout updates
* chore(ocr): update e2e OCR test data
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrCliModel`
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrModel`
* chore(ocr): default `TesseractOcrCliModel._is_auto` to `False`
* fix(ocr): fix `TesseractOcrCliModel._is_auto` computation
* chore(ocr): improve logging in case of OSD failure in `TesseractOcrCliModel` and `TesseractOcrModel`

Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
2203.01017v2.doctags.txt
2203.01017v2.json
2203.01017v2.md
2203.01017v2.pages.json
2206.01062.doctags.txt
2206.01062.json
2206.01062.md
2206.01062.pages.json
2305.03393v1-pg9.doctags.txt
2305.03393v1-pg9.json
2305.03393v1-pg9.md
2305.03393v1-pg9.pages.json
2305.03393v1.doctags.txt
2305.03393v1.json
2305.03393v1.md
2305.03393v1.pages.json
amt_handbook_sample.doctags.txt
amt_handbook_sample.json
amt_handbook_sample.md
amt_handbook_sample.pages.json
code_and_formula.doctags.txt
code_and_formula.json
code_and_formula.md
code_and_formula.pages.json
multi_page.doctags.txt
multi_page.json
multi_page.md
multi_page.pages.json
picture_classification.doctags.txt
picture_classification.json
picture_classification.md
picture_classification.pages.json
redp5110_sampled.doctags.txt
redp5110_sampled.json
redp5110_sampled.md
redp5110_sampled.pages.json
right_to_left_01.doctags.txt
right_to_left_01.json
right_to_left_01.md
right_to_left_01.pages.json
right_to_left_02.doctags.txt
right_to_left_02.json
right_to_left_02.md
right_to_left_02.pages.json
right_to_left_03.doctags.txt
right_to_left_03.json
right_to_left_03.md
right_to_left_03.pages.json