mirror of
https://github.com/DS4SD/docling.git
synced 2025-07-26 20:14:47 +00:00
* Keep page.parsed_page.textline_cells and page.cells in sync, including OCR
* Make page.parsed_page the only source of truth for text cells
* Small fix
* Correctly compute PDF boxes from pymupdf
* Use different OCR engine order
* Add type hints and fix mypy
* One more test fix
* Remove with pypdfium2_lock from caller sites
* Fix typing

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
ocr_test_rotated_90.doctags.txt
ocr_test_rotated_90.json
ocr_test_rotated_90.md
ocr_test_rotated_90.pages.json
ocr_test_rotated_180.doctags.txt
ocr_test_rotated_180.json
ocr_test_rotated_180.md
ocr_test_rotated_180.pages.json
ocr_test_rotated_270.doctags.txt
ocr_test_rotated_270.json
ocr_test_rotated_270.md
ocr_test_rotated_270.pages.json
ocr_test_rotated.doctags.txt
ocr_test_rotated.json
ocr_test_rotated.md
ocr_test_rotated.pages.json
ocr_test.doctags.txt
ocr_test.json
ocr_test.md
ocr_test.pages.json