mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-09 05:08:14 +00:00
feat(ocr): auto-detect rotated pages in Tesseract (#1167)
* fix(ocr): tesseract support mis-oriented documents
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): update missing test data
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): rotate image to the natural orientation before layout prediction
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): move bounding box rotation util to orientation.py
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): refactor rotation utilities
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* chore(ocr): revert layout updates
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* chore(ocr): update e2e OCR test data
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* fix(ocr): avoid swallowing tesseract errors causing orientation detection failures
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* chore(ocr): revert layout updates
  Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
* chore(ocr): update e2e OCR test data
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrCliModel`
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrModel`
* chore(ocr): default `TesseractOcrCliModel._is_auto` to `False`
* fix(ocr): fix `TesseractOcrCliModel._is_auto` computation
* chore(ocr): improve logging in case of OSD failure in `TesseractOcrCliModel` and `TesseractOcrModel`

---------

Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
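The commit message describes two behaviours: bounding boxes found on a rotated page are mapped back to the original page, and OCR proceeds without rotation when orientation detection (OSD) fails. The sketch below illustrates both ideas in plain Python. It is not the code from this commit: `safe_orientation`, `box_to_original`, and the `detect` callable are hypothetical names, and the angle convention (the upright image is the original rotated counter-clockwise by `angle` degrees with the canvas expanded) is an assumption made for the example.

```python
from typing import Callable, Tuple

# (left, top, right, bottom) in pixels, origin at the top-left corner.
Box = Tuple[float, float, float, float]


def safe_orientation(detect: Callable[[], int]) -> int:
    """Return the detected angle, or 0 if detection fails or is unusable.

    `detect` is a hypothetical stand-in for an OSD call.
    """
    try:
        angle = detect() % 360
    except Exception:
        # Assumed fallback: treat the page as upright and OCR it as-is.
        return 0
    return angle if angle in (0, 90, 180, 270) else 0


def box_to_original(box: Box, angle: int, orig_w: float, orig_h: float) -> Box:
    """Map a box from the rotated (upright) image back to the original image.

    Assumes the upright image was produced by rotating the original
    counter-clockwise by `angle` degrees with the canvas expanded.
    """
    left, top, right, bottom = box

    def point(x: float, y: float) -> Tuple[float, float]:
        if angle == 0:
            return x, y
        if angle == 90:
            return orig_w - y, x
        if angle == 180:
            return orig_w - x, orig_h - y
        if angle == 270:
            return y, orig_h - x
        raise ValueError(f"unsupported angle: {angle}")

    x0, y0 = point(left, top)
    x1, y1 = point(right, bottom)
    # Rotation can swap which corner is top-left, so re-normalise.
    return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)
```

For a 100 x 200 original and an angle of 90, a 20 x 40 box in the upright image becomes a 40 x 20 box in the original, which is the width and height swap one would expect from a quarter turn.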
@@ -2498,9 +2498,9 @@
 {
 "bbox": [
 148.45364379882812,
-366.1538391113281,
+366.1537780761719,
 464.3608093261719,
-583.6257476806641
+583.6257629394531
 ],
 "page": 2,
 "span": [
@@ -2541,9 +2541,9 @@
 "prov": [
 {
 "bbox": [
-164.6503143310547,
+164.65028381347656,
 511.6590576171875,
-449.550537109375,
+449.5505676269531,
 628.2029113769531
 ],
 "page": 7,
@@ -2563,7 +2563,7 @@
 "prov": [
 {
 "bbox": [
-140.70960998535156,
+140.70968627929688,
 198.32281494140625,
 472.73382568359375,
 283.9361572265625
@@ -2585,10 +2585,10 @@
 "prov": [
 {
 "bbox": [
-162.67434692382812,
-128.786376953125,
-451.70068359375,
-347.3774719238281
+162.67430114746094,
+128.78643798828125,
+451.70062255859375,
+347.37744140625
 ],
 "page": 10,
 "span": [
@@ -2607,9 +2607,9 @@
 "prov": [
 {
 "bbox": [
-168.3928985595703,
+168.39285278320312,
 157.99432373046875,
-447.3513488769531,
+447.35137939453125,
 610.0334930419922
 ],
 "page": 11,
@@ -4065,7 +4065,7 @@
 143.6376495361328,
 528.7375183105469,
 470.8485412597656,
-635.6522827148438
+635.6522979736328
 ],
 "page": 10,
 "span": [