mirror of https://github.com/DS4SD/docling.git, synced 2025-12-10 05:38:17 +00:00
feat(ocr): auto-detect rotated pages in Tesseract (#1167)
* fix(ocr): tesseract support mis-oriented documents
* fix(ocr): update missing test data
* fix(ocr): rotate image to the natural orientation before layout prediction
* fix(ocr): move bounding box rotation util to orientation.py
* fix(ocr): refactor rotation utilities
* chore(ocr): revert layout updates
* chore(ocr): update e2e OCR test data
* fix(ocr): avoid swallowing tesseract errors causing orientation detection failures
* chore(ocr): revert layout updates
* chore(ocr): update e2e OCR test data
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrCliModel`
* chore(ocr): proceed to OCR without rotation when OSD fails in `TesseractOcrModel`
* chore(ocr): default `TesseractOcrCliModel._is_auto` to `False`
* fix(ocr): fix `TesseractOcrCliModel._is_auto` computation
* chore(ocr): improve logging in case of OSD failure in `TesseractOcrCliModel` and `TesseractOcrModel`

---------

Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>
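The commit message mentions a bounding box rotation utility without showing it. As a rough illustration only, the sketch below maps a top-left-origin box into the coordinate frame of an image rotated counter-clockwise by a multiple of 90 degrees with an expanded canvas. The function name `rotate_bbox`, its signature, and the counter-clockwise convention are assumptions made for this example; they are not taken from docling's `orientation.py`.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (l, t, r, b), top-left origin


def rotate_bbox(box: BBox, angle: int, width: float, height: float) -> BBox:
    """Map a box from an image of size (width, height) into the frame of
    the same image rotated counter-clockwise by `angle` degrees.

    Hypothetical helper for illustration; only multiples of 90 are handled.
    """
    l, t, r, b = box
    angle %= 360
    if angle == 0:
        return (l, t, r, b)
    if angle == 90:
        # (x, y) -> (y, width - x); the rotated image is height x width.
        return (t, width - r, b, width - l)
    if angle == 180:
        # (x, y) -> (width - x, height - y); size is unchanged.
        return (width - r, height - b, width - l, height - t)
    if angle == 270:
        # (x, y) -> (height - y, x); the rotated image is height x width.
        return (height - b, l, height - t, r)
    raise ValueError(f"unsupported angle: {angle}")


# Rotating by 90 and then by 270 (in the rotated frame) restores the box.
box = (10, 20, 30, 60)
rotated = rotate_bbox(box, 90, width=100, height=200)
restored = rotate_bbox(rotated, 270, width=200, height=100)
```

Under these assumptions, boxes produced by OCR on a de-rotated page could be mapped back to the original page frame by applying the inverse angle with the rotated image's dimensions.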
@@ -2646,7 +2646,7 @@
 "b": 102.78223000000003,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9373533129692078,
+"confidence": 0.9373531937599182,
 "cells": [
 {
 "index": 0,
@@ -2686,7 +2686,7 @@
 "b": 102.78223000000003,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.8858681321144104,
+"confidence": 0.8858677744865417,
 "cells": [
 {
 "index": 1,
@@ -2816,7 +2816,7 @@
 "b": 179.20818999999995,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.957740306854248,
+"confidence": 0.9577404260635376,
 "cells": [
 {
 "index": 5,
@@ -2881,7 +2881,7 @@
 "b": 255.42400999999995,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9850425124168396,
+"confidence": 0.98504239320755,
 "cells": [
 {
 "index": 7,
@@ -3096,7 +3096,7 @@
 "b": 327.98218,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9591907262802124,
+"confidence": 0.9591910243034363,
 "cells": [
 {
 "index": 15,
@@ -3280,9 +3280,9 @@
 "id": 0,
 "label": "table",
 "bbox": {
-"l": 139.6674041748047,
+"l": 139.66746520996094,
 "t": 337.5453796386719,
-"r": 475.00927734375,
+"r": 475.0093078613281,
 "b": 469.4945373535156,
 "coord_origin": "TOPLEFT"
 },
@@ -7852,7 +7852,7 @@
 "b": 618.3,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9849976301193237,
+"confidence": 0.9849975109100342,
 "cells": [
 {
 "index": 93,
@@ -8184,9 +8184,9 @@
 "id": 0,
 "label": "table",
 "bbox": {
-"l": 139.6674041748047,
+"l": 139.66746520996094,
 "t": 337.5453796386719,
-"r": 475.00927734375,
+"r": 475.0093078613281,
 "b": 469.4945373535156,
 "coord_origin": "TOPLEFT"
 },
@@ -13582,7 +13582,7 @@
 "b": 102.78223000000003,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9373533129692078,
+"confidence": 0.9373531937599182,
 "cells": [
 {
 "index": 0,
@@ -13628,7 +13628,7 @@
 "b": 102.78223000000003,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.8858681321144104,
+"confidence": 0.8858677744865417,
 "cells": [
 {
 "index": 1,
@@ -13770,7 +13770,7 @@
 "b": 179.20818999999995,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.957740306854248,
+"confidence": 0.9577404260635376,
 "cells": [
 {
 "index": 5,
@@ -13841,7 +13841,7 @@
 "b": 255.42400999999995,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9850425124168396,
+"confidence": 0.98504239320755,
 "cells": [
 {
 "index": 7,
@@ -14062,7 +14062,7 @@
 "b": 327.98218,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9591907262802124,
+"confidence": 0.9591910243034363,
 "cells": [
 {
 "index": 15,
@@ -14252,9 +14252,9 @@
 "id": 0,
 "label": "table",
 "bbox": {
-"l": 139.6674041748047,
+"l": 139.66746520996094,
 "t": 337.5453796386719,
-"r": 475.00927734375,
+"r": 475.0093078613281,
 "b": 469.4945373535156,
 "coord_origin": "TOPLEFT"
 },
@@ -19713,7 +19713,7 @@
 "b": 618.3,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9849976301193237,
+"confidence": 0.9849975109100342,
 "cells": [
 {
 "index": 93,
@@ -20153,7 +20153,7 @@
 "b": 179.20818999999995,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.957740306854248,
+"confidence": 0.9577404260635376,
 "cells": [
 {
 "index": 5,
@@ -20224,7 +20224,7 @@
 "b": 255.42400999999995,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9850425124168396,
+"confidence": 0.98504239320755,
 "cells": [
 {
 "index": 7,
@@ -20445,7 +20445,7 @@
 "b": 327.98218,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9591907262802124,
+"confidence": 0.9591910243034363,
 "cells": [
 {
 "index": 15,
@@ -20635,9 +20635,9 @@
 "id": 0,
 "label": "table",
 "bbox": {
-"l": 139.6674041748047,
+"l": 139.66746520996094,
 "t": 337.5453796386719,
-"r": 475.00927734375,
+"r": 475.0093078613281,
 "b": 469.4945373535156,
 "coord_origin": "TOPLEFT"
 },
@@ -26096,7 +26096,7 @@
 "b": 618.3,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9849976301193237,
+"confidence": 0.9849975109100342,
 "cells": [
 {
 "index": 93,
@@ -26440,7 +26440,7 @@
 "b": 102.78223000000003,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.9373533129692078,
+"confidence": 0.9373531937599182,
 "cells": [
 {
 "index": 0,
@@ -26486,7 +26486,7 @@
 "b": 102.78223000000003,
 "coord_origin": "TOPLEFT"
 },
-"confidence": 0.8858681321144104,
+"confidence": 0.8858677744865417,
 "cells": [
 {
 "index": 1,