fix: remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138)

* feat(OCR tests): Introduce fuzziness in the text validation of OCR tests

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* fix(TesseractOcrCliModel): Send the stderr to devnull to avoid polluting the console with messages from tesseract cmd

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Nikos Livathinos
2024-10-11 10:21:19 +02:00
committed by GitHub
parent 5f1bd9e9c8
commit dae2a3b667
3 changed files with 50 additions and 17 deletions

@@ -94,5 +94,5 @@ def test_e2e_conversions():
 input_path=pdf_path,
 doc_result=doc_result,
 generate=GENERATE,
-skip_cells=True,
+fuzzy=True,
 )
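
For readers unfamiliar with fuzzy text validation, the idea behind a `fuzzy=True` flag can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper names `levenshtein` and `texts_match`, the `threshold` parameter, and its default value are all assumptions made for the example. The sketch accepts OCR output when its edit distance to the expected text, relative to the length of the expected text, stays below a threshold.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def texts_match(
    expected: str,
    actual: str,
    fuzzy: bool = False,
    threshold: float = 0.4,  # hypothetical default, chosen for illustration
) -> bool:
    """Compare two texts exactly, or within a relative edit distance."""
    if not fuzzy:
        return expected == actual
    if not expected:
        # Avoid dividing by zero: empty expected text matches only empty output.
        return not actual
    return levenshtein(expected, actual) / len(expected) < threshold


if __name__ == "__main__":
    # A single misread character fails an exact comparison but passes a fuzzy one.
    print(texts_match("Hello world", "Hel1o world"))
    print(texts_match("Hello world", "Hel1o world", fuzzy=True))
```

A relative threshold tolerates a few misread characters in long passages while still rejecting short strings that differ entirely, which suits OCR output that can vary slightly between engine versions.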