fix: PermissionError when using tesseract_ocr_cli_model

Make sure that `tesseract_ocr_cli_model.py` does not open the PNG image file twice (once via `tempfile.NamedTemporaryFile` and again via `high_res_image.save`), and ensure that `_run_tesseract` runs only after Python has closed the file. Otherwise, this results in a "PermissionError: [Errno 13] Permission denied" error on Windows.

Signed-off-by: Gaspard Petit <gaspardpetit@gmail.com>
Gaspard Petit 2024-11-25 09:19:53 -05:00 committed by GitHub
parent d7072b4b56
commit 0d12ad1dcc
GPG Key ID: B5690EEEBB952194

@@ -1,4 +1,5 @@
 import io
+import os
 import logging
 import tempfile
 from subprocess import DEVNULL, PIPE, Popen
@@ -130,14 +131,17 @@ class TesseractOcrCliModel(BaseOcrModel):
     high_res_image = page._backend.get_page_image(
         scale=self.scale, cropbox=ocr_rect
     )
-    with tempfile.NamedTemporaryFile(
-        suffix=".png", mode="w"
-    ) as image_file:
-        fname = image_file.name
-        high_res_image.save(fname)
+    try:
+        with tempfile.NamedTemporaryFile(
+            suffix=".png", mode="w+b", delete=False
+        ) as image_file:
+            fname = image_file.name
+            high_res_image.save(image_file)
+        df = self._run_tesseract(fname)
+    finally:
+        if os.path.exists(fname):
+            os.remove(fname)
     # _log.info(df)
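
The pattern in this change can be tried in isolation with a minimal sketch like the one below. It is not code from the repository: `process_via_temp_file` and `read_back` are hypothetical names, and `read_back` merely stands in for an external consumer such as `_run_tesseract`. The sketch also initializes `fname` to `None` before the `try`, which is an addition not present in the commit; it only guards the cleanup in case creating the temporary file itself fails.

```python
import os
import tempfile
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def process_via_temp_file(
    payload: bytes, consumer: Callable[[str], T], suffix: str = ".png"
) -> T:
    """Write payload to a temp file, close it, then pass its path to consumer.

    Hypothetical helper for illustration only.
    """
    fname: Optional[str] = None
    try:
        # delete=False keeps the file on disk once the handle is closed.
        with tempfile.NamedTemporaryFile(
            suffix=suffix, mode="w+b", delete=False
        ) as tmp:
            fname = tmp.name
            # Write through the handle that is already open instead of
            # opening the same path a second time.
            tmp.write(payload)
        # The handle is closed here, so the consumer can open the path
        # itself, which matters on Windows.
        return consumer(fname)
    finally:
        # With delete=False, removing the file is the caller's job.
        if fname is not None and os.path.exists(fname):
            os.remove(fname)


def read_back(path: str) -> bytes:
    """Stand-in for an external tool that opens the file by path."""
    with open(path, "rb") as fh:
        return fh.read()


result = process_via_temp_file(b"example bytes", read_back)
print(result)  # b'example bytes'
```

The consumer is called only after the `with` block exits, and the `finally` clause removes the file whether or not the consumer raises.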