feat: add support for ocrmac OCR engine on macOS (#276)

* feat: add support for `ocrmac` OCR engine on macOS

- Integrates `ocrmac` as an OCR engine option for macOS users.
- Adds configuration options and dependencies for `ocrmac`.
- Updates documentation to reflect new engine support.

This change allows macOS users to utilize `ocrmac` for improved OCR performance and compatibility.

Signed-off-by: Suhwan Seo <nuridol@gmail.com>

* updated the poetry lock

Signed-off-by: Suhwan Seo <nuridol@gmail.com>

* Fix linting issues, update CLI docs, and add error for ocrmac use on non-Mac systems

- Resolved formatting and linting issues
- Updated `--ocr-engine` CLI option documentation for `ocrmac`
- Added RuntimeError for attempts to use `ocrmac` on non-Mac platforms

Signed-off-by: Suhwan Seo <nuridol@gmail.com>

* feat: add support for `ocrmac` OCR engine on macOS

- Integrates `ocrmac` as an OCR engine option for macOS users.
- Adds configuration options and dependencies for `ocrmac`.
- Updates documentation to reflect new engine support.

This change allows macOS users to utilize `ocrmac` for improved OCR performance and compatibility.

Signed-off-by: Suhwan Seo <nuridol@gmail.com>

* docs: update examples and installation for ocrmac support

- Added `OcrMacOptions` to `custom_convert.py` and `full_page_ocr.py` examples.
- Included usage comments and examples for `OcrMacOptions` in OCR pipelines.
- Updated installation guide to include instructions for installing `ocrmac`, noting macOS version requirements (10.15+).
- Highlighted that `ocrmac` leverages Apple's Vision framework as an OCR backend.

This enhances documentation for users working on macOS to leverage `ocrmac` effectively.

Signed-off-by: Suhwan Seo <nuridol@gmail.com>

* fix: update `ocrmac` dependency with macOS-specific marker

- Added `sys_platform == 'darwin'` marker to the `ocrmac` dependency in `pyproject.toml` to specify macOS compatibility.
- Updated the content hash in `poetry.lock` to reflect the changes.

This ensures the `ocrmac` dependency is only installed on macOS systems.

Signed-off-by: Suhwan Seo <nuridol@gmail.com>

---------

Signed-off-by: Suhwan Seo <nuridol@gmail.com>
Co-authored-by: Suhwan Seo <nuridol@gmail.com>
This commit is contained in:
nuridol
2024-11-20 20:51:19 +09:00
committed by GitHub
parent 32ebf55e33
commit 6efa96c983
10 changed files with 311 additions and 14 deletions

View File

@@ -48,6 +48,7 @@ beautifulsoup4 = "^4.12.3"
pandas = "^2.1.4"
marko = "^2.1.2"
openpyxl = "^3.1.5"
ocrmac = { version = "^1.0.0", markers = "sys_platform == 'darwin'", optional = true }
[tool.poetry.group.dev.dependencies]
black = {extras = ["jupyter"], version = "^24.4.2"}
@@ -95,6 +96,7 @@ torchvision = [
[tool.poetry.extras]
tesserocr = ["tesserocr"]
ocrmac = ["ocrmac"]
[tool.poetry.scripts]
docling = "docling.cli.main:app"
@@ -130,6 +132,7 @@ module = [
"tesserocr.*",
"docling_ibm_models.*",
"easyocr.*",
"ocrmac.*",
"deepsearch_glm.*",
"lxml.*",
"bs4.*",