feat: pdf backend, table mode as options and artifacts path (#203)

* feat: add more options in the CLI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update CLI docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* expose artifacts-path as argument

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-12-08 20:58:11 +00:00 · 2024-11-04 14:26:05 +01:00
parent af323c04ef
commit 40ad987303
3 changed files with 63 additions and 26 deletions
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -32,30 +32,37 @@ Here are the available options as of this writing (for an up-to-date listing, ru
 ```console
 $ docling --help

- Usage: docling [OPTIONS] source
-
+ Usage: docling [OPTIONS] source                                                                                             
+                                                                                                                             
 ╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 │ *    input_sources      source  PDF files to convert. Can be local file / directory paths or URL. [default: None]         │
 │                                 [required]                                                                                │
 ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
 ╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
-│ --from                                     [docx|pptx|html|image|pdf]         Specify input formats to convert from.      │
-│                                                                               Defaults to all formats.                    │
-│                                                                               [default: None]                             │
-│ --to                                       [md|json|text|doctags]             Specify output formats. Defaults to         │
-│                                                                               Markdown.                                   │
-│                                                                               [default: None]                             │
-│ --ocr               --no-ocr                                                  If enabled, the bitmap content will be      │
-│                                                                               processed using OCR.                        │
-│                                                                               [default: ocr]                              │
-│ --ocr-engine                               [easyocr|tesseract_cli|tesseract]  The OCR engine to use. [default: easyocr]   │
-│ --abort-on-error    --no-abort-on-error                                       If enabled, the bitmap content will be      │
-│                                                                               processed using OCR.                        │
-│                                                                               [default: no-abort-on-error]                │
-│ --output                                   PATH                               Output directory where results are saved.   │
-│                                                                               [default: .]                                │
-│ --version                                                                     Show version information.                   │
-│ --help                                                                        Show this message and exit.                 │
+│ --from                                     [docx|pptx|html|image|pdf|asciidoc|md]  Specify input formats to convert from. │
+│                                                                                    Defaults to all formats.               │
+│                                                                                    [default: None]                        │
+│ --to                                       [md|json|text|doctags]                  Specify output formats. Defaults to    │
+│                                                                                    Markdown.                              │
+│                                                                                    [default: None]                        │
+│ --ocr               --no-ocr                                                       If enabled, the bitmap content will be │
+│                                                                                    processed using OCR.                   │
+│                                                                                    [default: ocr]                         │
+│ --ocr-engine                               [easyocr|tesseract_cli|tesseract]       The OCR engine to use.                 │
+│                                                                                    [default: easyocr]                     │
+│ --pdf-backend                              [pypdfium2|dlparse_v1|dlparse_v2]       The PDF backend to use.                │
+│                                                                                    [default: dlparse_v1]                  │
+│ --table-mode                               [fast|accurate]                         The mode to use in the table structure │
+│                                                                                    model.                                 │
+│                                                                                    [default: fast]                        │
+│ --abort-on-error    --no-abort-on-error                                            If enabled, the bitmap content will be │
+│                                                                                    processed using OCR.                   │
+│                                                                                    [default: no-abort-on-error]           │
+│ --output                                   PATH                                    Output directory where results are     │
+│                                                                                    saved.                                 │
+│                                                                                    [default: .]                           │
+│ --version                                                                          Show version information.              │
+│ --help                                                                             Show this message and exit.            │
 ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
 ```
 </details>
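
As a rough illustration of how the new `--pdf-backend` and `--table-mode` options combine with the existing `--output` option, here is a minimal sketch that assembles a `docling` command line. The helper `build_docling_command` and the file names are hypothetical and not part of docling. The allowed values and defaults are copied from the help output in the hunk above. The sketch only builds and prints the command; it does not run it.

```python
import shlex

# Choices copied from the `docling --help` output shown in this diff.
PDF_BACKENDS = ("pypdfium2", "dlparse_v1", "dlparse_v2")
TABLE_MODES = ("fast", "accurate")


def build_docling_command(source, pdf_backend="dlparse_v1", table_mode="fast", output="."):
    """Hypothetical helper: assemble a docling argv list, rejecting unknown choices."""
    if pdf_backend not in PDF_BACKENDS:
        raise ValueError(f"unknown PDF backend: {pdf_backend!r}")
    if table_mode not in TABLE_MODES:
        raise ValueError(f"unknown table mode: {table_mode!r}")
    # Options first, positional source last, matching "docling [OPTIONS] source".
    return [
        "docling",
        "--pdf-backend", pdf_backend,
        "--table-mode", table_mode,
        "--output", output,
        source,
    ]


if __name__ == "__main__":
    # Assumed example inputs: a local file named report.pdf and an ./out directory.
    cmd = build_docling_command(
        "report.pdf", pdf_backend="dlparse_v2", table_mode="accurate", output="./out"
    )
    print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where the `docling` CLI is installed. Validating the choices up front simply mirrors the check the CLI itself would perform on these enumerated options.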