docs: fix examples rendering (#2281)

fix examples rendering

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Panos Vagenas
2025-09-18 02:50:50 +02:00
committed by GitHub
parent f1687fb09b
commit 8322c2ea9b
2 changed files with 25 additions and 27 deletions


@@ -1,32 +1,32 @@
""" # %% [markdown]
Batch convert multiple PDF files and export results in several formats. # Batch convert multiple PDF files and export results in several formats.
What this example does # What this example does
- Loads a small set of sample PDFs. # - Loads a small set of sample PDFs.
- Runs the Docling PDF pipeline once per file. # - Runs the Docling PDF pipeline once per file.
- Writes outputs to `scratch/` in multiple formats (JSON, HTML, Markdown, text, doctags, YAML). # - Writes outputs to `scratch/` in multiple formats (JSON, HTML, Markdown, text, doctags, YAML).
Prerequisites # Prerequisites
- Install Docling and dependencies as described in the repository README. # - Install Docling and dependencies as described in the repository README.
- Ensure you can import `docling` from your Python environment. # - Ensure you can import `docling` from your Python environment.
# - YAML export requires `PyYAML` (`pip install pyyaml`). # <!-- YAML export requires `PyYAML` (`pip install pyyaml`). -->
Input documents # Input documents
- By default, this example uses a few PDFs from `tests/data/pdf/` in the repo. # - By default, this example uses a few PDFs from `tests/data/pdf/` in the repo.
- If you cloned without test data, or want to use your own files, edit # - If you cloned without test data, or want to use your own files, edit
`input_doc_paths` below to point to PDFs on your machine. # `input_doc_paths` below to point to PDFs on your machine.
Output formats (controlled by flags) # Output formats (controlled by flags)
- `USE_V2 = True` enables the current Docling document exports (recommended). # - `USE_V2 = True` enables the current Docling document exports (recommended).
- `USE_LEGACY = False` keeps legacy Deep Search exports disabled. # - `USE_LEGACY = False` keeps legacy Deep Search exports disabled.
You can set it to `True` if you need legacy formats for compatibility tests. # You can set it to `True` if you need legacy formats for compatibility tests.
Notes # Notes
- Set `pipeline_options.generate_page_images = True` to include page images in HTML. # - Set `pipeline_options.generate_page_images = True` to include page images in HTML.
- The script logs conversion progress and raises if any documents fail. # - The script logs conversion progress and raises if any documents fail.
# - This example shows both helper methods like `save_as_*` and lower-level # <!-- This example shows both helper methods like `save_as_*` and lower-level
# `export_to_*` + manual file writes; outputs may overlap intentionally. # `export_to_*` + manual file writes; outputs may overlap intentionally. -->
""" # %%
import json import json
import logging import logging
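
The hunk above replaces a module docstring with a percent-format markdown cell (the `# %% [markdown]` convention read by tools such as Jupytext): the opening `"""` becomes `# %% [markdown]`, each docstring line gets a `# ` prefix, and the closing `"""` becomes a bare `# %%` that opens the code cell. A minimal sketch of that rewrite follows, assuming the file starts with a docstring whose delimiters sit on their own lines. The helper name `docstring_to_markdown_cell` is hypothetical and not part of Docling.

```python
def docstring_to_markdown_cell(source: str) -> str:
    """Rewrite a leading triple-quoted docstring as a percent-format markdown cell."""
    lines = source.splitlines()
    if not lines or lines[0].strip() != '"""':
        return source  # no leading docstring delimiter on its own line; leave untouched

    out = ["# %% [markdown]"]  # replaces the opening delimiter
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == '"""':
            out.append("# %%")  # replaces the closing delimiter, starts the code cell
            out.extend(lines[i + 1:])  # the remaining code is copied unchanged
            return "\n".join(out) + "\n"
        # Comment out every docstring line; blank lines become a bare "#".
        out.append(f"# {line}" if line.strip() else "#")
    return source  # unterminated docstring; leave untouched


# Hypothetical input, loosely modelled on the example file in this commit.
example = '"""\nBatch convert.\n\n- Loads PDFs.\n"""\nimport json\n'
converted = docstring_to_markdown_cell(example)
print(converted)
```

This sketch does not handle the lines that already started with `#` inside the docstring; the commit wraps those in `<!-- ... -->` by hand.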


@@ -1,7 +1,4 @@
# %% [markdown] # %% [markdown]
# Simple conversion: one document to Markdown
# ==========================================
#
# What this example does # What this example does
# - Converts a single source (URL or local file path) to a unified Docling # - Converts a single source (URL or local file path) to a unified Docling
# document and prints Markdown to stdout. # document and prints Markdown to stdout.
@@ -17,6 +14,7 @@
# Notes # Notes
# - The converter auto-detects supported formats (PDF, DOCX, HTML, PPTX, images, etc.). # - The converter auto-detects supported formats (PDF, DOCX, HTML, PPTX, images, etc.).
# - For batch processing or saving outputs to files, see `docs/examples/batch_convert.py`. # - For batch processing or saving outputs to files, see `docs/examples/batch_convert.py`.
# %%
from docling.document_converter import DocumentConverter from docling.document_converter import DocumentConverter
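
The `# %%` line added in the second hunk separates the markdown cell from the code that follows. In a simplified model of the percent format, a cell runs until the next `# %%` marker, so without the marker the import would be grouped with the markdown text. The sketch below illustrates that model only; it is not the parser used by any particular tool, and `split_cells` is a hypothetical helper.

```python
def split_cells(source: str) -> list[tuple[str, list[str]]]:
    """Split percent-format source into (kind, lines) cells at each '# %%' marker."""
    cells: list[tuple[str, list[str]]] = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("# %%"):
            # A marker starts a new cell; '[markdown]' selects the cell kind.
            kind = "markdown" if "[markdown]" in stripped else "code"
            cells.append((kind, []))
        elif cells:
            cells[-1][1].append(line)  # line belongs to the current cell
        else:
            cells.append(("code", [line]))  # text before any marker is treated as code
    return cells


with_marker = "# %% [markdown]\n# Notes\n# %%\nimport json\n"
without_marker = "# %% [markdown]\n# Notes\nimport json\n"

print(split_cells(with_marker))     # two cells: markdown, then code
print(split_cells(without_marker))  # one markdown cell that swallows the import
```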