feat: output page images and extracted bbox (#31)

* Add assemble options and example saving pages and figures

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add options for different page elements, improve example and flip name of assemble_options

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Michele Dolfi
2024-08-12 18:25:45 +02:00
committed by GitHub
parent 0bf4a43ed5
commit 63d80edca2
3 changed files with 121 additions and 4 deletions


@@ -265,3 +265,9 @@ class PipelineOptions(BaseModel):
    do_ocr: bool = False  # True: perform OCR, replace programmatic PDF text
    table_structure_options: TableStructureOptions = TableStructureOptions()


class AssembleOptions(BaseModel):
    keep_page_images: bool = (
        False  # False: page images are removed in the assemble step
    )
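
A minimal sketch of how a flag like `keep_page_images` could gate the assemble step. It is not the project's implementation: `AssembleOptions` is redeclared here as a stdlib dataclass standing in for the pydantic `BaseModel` in the diff (same field, same default) so the snippet runs without dependencies, and `Page` and `assemble` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class AssembleOptions:
    # Stand-in for the pydantic model in the diff; same field and default.
    keep_page_images: bool = False


@dataclass
class Page:
    # Hypothetical page record; `image` stands for a rendered page bitmap.
    page_no: int
    image: Optional[bytes] = None


def assemble(pages: List[Page], options: AssembleOptions) -> List[Page]:
    """Hypothetical assemble step: drop page images unless asked to keep them."""
    if options.keep_page_images:
        return list(pages)
    # Return copies with the image cleared; the input pages are left untouched.
    return [replace(page, image=None) for page in pages]


pages = [Page(1, b"png-bytes-1"), Page(2, b"png-bytes-2")]
dropped = assemble(pages, AssembleOptions())
kept = assemble(pages, AssembleOptions(keep_page_images=True))
```

With the default options the page images are discarded, which keeps the assembled result small; a caller that wants to save pages or figures afterwards would pass `keep_page_images=True`.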