feat: add the Image backend (#2627)

* feat: add the Image backend

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the pre-commit

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Fixed single- versus multi-frame image formats

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fix: Proper usage of ImageDocumentBackend in the pipeline, deprecate old code.

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: Adapt tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: correct mets_gbs backend test

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: Make ImagePageBackend.get_bitmap_rects() yield

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Peter W. J. Staar
2025-11-17 11:37:22 +01:00
committed by GitHub
parent ae30373ee7
commit 3495b73de8
12 changed files with 494 additions and 82 deletions

@@ -2,6 +2,8 @@ import sys
 from pathlib import Path
 from typing import List
+from pydantic.type_adapter import R
 from docling.datamodel.base_models import InputFormat
 from docling.datamodel.document import ConversionResult, DoclingDocument
 from docling.datamodel.pipeline_options import (
@@ -72,7 +74,9 @@ def test_e2e_webp_conversions():
     for webp_path in webp_paths:
         print(f"converting {webp_path}")
-        doc_result: ConversionResult = converter.convert(webp_path)
+        doc_result: ConversionResult = converter.convert(
+            webp_path, raises_on_error=True
+        )
         verify_conversion_result_v2(
             input_path=webp_path,