feat: Page-level error reporting from PDF backend, introduce PARTIAL_SUCCESS status (#47)

* Put safety-checks for failed parse of pages

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Introduce page-level error checks

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Bump to docling-parse 1.1.1

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Introduce page-level error checks

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Christoph Auer
2024-08-23 16:18:41 +02:00
committed by GitHub
parent 3226b20779
commit a294b7e64a
7 changed files with 92 additions and 30 deletions


@@ -16,7 +16,7 @@ class ConversionStatus(str, Enum):
     STARTED = auto()
     FAILURE = auto()
     SUCCESS = auto()
-    SUCCESS_WITH_ERRORS = auto()
+    PARTIAL_SUCCESS = auto()
 
 
 class DocInputType(str, Enum):
@@ -29,6 +29,18 @@ class CoordOrigin(str, Enum):
     BOTTOMLEFT = auto()
 
 
+class DoclingComponentType(str, Enum):
+    PDF_BACKEND = auto()
+    MODEL = auto()
+    DOC_ASSEMBLER = auto()
+
+
+class ErrorItem(BaseModel):
+    component_type: DoclingComponentType
+    module_name: str
+    error_message: str
+
+
 class PageSize(BaseModel):
     width: float = 0.0
     height: float = 0.0
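
A minimal sketch of how the new `ErrorItem` and `PARTIAL_SUCCESS` pieces might fit together. The enum members and the three `ErrorItem` fields come from the diff above; everything else is an assumption for illustration. A stdlib dataclass stands in for the pydantic `BaseModel` to keep the example dependency-free, and `summarize_status`, its per-page error mapping, and the module name `example_backend` are hypothetical, not part of this commit.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict


class ConversionStatus(str, Enum):
    STARTED = auto()
    FAILURE = auto()
    SUCCESS = auto()
    PARTIAL_SUCCESS = auto()


class DoclingComponentType(str, Enum):
    PDF_BACKEND = auto()
    MODEL = auto()
    DOC_ASSEMBLER = auto()


@dataclass
class ErrorItem:
    # Dataclass stand-in for the pydantic BaseModel in the diff (same fields).
    component_type: DoclingComponentType
    module_name: str
    error_message: str


def summarize_status(num_pages: int, page_errors: Dict[int, ErrorItem]) -> ConversionStatus:
    """Hypothetical helper: map page-level errors to an overall status."""
    # No pages at all, or every page failed: treat as a full failure.
    if num_pages == 0 or len(page_errors) >= num_pages:
        return ConversionStatus.FAILURE
    # Some pages failed, but at least one was converted.
    if page_errors:
        return ConversionStatus.PARTIAL_SUCCESS
    return ConversionStatus.SUCCESS


# Usage: page 2 of a 3-page document failed in the PDF backend.
errors = {
    2: ErrorItem(
        component_type=DoclingComponentType.PDF_BACKEND,
        module_name="example_backend",  # hypothetical module name
        error_message="Page 2: could not be parsed.",
    )
}
status = summarize_status(num_pages=3, page_errors=errors)
print(status.name)
```

Keying the errors by page number is one possible design. It makes "some pages failed" easy to tell apart from "all pages failed", which is the distinction `PARTIAL_SUCCESS` appears intended to capture.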