fix: MD Backend, fixes to properly handle trailing inline text and emphasis in headers (#178)

* Small fix to properly handle trailing inline text in the md backend

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Added proper handling of headers with bold, italic or emphasis

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* removed print

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Made header processing smarter, so headers with arbitrary styling are handled

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated docling-core to 2.2.1

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated tests because of the change in Markdown export in docling-core

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
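The commit message names two behaviours: inline text left over after the last block must not be dropped, and header text must be extracted correctly however it is styled. The backend code itself is not part of this page. The following is only a rough, self-contained illustration of those two ideas. The node classes (`Text`, `Styled`, `Heading`, `ParagraphBreak`) and the `flatten`/`convert` helpers are hypothetical and are not docling's API or its actual implementation. Headers are flattened by recursing through nested styled nodes. Inline text is buffered and flushed at every block boundary, plus once more at the end of the token stream.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


# Hypothetical inline node types, loosely shaped like a Markdown AST.
@dataclass
class Text:
    text: str


@dataclass
class Styled:
    # e.g. "emphasis" or "strong"; the style is discarded when flattening
    style: str
    children: List["Inline"] = field(default_factory=list)


Inline = Union[Text, Styled]


@dataclass
class Heading:
    level: int
    children: List[Inline]


@dataclass
class ParagraphBreak:
    pass


Token = Union[Heading, Text, Styled, ParagraphBreak]


def flatten(node: Inline) -> str:
    """Recursively collect plain text from arbitrarily nested styled nodes."""
    if isinstance(node, Text):
        return node.text
    return "".join(flatten(child) for child in node.children)


def convert(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Turn a token stream into (kind, text) items.

    Inline tokens are buffered and emitted as one paragraph whenever a block
    boundary is reached, and once more at the end so trailing text is kept.
    """
    out: List[Tuple[str, str]] = []
    buffer: List[str] = []

    def flush() -> None:
        if buffer:
            out.append(("paragraph", "".join(buffer)))
            buffer.clear()

    for tok in tokens:
        if isinstance(tok, Heading):
            flush()
            out.append((f"h{tok.level}", "".join(flatten(c) for c in tok.children)))
        elif isinstance(tok, ParagraphBreak):
            flush()
        else:
            buffer.append(flatten(tok))
    flush()  # without this final flush, inline text after the last block is lost
    return out


if __name__ == "__main__":
    tokens = [
        Heading(2, [Text("Intro to "),
                    Styled("strong", [Text("bold "),
                                      Styled("emphasis", [Text("nested")])])]),
        Text("Some "),
        Styled("emphasis", [Text("trailing")]),
        Text(" text"),
    ]
    print(convert(tokens))
    # [('h2', 'Intro to bold nested'), ('paragraph', 'Some trailing text')]
```

Flattening recursively, instead of special-casing bold or italic, is what lets the header text survive any nesting depth of styles. The end-of-stream `flush()` is the kind of step whose absence would silently drop trailing inline text.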
Maxim Lysak
2024-10-25 18:02:20 +02:00
committed by GitHub
parent 77a89c3334
commit 88c1673057
6 changed files with 279 additions and 255 deletions


@@ -37,7 +37,7 @@ torchvision = [
######################
python = "^3.10"
pydantic = "^2.0.0"
-docling-core = "^2.1.0"
+docling-core = "^2.2.1"
docling-ibm-models = "^2.0.1"
deepsearch-glm = "^0.26.1"
filetype = "^1.2.0"