Commit Graph

4 Commits

Author SHA1 Message Date
Maxim Lysak
2f72167ff6
feat: updated vlm pipeline (with latest changes from docling-core) (#1158)
* Draft implementation of Doctag backend

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Updated VLM pipeline doctags to docling conversion, now properly supports lists

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* preparing to migrate to new doctags deserializer

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* re-using DocTagsDocument.from_doctags_and_image_pairs

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* satisfying mypy and other checks

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Added support for force_backend_text parameter

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* removed unnecessary transformation

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Cleaned up

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Update tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updated readme

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2025-03-18 15:44:51 +01:00
Panos Vagenas
0945973b79
fix: use first table row as col headers (#1156)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-03-13 15:34:18 +01:00
Panos Vagenas
94751a78f4
fix(markdown): add support for HTML content (#855)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-02-03 12:21:05 +01:00
Cesar Berrospi Ramis
0cd81a8122
fix(docx): merged table cells not properly converted (#857)
* fix(docx): merged cells not properly converted

Fix conversion issue of merged cells in Word tables leading to repeated text.
Simplify Word table conversion code.
Add docx file with several table formats for regression tests.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

* chore: add type hinting to docx backend

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

---------

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-02-03 10:20:03 +01:00