4 Commits

Author SHA1 Message Date
Matvei Smirnov
aebe25cf00 fix(html): prevent hierarchy reset in rich table cells (#2716)
* fix(html): restore parents after rich cell walking

Signed-off-by: Matvei Smirnov <vdalekesmirnov@gmail.com>

* fix(html): add table cell context manager, update tests

Signed-off-by: Matvei Smirnov <vdalekesmirnov@gmail.com>

* fix(html): table with heading test data

Signed-off-by: Matvei Smirnov <vdalekesmirnov@gmail.com>

---------

Signed-off-by: Matvei Smirnov <vdalekesmirnov@gmail.com>
2025-12-03 18:52:23 +01:00
Maxim Lysak
c803abed9a feat: Rich tables support for HTML backend (#2324)
* Rich tables support for HTML backend

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Decoupling JATS backend from HTML backend, ways of creating tables changed significantly

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* updated and added tests

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Refactored parse_table_data in html_backend into few smaller functions

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Changing scope of few functions in html_backend.py, making them static, when possible

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Fix for HTML tables that have tbody and/or thead, now these tables are also properly supported

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-09-29 18:12:16 +02:00
Panos Vagenas
be26044f14 chore: update docling-core lock (#2169)
* chore: upgrade docling-core

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* upgrade lock

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-09-01 13:46:10 +02:00
krrome
9687297262 feat(html): Support in-line anchor tags in HTML texts (#1659)
* re-implement links for html backend.

Signed-off-by: Roman Kayan BAZG <roman.kayan@bazg.admin.ch>

* fix inline groups in list items. write specific test for find_parent_annotation of _extract_text_and_hyperlink_recursively.

Signed-off-by: Roman Kayan BAZG <roman.kayan@bazg.admin.ch>

* implement hack for images.

Signed-off-by: Roman Kayan BAZG <roman.kayan@bazg.admin.ch>

---------

Signed-off-by: Roman Kayan BAZG <roman.kayan@bazg.admin.ch>
2025-08-18 09:57:16 +02:00