docling/tests/data/groundtruth/docling_v2/table_06.html.itxt
Maxim Lysak c803abed9a feat: Rich tables support for HTML backend (#2324)
* Rich tables support for HTML backend

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Decoupled the JATS backend from the HTML backend; the way tables are created changed significantly

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* updated and added tests

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Refactored parse_table_data in html_backend into a few smaller functions

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Changed the scope of a few functions in html_backend.py, making them static where possible

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Fix for HTML tables that have tbody and/or thead; these tables are now properly supported as well

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
2025-09-29 18:12:16 +02:00
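The tbody/thead fix listed in the commit message is about finding a table's rows when they are wrapped in section elements, while rich cells are cells that contain a nested table. As a minimal sketch (a hypothetical `OuterTableShape` helper built on Python's standard `html.parser`, not docling's actual `html_backend.py` code), the outer grid can be measured by counting `<tr>` rows only at the outermost table depth, whatever section they sit in, and flagging cells that open a nested `<table>`. The sample HTML is invented for illustration; the real `table_06.html` input is not shown here.

```python
from html.parser import HTMLParser


class OuterTableShape(HTMLParser):
    """Hypothetical helper, not docling's implementation.

    Measures the outermost <table>: rows are collected whether they sit
    directly under <table> or inside <thead>/<tbody>/<tfoot>, and cells that
    contain a nested <table> are recorded as "rich" cells. colspan/rowspan
    are ignored in this sketch.
    """

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0                # current <table> nesting depth
        self.in_cell = False          # inside a <td>/<th> of the outer table?
        self.rows: list[int] = []     # number of cells in each outer-table row
        self.rich_cells: list[tuple[int, int]] = []  # (row, col) holding a nested table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # A table opened inside an outer-table cell makes that cell "rich".
            if self.depth == 1 and self.in_cell:
                cell = (len(self.rows) - 1, self.rows[-1] - 1)
                if cell not in self.rich_cells:
                    self.rich_cells.append(cell)
            self.depth += 1
        elif self.depth == 1 and tag == "tr":
            # <thead>/<tbody>/<tfoot> are never checked, so they act as
            # transparent wrappers around their <tr> rows.
            self.rows.append(0)
        elif self.depth == 1 and tag in ("td", "th") and self.rows:
            self.rows[-1] += 1
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1
        elif self.depth == 1 and tag in ("td", "th"):
            self.in_cell = False

    def shape(self) -> tuple[int, int]:
        """(rows, columns) of the outer table; columns = widest row."""
        return len(self.rows), max(self.rows, default=0)


html_doc = """
<table>
  <thead><tr><th>A</th><th>B</th></tr></thead>
  <tbody>
    <tr><td>1</td><td><table><tr><td>x</td></tr></table></td></tr>
  </tbody>
</table>
"""
parser = OuterTableShape()
parser.feed(html_doc)
parser.close()
print(parser.shape())      # (2, 2)
print(parser.rich_cells)   # [(1, 1)]
```

Counting cells per row without looking at `colspan`/`rowspan` is a deliberate simplification here; tables with merged cells would need a proper grid-filling step.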


item-0 at level 0: unspecified: group _root_
  item-1 at level 1: title: Header
    item-2 at level 2: text: This is the first paragraph.
    item-3 at level 2: table with [2x2]
      item-4 at level 3: unspecified: group rich_cell_group_4_0_1
        item-5 at level 4: table with [2x3]
          item-6 at level 5: unspecified: group rich_cell_group_4_0_1
            item-7 at level 6: table with [4x2]
              item-8 at level 7: unspecified: group rich_cell_group_4_0_2
                item-9 at level 8: table with [2x2]
    item-10 at level 2: text: After table
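Each line of the groundtruth listing states its own level, so the item tree can be rebuilt without relying on indentation. The sketch below uses hypothetical `parse_itxt` and `table_nesting` helpers (assumptions for illustration, not part of docling's test suite) to parse the listing above and report how deeply its tables nest: items 3, 5, 7 and 9 form a chain of four tables, with each rich cell group (items 4, 6 and 8) holding the next table.

```python
import re
from dataclasses import dataclass, field

# One listing line: optional indentation, then "item-N at level L: <description>".
LINE_RE = re.compile(r"^ *item-(?P<idx>\d+) at level (?P<level>\d+): (?P<desc>.*)$")


@dataclass
class Item:
    idx: int
    level: int
    desc: str
    children: list["Item"] = field(default_factory=list)


def parse_itxt(text: str) -> Item:
    """Rebuild the item tree; the explicit level number decides nesting,
    so the parse works with or without the leading indentation."""
    root = None
    stack: list[Item] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognized line: {line!r}")
        item = Item(int(m["idx"]), int(m["level"]), m["desc"])
        # The nearest preceding item with a smaller level is the parent.
        while stack and stack[-1].level >= item.level:
            stack.pop()
        if stack:
            stack[-1].children.append(item)
        elif root is None:
            root = item
        else:
            raise ValueError("more than one top-level item")
        stack.append(item)
    if root is None:
        raise ValueError("empty listing")
    return root


def table_nesting(item: Item) -> int:
    """Largest number of tables stacked on one root-to-leaf path."""
    here = 1 if item.desc.startswith("table with") else 0
    return here + max((table_nesting(c) for c in item.children), default=0)


LISTING = """\
item-0 at level 0: unspecified: group _root_
  item-1 at level 1: title: Header
    item-2 at level 2: text: This is the first paragraph.
    item-3 at level 2: table with [2x2]
      item-4 at level 3: unspecified: group rich_cell_group_4_0_1
        item-5 at level 4: table with [2x3]
          item-6 at level 5: unspecified: group rich_cell_group_4_0_1
            item-7 at level 6: table with [4x2]
              item-8 at level 7: unspecified: group rich_cell_group_4_0_2
                item-9 at level 8: table with [2x2]
    item-10 at level 2: text: After table
"""

root = parse_itxt(LISTING)
print(table_nesting(root))  # 4
```

Keying on the explicit `level` field rather than counting spaces keeps the parser correct even when a file viewer strips the leading indentation.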