mirror of https://github.com/DS4SD/docling.git, synced 2025-12-09 05:08:14 +00:00
feat: Rich tables support for HTML backend (#2324)
* Rich tables support for HTML backend
* Decoupling JATS backend from HTML backend; the way tables are created changed significantly
* Updated and added tests
* Refactored parse_table_data in html_backend into a few smaller functions
* Changing scope of a few functions in html_backend.py, making them static when possible
* Fix for HTML tables that have tbody and/or thead; these tables are now properly supported

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
tests/data/groundtruth/docling_v2/table_05.html.itxt (new file, 7 lines)
@@ -0,0 +1,7 @@
+item-0 at level 0: unspecified: group _root_
+  item-1 at level 1: title: Header
+    item-2 at level 2: text: This is the first paragraph.
+    item-3 at level 2: table with [2x2]
+      item-4 at level 3: unspecified: group rich_cell_group_2_0_1
+        item-5 at level 4: table with [2x3]
+    item-6 at level 2: text: After table
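The groundtruth above records nesting only through the "at level N" field of each line. As a reading aid, here is a minimal sketch (not part of this commit or of docling; `parse_dump` and `LINE_RE` are hypothetical names) that rebuilds the parent of each item from those levels. It shows the inner 2x3 table sitting under the rich cell group, which in turn sits under the outer 2x2 table.

```python
import re

# Hypothetical helper, not part of docling: matches lines such as
# "item-3 at level 2: table with [2x2]" (leading whitespace is ignored).
LINE_RE = re.compile(r"^\s*item-(\d+) at level (\d+): (.*)$")

# The seven lines of table_05.html.itxt as shown in the diff.
DUMP = """\
item-0 at level 0: unspecified: group _root_
item-1 at level 1: title: Header
item-2 at level 2: text: This is the first paragraph.
item-3 at level 2: table with [2x2]
item-4 at level 3: unspecified: group rich_cell_group_2_0_1
item-5 at level 4: table with [2x3]
item-6 at level 2: text: After table
"""


def parse_dump(text):
    """Map each item index to its parent index (None for the root).

    Only the stated levels are used; indentation is not relied upon.
    """
    parents = {}
    stack = []  # (level, index) pairs for the currently open ancestors
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not an item line
        index, level = int(match.group(1)), int(match.group(2))
        # Close every item that is at the same depth or deeper.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parents[index] = stack[-1][1] if stack else None
        stack.append((level, index))
    return parents


parents = parse_dump(DUMP)
print(parents)
```

Under these assumptions, item-5 (the 2x3 table) resolves to item-4 (the rich cell group) as its parent, item-4 resolves to item-3 (the 2x2 table), and item-6 returns to item-1, so the text "After table" is a sibling of the outer table rather than part of it.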