mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-09 05:08:14 +00:00
feat: Rich tables support for HTML backend (#2324)
* Rich tables support for HTML backend
* Decoupled JATS backend from HTML backend; the ways of creating tables changed significantly
* Updated and added tests
* Refactored parse_table_data in html_backend into a few smaller functions
* Changed the scope of a few functions in html_backend.py, making them static where possible
* Fix for HTML tables that have tbody and/or thead; these tables are now properly supported

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
tests/data/groundtruth/docling_v2/table_04.html.md
@@ -0,0 +1,9 @@
# Header

This is the first paragraph.

| A | B |
|----------------------------------------------------------------|------|
| Some text before list - First item - Second item - Third item | 2... |

After table
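
The ground-truth row in this file suggests that a table cell holding a paragraph followed by a list ends up as a single line of text in the Markdown export. The following is a minimal standard-library sketch of that kind of flattening, not the code from html_backend.py: the `CellTextFlattener` class, the `flatten_cell` function, and the sample cell HTML are all hypothetical and only chosen to reproduce the first cell of the row.

```python
from html.parser import HTMLParser


class CellTextFlattener(HTMLParser):
    """Collect the text of one table cell on a single line (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Mark the start of each list item with a dash, as in the ground-truth row.
        if tag == "li":
            self.parts.append("-")

    def handle_data(self, data):
        # Collapse runs of whitespace and skip whitespace-only text nodes.
        text = " ".join(data.split())
        if text:
            self.parts.append(text)


def flatten_cell(html):
    """Return the cell's text content joined by single spaces."""
    parser = CellTextFlattener()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Assumed input: the real table_04.html fixture is not shown on this page.
cell = """
<td>
  <p>Some text before list</p>
  <ul><li>First item</li><li>Second item</li><li>Third item</li></ul>
</td>
"""

print(flatten_cell(cell))
```

Under these assumptions the printed line matches the first cell of the table row in the ground truth. The actual backend builds structured document items for rich cells, so this sketch only illustrates the flattened text form.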