Maxim Lysak
c803abed9a
feat: Rich tables support for HTML backend ( #2324 )
...
* Rich tables support for HTML backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Decoupling JATS backend from HTML backend; the way tables are created changed significantly
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* updated and added tests
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Refactored parse_table_data in html_backend into a few smaller functions
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Changing scope of a few functions in html_backend.py, making them static when possible
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Fix for HTML tables that have tbody and/or thead; these tables are now properly supported
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
---------
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com >
2025-09-29 18:12:16 +02:00
Panos Vagenas
be26044f14
chore: update docling-core lock ( #2169 )
...
* chore: upgrade docling-core
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
* upgrade lock
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
---------
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
2025-09-01 13:46:10 +02:00
krrome
94fcc46aa9
feat(html): Support formatting tags in HTML texts ( #2111 )
...
* add parsing for formatting tags in HTML backend
Signed-off-by: Roman Kayan BAZG <roman.kayan@bazg.admin.ch >
fix latest tests + wiki_duck result files.
Signed-off-by: Roman Kayan BAZG <roman.kayan@bazg.admin.ch >
* convert _collect_parent_format_tags to staticmethod
Signed-off-by: Roman Kayan BAZG <roman.kayan@bazg.admin.ch >
---------
Signed-off-by: Roman Kayan BAZG <roman.kayan@bazg.admin.ch >
2025-08-22 10:37:34 +02:00
Panos Vagenas
76d2cb76b3
chore: update docling-core lock ( #2110 )
...
* chore: pre-check docling-core 2.45.0
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
* update -core pinning
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
---------
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
2025-08-20 16:41:48 +02:00
Cesar Berrospi Ramis
86f70128aa
fix(HTML): replace non-standard Unicode characters ( #2006 )
...
chore(HTML): replace non-standard Unicode characters for better downstream tasks
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com >
2025-07-29 11:05:35 +02:00
Cesar Berrospi Ramis
a069b1175b
refactor(HTML): handle text from styled html ( #1960 )
...
* A new HTML backend that handles styled HTML (ignores the styling) as well as images.
Images are parsed as placeholders with a caption, if one exists.
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com >
Co-authored-by: vaaale <2428222+vaaale@users.noreply.github.com >
Signed-off-by: Alexander Vaagan <alexander.vaagan@gmail.com >
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com >
Signed-off-by: vaaale <2428222+vaaale@users.noreply.github.com >
* tests(HTML): re-enable test_ordered_lists
Re-enable test_ordered_lists regression test for the HTML backend since
docling-core now supports ordered lists with custom start value.
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com >
---------
Signed-off-by: Alexander Vaagan <alexander.vaagan@gmail.com >
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com >
Signed-off-by: vaaale <2428222+vaaale@users.noreply.github.com >
Co-authored-by: Alexander Vaagan <2428222+vaaale@users.noreply.github.com >
2025-07-22 13:16:31 +02:00