feat: Rich tables support for HTML backend (#2324)

* Rich tables support for HTML backend

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Decoupling the JATS backend from the HTML backend; the way tables are created changed significantly

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* updated and added tests

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Refactored parse_table_data in html_backend into a few smaller functions

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Changing the scope of a few functions in html_backend.py, making them static where possible

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

* Fix for HTML tables that have tbody and/or thead; these tables are now properly supported as well

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>

---------

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Maxim Lysak
2025-09-29 18:12:16 +02:00
committed by GitHub
parent 325877aee9
commit c803abed9a
46 changed files with 9233 additions and 5815 deletions

24
tests/data/html/table_01.html vendored Normal file

@@ -0,0 +1,24 @@
<html>
<head>
<style>
table, th, td {border: 1px solid black; border-collapse: collapse;}
td {padding:30px;}
table {margin: 30px;}
</style>
</head>
<body>
<h1>Header</h1>
<p>This is the first paragraph.</p>
<table>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>1...</td>
<td>2...</td>
</tr>
</table>
After table
</body>
</html>

24
tests/data/html/table_02.html vendored Normal file

@@ -0,0 +1,24 @@
<html>
<head>
<style>
table, th, td {border: 1px solid black; border-collapse: collapse;}
td {padding:30px;}
table {margin: 30px;}
</style>
</head>
<body>
<h1>Header</h1>
<p>This is the first paragraph.</p>
<table>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>First Paragraph<br>Second Paragraph<br>Third Paragraph</td>
<td>2...</td>
</tr>
</table>
After table
</body>
</html>

28
tests/data/html/table_03.html vendored Normal file

@@ -0,0 +1,28 @@
<html>
<head>
<style>
table, th, td {border: 1px solid black; border-collapse: collapse;}
td {padding:30px;}
table {margin: 30px;}
</style>
</head>
<body>
<h1>Header</h1>
<p>This is the first paragraph.</p>
<table>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>
<ul>
<li>First item</li><li>Second item</li><li>Third item</li>
</ul>
</td>
<td>2...</td>
</tr>
</table>
After table
</body>
</html>

29
tests/data/html/table_04.html vendored Normal file

@@ -0,0 +1,29 @@
<html>
<head>
<style>
table, th, td {border: 1px solid black; border-collapse: collapse;}
td {padding:30px;}
table {margin: 30px;}
</style>
</head>
<body>
<h1>Header</h1>
<p>This is the first paragraph.</p>
<table>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>
Some text before list
<ul>
<li>First item</li><li>Second item</li><li>Third item</li>
</ul>
</td>
<td>2...</td>
</tr>
</table>
After table
</body>
</html>

33
tests/data/html/table_05.html vendored Normal file

@@ -0,0 +1,33 @@
<html>
<head>
<style>
table, th, td {border: 1px solid black; border-collapse: collapse;}
td {padding:30px;}
table {margin: 30px;}
</style>
</head>
<body>
<h1>Header</h1>
<p>This is the first paragraph.</p>
<table>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>
<table>
<tr>
<td>A1</td><td>B1</td><td>C1</td>
</tr>
<tr>
<td>D1</td><td>E1</td><td>F1</td>
</tr>
</table>
</td>
<td>2...</td>
</tr>
</table>
After table
</body>
</html>

60
tests/data/html/table_06.html vendored Normal file

@@ -0,0 +1,60 @@
<html>
<head>
<style>
table, th, td {border: 1px solid black; border-collapse: collapse;}
td {padding:30px;}
table {margin: 30px;}
</style>
</head>
<body>
<h1>Header</h1>
<p>This is the first paragraph.</p>
<table>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>
<table>
<tr>
<td>A1</td><td>B1</td><td>C1</td>
</tr>
<tr>
<td>D1</td>
<td>
<table>
<tr>
<td>I</td><td>II</td>
</tr>
<tr>
<td>III</td><td>IV</td>
</tr>
<tr>
<td>V</td>
<td>
<table>
<tr>
<td>E1</td><td>E2</td>
</tr>
<tr>
<td>E3</td><td>E4</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>VII</td><td>VIII</td>
</tr>
</table>
</td>
<td>F1</td>
</tr>
</table>
</td>
<td>2...</td>
</tr>
</table>
After table
</body>
</html>
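
The fixtures above go from a flat table (table_01.html) to tables nested several levels deep (table_06.html). As a rough sanity check on such fixtures, the sketch below measures how deeply `<table>` elements nest and how many `<tr>` rows open at each level, using only Python's standard-library `html.parser`. It is not the code from html_backend.py; the names `TableDepthParser`, `table_stats`, and `SAMPLE` are hypothetical and introduced here for illustration only. `SAMPLE` is a condensed copy of the table in table_05.html.

```python
from html.parser import HTMLParser


class TableDepthParser(HTMLParser):
    """Hypothetical helper: track <table> nesting and rows per nesting level."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # current <table> nesting level
        self.max_depth = 0        # deepest level seen so far
        self.rows_per_depth = {}  # nesting level -> number of <tr> opened there

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)
        elif tag == "tr" and self.depth > 0:
            # Rows inside <thead>/<tbody> are counted the same way, because
            # only <table> tags change the nesting level.
            count = self.rows_per_depth.get(self.depth, 0)
            self.rows_per_depth[self.depth] = count + 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def table_stats(html):
    """Return (max nesting depth, {depth: row count}) for an HTML string."""
    parser = TableDepthParser()
    parser.feed(html)
    parser.close()
    return parser.max_depth, parser.rows_per_depth


# Condensed version of the table in tests/data/html/table_05.html
SAMPLE = """
<table>
<tr><td>A</td><td>B</td></tr>
<tr>
<td>
<table>
<tr><td>A1</td><td>B1</td><td>C1</td></tr>
<tr><td>D1</td><td>E1</td><td>F1</td></tr>
</table>
</td>
<td>2...</td>
</tr>
</table>
"""

max_depth, rows = table_stats(SAMPLE)
print(max_depth, rows)  # prints: 2 {1: 2, 2: 2}
```

Counting by nesting level, rather than collecting every `<tr>` in the document, is what keeps the rows of an inner table from being attributed to the outer one. That separation is the property the nested fixtures exercise.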