Mirror of https://github.com/DS4SD/docling.git (synced 2025-12-08 20:58:11 +00:00)
feat: Rich tables support for HTML backend (#2324)
* Rich tables support for HTML backend
* Decoupling JATS backend from HTML backend, ways of creating tables changed significantly
* updated and added tests
* Refactored parse_table_data in html_backend into few smaller functions
* Changing scope of few functions in html_backend.py, making them static, when possible
* Fix for HTML tables that have tbody and/or thead, now these tables are also properly supported

Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
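The last item in the commit message concerns tables whose rows are wrapped in `thead` and/or `tbody`. As a rough illustration of what that requirement means, the sketch below collects the rows of a flat table the same way whether or not those wrappers are present. It uses only Python's standard-library `html.parser`; the names `RowCollector` and `collect_rows` are hypothetical, and this is not the code in `html_backend.py`. It assumes a single, non-nested table.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect cell text row by row (hypothetical helper, not docling code).

    <thead>, <tbody> and <tfoot> are simply ignored, so wrapped and
    unwrapped tables yield the same rows. Assumes no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def collect_rows(html: str) -> list:
    """Return the table rows found in `html` as lists of cell text."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


WRAPPED = (
    "<table><thead><tr><th>A</th><th>B</th></tr></thead>"
    "<tbody><tr><td>1...</td><td>2...</td></tr></tbody></table>"
)
PLAIN = (
    "<table><tr><th>A</th><th>B</th></tr>"
    "<tr><td>1...</td><td>2...</td></tr></table>"
)

# Both variants should produce identical rows.
print(collect_rows(WRAPPED) == collect_rows(PLAIN))
```

A real backend also has to handle spans, nested tables, and rich cell content, which this sketch deliberately leaves out.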
tests/data/html/table_01.html (vendored, normal file, 24 lines)
@@ -0,0 +1,24 @@
<html>
<head>
<style>
table, th, td {border: 1px solid black; border-collapse: collapse;}
td {padding:30px;}
table {margin: 30px;}
</style>
</head>
<body>
<h1>Header</h1>
<p>This is the first paragraph.</p>
<table>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>1...</td>
<td>2...</td>
</tr>
</table>
After table
</body>
</html>

tests/data/html/table_02.html (vendored, normal file, 24 lines)
@@ -0,0 +1,24 @@
<html>
<head>
<style>
table, th, td {border: 1px solid black; border-collapse: collapse;}
td {padding:30px;}
table {margin: 30px;}
</style>
</head>
<body>
<h1>Header</h1>
<p>This is the first paragraph.</p>
<table>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>First Paragraph<br>Second Paragraph<br>Third Paragraph</td>
<td>2...</td>
</tr>
</table>
After table
</body>
</html>

tests/data/html/table_03.html (vendored, normal file, 28 lines)
@@ -0,0 +1,28 @@
<html>
<head>
<style>
table, th, td {border: 1px solid black; border-collapse: collapse;}
td {padding:30px;}
table {margin: 30px;}
</style>
</head>
<body>
<h1>Header</h1>
<p>This is the first paragraph.</p>
<table>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>
<ul>
<li>First item</li><li>Second item</li><li>Third item</li>
</ul>
</td>
<td>2...</td>
</tr>
</table>
After table
</body>
</html>

tests/data/html/table_04.html (vendored, normal file, 29 lines)
@@ -0,0 +1,29 @@
<html>
<head>
<style>
table, th, td {border: 1px solid black; border-collapse: collapse;}
td {padding:30px;}
table {margin: 30px;}
</style>
</head>
<body>
<h1>Header</h1>
<p>This is the first paragraph.</p>
<table>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>
Some text before list
<ul>
<li>First item</li><li>Second item</li><li>Third item</li>
</ul>
</td>
<td>2...</td>
</tr>
</table>
After table
</body>
</html>

tests/data/html/table_05.html (vendored, normal file, 33 lines)
@@ -0,0 +1,33 @@
<html>
<head>
<style>
table, th, td {border: 1px solid black; border-collapse: collapse;}
td {padding:30px;}
table {margin: 30px;}
</style>
</head>
<body>
<h1>Header</h1>
<p>This is the first paragraph.</p>
<table>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>
<table>
<tr>
<td>A1</td><td>B1</td><td>C1</td>
</tr>
<tr>
<td>D1</td><td>E1</td><td>F1</td>
</tr>
</table>
</td>
<td>2...</td>
</tr>
</table>
After table
</body>
</html>

tests/data/html/table_06.html (vendored, normal file, 60 lines)
@@ -0,0 +1,60 @@
<html>
<head>
<style>
table, th, td {border: 1px solid black; border-collapse: collapse;}
td {padding:30px;}
table {margin: 30px;}
</style>
</head>
<body>
<h1>Header</h1>
<p>This is the first paragraph.</p>
<table>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>
<table>
<tr>
<td>A1</td><td>B1</td><td>C1</td>
</tr>
<tr>
<td>D1</td>
<td>
<table>
<tr>
<td>I</td><td>II</td>
</tr>
<tr>
<td>III</td><td>IV</td>
</tr>
<tr>
<td>V</td>
<td>
<table>
<tr>
<td>E1</td><td>E2</td>
</tr>
<tr>
<td>E3</td><td>E4</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>VII</td><td>VIII</td>
</tr>
</table>
</td>
<td>F1</td>
</tr>
</table>
</td>
<td>2...</td>
</tr>
</table>
After table
</body>
</html>
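
The fixtures above progress from plain cells (table_01) to tables nested inside cells (table_05 and table_06). A quick way to see how deep a fixture nests is sketched below, assuming only Python's standard-library `html.parser`; `TableDepthParser` and `table_nesting` are hypothetical names for illustration and are not part of docling. The sample string is the table from table_05.html, condensed onto fewer lines.

```python
from html.parser import HTMLParser


class TableDepthParser(HTMLParser):
    """Count <table> elements and track the deepest nesting level seen."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # number of currently open <table> elements
        self.max_depth = 0    # deepest nesting level reached so far
        self.table_count = 0  # total number of <table> start tags

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            self.table_count += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def table_nesting(html: str) -> tuple:
    """Return (number of tables, maximum nesting depth) for `html`."""
    parser = TableDepthParser()
    parser.feed(html)
    parser.close()
    return parser.table_count, parser.max_depth


# Table from table_05.html, condensed: one table nested in a cell.
TABLE_05 = (
    "<table><tr><td>A</td><td>B</td></tr>"
    "<tr><td><table>"
    "<tr><td>A1</td><td>B1</td><td>C1</td></tr>"
    "<tr><td>D1</td><td>E1</td><td>F1</td></tr>"
    "</table></td><td>2...</td></tr></table>"
)

print(table_nesting(TABLE_05))  # (2, 2)
```

Run against the full table_06.html fixture, the same helper should report four tables at a maximum depth of four, which is the deepest case in this set.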