Mirror of https://github.com/DS4SD/docling.git, synced 2025-12-08 20:58:11 +00:00
fix: fix duplicate title and heading + add e2e tests for html and docx (#186)
* add real e2e tests for html and docx

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the output of itxt

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the text

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the tests (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the examples (1)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the output of the test

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the tests, moved the ground-truth

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* moved the ground-truth data

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the html tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* restructure title fix (#187)

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Committed by GitHub
parent dda2645d4c
commit f542460af3
tests/data/html/example_01.html (Normal file, +17)
@@ -0,0 +1,17 @@
<html>
<body>
<h1>Introduction</h1>
<p>This is the first paragraph of the introduction.</p>
<h2>Background</h2>
<p>Some background information here.</p>
<img src="image1.png" alt="Example image"/>
<ul>
<li>First item in unordered list</li>
<li>Second item in unordered list</li>
</ul>
<ol>
<li>First item in ordered list</li>
<li>Second item in ordered list</li>
</ol>
</body>
</html>
tests/data/html/example_02.html (Normal file, +16)
@@ -0,0 +1,16 @@
<html>
<body>
<h1>Introduction</h1>
<p>This is the first paragraph of the introduction.</p>
<h2>Background</h2>
<p>Some background information here.</p>
<ul>
<li>First item in unordered list</li>
<li>Second item in unordered list</li>
</ul>
<ol>
<li>First item in ordered list</li>
<li>Second item in ordered list</li>
</ol>
</body>
</html>
tests/data/html/example_03.html (Normal file, +66)
@@ -0,0 +1,66 @@
<html>
<head>
<style>
table {
border-collapse: collapse; /* Ensures borders don't double up */
width: 100%;
}
th, td {
border: 1px solid black; /* Adds a black border around cells */
padding: 8px;
text-align: left;
}
th {
background-color: #f2f2f2; /* Light gray background for header */
}
</style>
</head>
<body>
<h1>Example Document</h1>
<h2>Introduction</h2>
<p>This is the first paragraph of the introduction.</p>
<h2>Background</h2>
<p>Some background information here.</p>
<ul>
<li>First item in unordered list
<ul>
<li>Nested item 1</li>
<li>Nested item 2</li>
</ul>
</li>
<li>Second item in unordered list</li>
</ul>
<ol>
<li>First item in ordered list
<ol>
<li>Nested ordered item 1</li>
<li>Nested ordered item 2</li>
</ol>
</li>
<li>Second item in ordered list</li>
</ol>
<h2>Data Table</h2>
<table>
<tr>
<th>Header 1</th>
<th>Header 2</th>
<th>Header 3</th>
</tr>
<tr>
<td>Row 1, Col 1</td>
<td>Row 1, Col 2</td>
<td>Row 1, Col 3</td>
</tr>
<tr>
<td>Row 2, Col 1</td>
<td>Row 2, Col 2</td>
<td>Row 2, Col 3</td>
</tr>
<tr>
<td>Row 3, Col 1</td>
<td>Row 3, Col 2</td>
<td>Row 3, Col 3</td>
</tr>
</table>
</body>
</html>
tests/data/html/example_04.html (Normal file, +24)
@@ -0,0 +1,24 @@
<html>
<body>
<h1>Data Table with Rowspan and Colspan</h1>
<table>
<tr>
<th>Header 1</th>
<th colspan="2">Header 2 & 3 (colspan)</th>
</tr>
<tr>
<td rowspan="2">Row 1 & 2, Col 1 (rowspan)</td>
<td>Row 1, Col 2</td>
<td>Row 1, Col 3</td>
</tr>
<tr>
<td colspan="2">Row 2, Col 2 & 3 (colspan)</td>
</tr>
<tr>
<td>Row 3, Col 1</td>
<td>Row 3, Col 2</td>
<td>Row 3, Col 3</td>
</tr>
</table>
</body>
</html>
tests/data/html/unit_test_01.html (Normal file, +11)
@@ -0,0 +1,11 @@
<html>
<body>
<h1>Title</h1>
<h2>section-1</h2>
<h3>section-1.1</h3>
<h2>section-2</h2>
<h4>section-2.0.1</h4>
<h3>section-2.2</h3>
<h3>section-2.3</h3>
</body>
</html>
tests/data/html/wiki_duck.html (Normal file, +1311)
File diff suppressed because one or more lines are too long