fix: fix duplicate title and heading + add e2e tests for html and docx (#186)

* add real e2e tests for html and docx

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the output of itxt

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the text

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the tests (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the examples (1)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the output of the test

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the tests, moved the ground-truth

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* moved the ground-truth data

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the html tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* restructure title fix (#187)

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Peter W. J. Staar
2024-10-30 13:14:56 +01:00
committed by GitHub
parent dda2645d4c
commit f542460af3
49 changed files with 13733 additions and 57 deletions


@@ -0,0 +1,16 @@
<html>
<body>
<h1>Introduction</h1>
<p>This is the first paragraph of the introduction.</p>
<h2>Background</h2>
<p>Some background information here.</p>
<ul>
<li>First item in unordered list</li>
<li>Second item in unordered list</li>
</ul>
<ol>
<li>First item in ordered list</li>
<li>Second item in ordered list</li>
</ol>
</body>
</html>
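The fixture above exercises the title/heading fix: one `<h1>`, one `<h2>`, paragraphs, and an unordered and an ordered list. The following is a minimal sketch using only Python's standard `html.parser`. It shows what a flattened reading of this file could look like if the first `<h1>` is emitted once as the title and not also as a section header. The `parse_outline` helper and the label strings (`title`, `section_header_level_N`, `list_item`, `ordered_list_item`) are hypothetical illustrations and are not docling's actual backend or API.

```python
from html.parser import HTMLParser

# Hypothetical fixture: same content as the HTML file added in this commit.
FIXTURE = """<html>
<body>
<h1>Introduction</h1>
<p>This is the first paragraph of the introduction.</p>
<h2>Background</h2>
<p>Some background information here.</p>
<ul>
<li>First item in unordered list</li>
<li>Second item in unordered list</li>
</ul>
<ol>
<li>First item in ordered list</li>
<li>Second item in ordered list</li>
</ol>
</body>
</html>"""

TEXT_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li"}


class OutlineParser(HTMLParser):
    """Hypothetical sketch: flatten headings, paragraphs and list items."""

    def __init__(self):
        super().__init__()
        self.items = []          # collected (label, text) tuples, in document order
        self._tag = None         # text-bearing tag currently being collected
        self._buf = []           # text fragments for the current tag
        self._list_stack = []    # enclosing "ul"/"ol" tags
        self._have_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._list_stack.append(tag)
        elif tag in TEXT_TAGS:
            self._tag = tag
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            if self._list_stack:
                self._list_stack.pop()
        elif tag == self._tag:
            text = "".join(self._buf).strip()
            self.items.append((self._label(tag), text))
            self._tag = None

    def handle_data(self, data):
        # Whitespace between tags is ignored because no text tag is open.
        if self._tag is not None:
            self._buf.append(data)

    def _label(self, tag):
        if tag == "h1" and not self._have_title:
            # Assumption: the first <h1> becomes the title only,
            # so it is not duplicated as a section header.
            self._have_title = True
            return "title"
        if tag.startswith("h"):
            return f"section_header_level_{int(tag[1]) - 1}"
        if tag == "li":
            kind = self._list_stack[-1] if self._list_stack else "ul"
            return "ordered_list_item" if kind == "ol" else "list_item"
        return "paragraph"


def parse_outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.items


items = parse_outline(FIXTURE)
for label, text in items:
    print(f"{label}: {text}")
```

Running this on the fixture yields eight items, with "Introduction" appearing exactly once (as the title) and "Background" as a level-1 section header. An e2e test could compare a flat list like this against stored ground truth.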