mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-08 20:58:11 +00:00
fix: fix duplicate title and heading + add e2e tests for html and docx (#186)
* add real e2e tests for html and docx
  Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the output of itxt
  Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the text
  Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the tests
  Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the tests (2)
  Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the examples (1)
  Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the output of the test
  Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the tests, moved the ground-truth
  Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* moved the ground-truth data
  Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the html tests
  Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* restructure title fix (#187)
  Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
committed by GitHub
parent dda2645d4c
commit f542460af3
@@ -20,10 +20,10 @@ _log = logging.getLogger(__name__)
 def main():
     input_paths = [
         Path("README.md"),
-        Path("tests/data/wiki_duck.html"),
-        Path("tests/data/word_sample.docx"),
-        Path("tests/data/lorem_ipsum.docx"),
-        Path("tests/data/powerpoint_sample.pptx"),
+        Path("tests/data/html/wiki_duck.html"),
+        Path("tests/data/docx/word_sample.docx"),
+        Path("tests/data/docx/lorem_ipsum.docx"),
+        Path("tests/data/pptx/powerpoint_sample.pptx"),
         Path("tests/data/2305.03393v1-pg9-img.png"),
         Path("tests/data/2206.01062.pdf"),
         Path("tests/data/test_01.asciidoc"),