docling/tests/data/groundtruth/docling_v2
Peter W. J. Staar f542460af3
fix: fix duplicate title and heading + add e2e tests for html and docx (#186)
* add real e2e tests for html and docx

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the output of itxt

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the text

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the tests (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the examples (1)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the output of the test

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the tests, moved the ground-truth

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* moved the ground-truth data

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the html tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* restructure title fix (#187)

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-30 13:14:56 +01:00
..
2203.01017v2.doctags.txt feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
2203.01017v2.json feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
2203.01017v2.md fix: MD Backend, fixes to properly handle trailing inline text and emphasis in headers (#178) 2024-10-25 18:02:20 +02:00
2203.01017v2.pages.json feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
2206.01062.doctags.txt feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
2206.01062.json feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
2206.01062.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
2206.01062.pages.json feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
2305.03393v1-pg9.doctags.txt feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
2305.03393v1-pg9.json feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
2305.03393v1-pg9.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
2305.03393v1-pg9.pages.json feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
2305.03393v1.doctags.txt feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
2305.03393v1.json feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
2305.03393v1.md fix: MD Backend, fixes to properly handle trailing inline text and emphasis in headers (#178) 2024-10-25 18:02:20 +02:00
2305.03393v1.pages.json feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
example_01.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_01.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_01.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_02.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_02.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_02.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_03.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_03.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_03.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_04.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_04.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_04.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
lorem_ipsum.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
lorem_ipsum.docx.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
lorem_ipsum.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
redp5110.doctags.txt feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
redp5110.json feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
redp5110.md fix: MD Backend, fixes to properly handle trailing inline text and emphasis in headers (#178) 2024-10-25 18:02:20 +02:00
redp5110.pages.json feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
redp5695.doctags.txt feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
redp5695.json feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
redp5695.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
redp5695.pages.json feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
test_01.asciidoc.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
test_02.asciidoc.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
unit_test_01.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_01.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_01.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_headers.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_headers.docx.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_headers.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
wiki_duck.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
wiki_duck.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
wiki_duck.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
word_sample.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
word_sample.docx.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
word_sample.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00