docling/tests/data/groundtruth/docling_v2/word_sample.docx.itxt
Peter W. J. Staar f542460af3
fix: fix duplicate title and heading + add e2e tests for html and docx (#186)
* add real e2e tests for html and docx

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the output of itxt

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the text

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the tests (2)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the examples (1)

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the output of the test

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the tests, moved the ground-truth

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* moved the ground-truth data

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the html tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* restructure title fix (#187)

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-30 13:14:56 +01:00

29 lines
1.7 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

item-0 at level 0: unspecified: group _root_
item-1 at level 1: paragraph: Summer activities
item-2 at level 1: title: Swimming in the lake
item-3 at level 2: paragraph: Duck
item-4 at level 2: paragraph:
item-5 at level 2: paragraph: Figure 1: This is a cute duckling
item-6 at level 2: section_header: Lets swim!
item-7 at level 3: paragraph: To get started with swimming, fi ... down in a water and try not to drown:
item-8 at level 3: list: group list
item-9 at level 4: list_item: You can relax and look around
item-10 at level 4: list_item: Paddle about
item-11 at level 4: list_item: Enjoy summer warmth
item-12 at level 3: paragraph: Also, dont forget:
item-13 at level 3: list: group list
item-14 at level 4: list_item: Wear sunglasses
item-15 at level 4: list_item: Dont forget to drink water
item-16 at level 4: list_item: Use sun cream
item-17 at level 3: paragraph: Hmm, what else…
item-18 at level 3: section_header: Lets eat
item-19 at level 4: paragraph: After we had a good day of swimm ... , its important to eat something nice
item-20 at level 4: paragraph: I like to eat leaves
item-21 at level 4: paragraph: Here are some interesting things a respectful duck could eat:
item-22 at level 4: table with [4x3]
item-23 at level 4: paragraph:
item-24 at level 4: paragraph: And lets add another list in the end:
item-25 at level 4: list: group list
item-26 at level 5: list_item: Leaves
item-27 at level 5: list_item: Berries
item-28 at level 5: list_item: Grain