mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-08 20:58:11 +00:00
fix: fix duplicate title and heading + add e2e tests for html and docx (#186)
* add real e2e tests for html and docx
Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the output of itxt
Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the text
Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the tests (2)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the examples (1)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the output of the test
Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the tests, moved the ground-truth
Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* moved the ground-truth data
Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the html tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* restructure title fix (#187)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
committed by
GitHub
parent
dda2645d4c
commit
f542460af3
43
tests/data/groundtruth/docling_v2/word_sample.docx.md
Normal file
@@ -0,0 +1,43 @@
Summer activities

# Swimming in the lake

Duck

Figure 1: This is a cute duckling

## Let’s swim!

To get started with swimming, first lay down in a water and try not to drown:

- You can relax and look around
- Paddle about
- Enjoy summer warmth

Also, don’t forget:

- Wear sunglasses
- Don’t forget to drink water
- Use sun cream

Hmm, what else…

### Let’s eat

After we had a good day of swimming in the lake, it’s important to eat something nice

I like to eat leaves

Here are some interesting things a respectful duck could eat:

|         | Food                             | Calories per portion   |
|---------|----------------------------------|------------------------|
| Leaves  | Ash, Elm, Maple                  | 50                     |
| Berries | Blueberry, Strawberry, Cranberry | 150                    |
| Grain   | Corn, Buckwheat, Barley          | 200                    |

And let’s add another list in the end:

- Leaves
- Berries
- Grain
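For illustration only, here is a minimal sketch of how a ground-truth file such as `word_sample.docx.md` might be used in an end-to-end test. It is not the test code from this commit. The names `verify_against_groundtruth`, `convert`, and `generate` are hypothetical, and the converter is passed in as a plain callable so the sketch makes no assumptions about the library's API.

```python
from pathlib import Path
from typing import Callable


def verify_against_groundtruth(
    source: Path,
    gt_dir: Path,
    convert: Callable[[Path], str],  # hypothetical: returns Markdown for `source`
    generate: bool = False,
) -> bool:
    """Compare converter output with `<gt_dir>/<source file name>.md`.

    With `generate=True` the ground-truth file is (re)written instead of
    compared, which is one common way to refresh fixtures on purpose.
    """
    gt_path = gt_dir / f"{source.name}.md"
    actual = convert(source)

    if generate:
        # Write the current output as the new expected result.
        gt_path.parent.mkdir(parents=True, exist_ok=True)
        gt_path.write_text(actual, encoding="utf-8")
        return True

    # Exact string comparison: any change in the exported Markdown,
    # such as a duplicated title or heading, makes the check fail.
    expected = gt_path.read_text(encoding="utf-8")
    return actual == expected
```

In a test, the result would typically be wrapped in an `assert`, with `generate` enabled only when the expected output is being updated deliberately.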