mirror of
https://github.com/DS4SD/docling.git
synced 2025-08-01 23:12:20 +00:00
In case an HTML does not have any header tag, all parsed items are placed in DoclingDocument's body content layer. HTML paragraphs ('p' tags) are parsed as text items with paragraph label. Update test ground truth accoring to the changes above. Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com> |
||
---|---|---|
.. | ||
asciidoc | ||
csv | ||
docx | ||
groundtruth | ||
html | ||
jats | ||
md | ||
pptx | ||
uspto | ||
xlsx | ||
2305.03393v1-pg9-img.png |