Cesar Berrospi Ramis
|
106951e71e
|
test: add missing ground truth files (#1667)
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
|
2025-05-28 13:26:49 +02:00 |
|
Cesar Berrospi Ramis
|
776e7ecf9a
|
fix(HTML): handle row spans in header rows (#1536)
* chore(HTML): log the stacktrace of errors
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* fix(HTML): handle row headers like in pivot tables
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
---------
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
|
2025-05-09 15:14:32 +02:00 |
|
Cesar Berrospi Ramis
|
ed20124544
|
fix(html): handle address, details, and summary tags (#1436)
* fix(html): handle 'address' tag
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* fix(html): handle 'details' tag
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
---------
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
|
2025-04-23 09:30:59 +02:00 |
|
Cesar Berrospi Ramis
|
f94da44ec5
|
fix(html): handle nested empty lists (#1154)
Address the case of nested lists in empty list items.
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
|
2025-03-13 16:56:58 +01:00 |
|
Cesar Berrospi Ramis
|
1b0ead6907
|
fix(html): Parse text in div elements as TextItem (#1041)
feat(html): Parse text in div elements as TextItem
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
|
2025-02-24 12:38:29 +01:00 |
|
Cesar Berrospi Ramis
|
a112d7a035
|
fix: parse html with omitted body tag (#818)
* fix: parse HTML files without body tag
Parse HTML files without 'body' tag, since it is optional in HTML5 specification.
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
* test: ensure docling converts HTML without body tag
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
---------
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
|
2025-01-27 16:59:00 +01:00 |
|
Peter W. J. Staar
|
f542460af3
|
fix: fix duplicate title and heading + add e2e tests for html and docx (#186)
* add real e2e tests for html and docx
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the output of itxt
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the text
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the tests (2)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the examples (1)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the output of the test
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the tests, moved the ground-truth
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* moved the ground-truth data
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the html tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* restructure title fix (#187)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
|
2024-10-30 13:14:56 +01:00 |
|