docling/tests/data/docx
Simon Jégou bfcab3d677
feat(docx): add text formatting and hyperlink support (#630)
* feat: Enable markdown text formatting for docx

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Fix imports

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Use Formatting

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Handle hyperlink

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Handle formatting properly for DocItemLabel.PARAGRAPH

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Use inline group

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Handle bullet lists

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Strip elements

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Strip elements

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Run black and mypy

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Handle header and footer

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Use inline_fmt everywhere

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Run precommit

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Address feedback

Signed-off-by: SimJeg <sjegou@nvidia.com>

* Fix add_list_item

Signed-off-by: SimJeg <sjegou@nvidia.com>

* fix minor bugs, mark helper methods internal

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: SimJeg <sjegou@nvidia.com>
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Co-authored-by: Panos Vagenas <pva@zurich.ibm.com>
2025-04-03 15:11:50 +02:00
File | Last commit | Date
equations.docx | feat: equations to latex in MSWord backend (with inline groups) (#1114) | 2025-03-13 15:12:22 +01:00
lorem_ipsum.docx | fix: fix duplicate title and heading + add e2e tests for html and docx (#186) | 2024-10-30 13:14:56 +01:00
tablecell.docx | fix: Handling of single-cell tables in DOCX backend (#314) | 2024-11-12 15:20:55 +01:00
test_emf_docx.docx | fix: Fixes for wordx (#432) | 2024-11-26 14:44:43 +01:00
unit_test_formatting.docx | feat(docx): add text formatting and hyperlink support (#630) | 2025-04-03 15:11:50 +02:00
unit_test_headers_numbered.docx | fix: Fixed docx import with headers that are also lists (#842) | 2025-01-31 10:51:21 +01:00
unit_test_headers.docx | fix: fix duplicate title and heading + add e2e tests for html and docx (#186) | 2024-10-30 13:14:56 +01:00
unit_test_lists.docx | fix: fix duplicate title and heading + add e2e tests for html and docx (#186) | 2024-10-30 13:14:56 +01:00
word_sample.docx | fix: fix duplicate title and heading + add e2e tests for html and docx (#186) | 2024-10-30 13:14:56 +01:00
word_tables.docx | fix(docx): merged table cells not properly converted (#857) | 2025-02-03 10:20:03 +01:00