feat(docx): Process drawingml objects in docx (#2453)

* Export of DrawingML figures into docling document

* Adding libreoffice env var and libreoffice to checks image

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* DCO Remediation Commit for Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

I, Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>, hereby add my Signed-off-by to this commit: 9518fffcad

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Enforcing apt-get update

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* Only display drawingml warning once per document

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

* add util to test libreoffice and exclude files from test when not found

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* check libreoffice only once

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Only initialise converter if needed

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>

---------

Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Rafael Teixeira de Lima
2025-10-15 10:58:08 +02:00
committed by GitHub
parent 3e6da2c62d
commit 16829939cf
8 changed files with 512 additions and 25 deletions
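
The commit message mentions a util that tests for LibreOffice, a check performed only once, and test files excluded when LibreOffice is not found. The actual helper is not shown on this page, so the following is only a minimal sketch under assumptions: the names `find_libreoffice` and `select_test_files`, the candidate binary names, and the file names in the usage are hypothetical and not taken from the commit.

```python
import functools
import shutil
from typing import Iterable, List, Optional, Set, Tuple


@functools.lru_cache(maxsize=None)
def find_libreoffice(
    candidates: Tuple[str, ...] = ("soffice", "libreoffice"),
) -> Optional[str]:
    """Return the path of the first candidate binary found on PATH, else None.

    The result is cached, so the PATH lookup runs only once per argument
    tuple. The candidate names are an assumption, not the commit's values.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def select_test_files(
    paths: Iterable[str],
    needs_libreoffice: Set[str],
    libreoffice_path: Optional[str],
) -> List[str]:
    """Drop the files that require LibreOffice when it is not available."""
    if libreoffice_path is not None:
        return list(paths)
    return [p for p in paths if p not in needs_libreoffice]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = ["plain.docx", "with_drawingml.docx"]
    print(select_test_files(files, {"with_drawingml.docx"}, find_libreoffice()))
```

Caching the lookup keeps a test session from probing PATH repeatedly, and passing the result into the filter as a plain argument keeps the filter easy to test without LibreOffice installed.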

@@ -0,0 +1,13 @@
item-0 at level 0: unspecified: group _root_
  item-1 at level 1: section: group textbox
    item-2 at level 2: text: Text 2
    item-3 at level 2: text: Text 1
  item-4 at level 1: picture
  item-5 at level 1: text:
  item-6 at level 1: text:
  item-7 at level 1: text:
  item-8 at level 1: text:
  item-9 at level 1: text:
  item-10 at level 1: text:
  item-11 at level 1: text:
  item-12 at level 1: picture

@@ -0,0 +1,7 @@
Text 2
Text 1
<!-- image -->
<!-- image -->