Mirror of https://github.com/DS4SD/docling.git (synced 2025-12-08 20:58:11 +00:00)
feat(docx): Process drawingml objects in docx (#2453)
* Export of DrawingML figures into docling document
* Adding LibreOffice env var and LibreOffice to the checks image
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* DCO Remediation Commit for Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
I, Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>, hereby add my Signed-off-by to this commit: 9518fffcad
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Enforcing apt-get update
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* Only display drawingml warning once per document
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
* add util to test libreoffice and exclude files from test when not found
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* check libreoffice only once
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Only initialise converter if needed
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
---------
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
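
The "check libreoffice only once" and "exclude files from test when not found" items above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the environment variable name `LIBREOFFICE_CMD_EXAMPLE`, the helper names, and the `drawingml` filename tag are all assumptions made for the example.

```python
import os
import shutil
from functools import lru_cache
from typing import Iterable, List, Optional

# Hypothetical variable name: the commit mentions a LibreOffice env var,
# but this page does not show what it is called.
LIBREOFFICE_ENV_VAR = "LIBREOFFICE_CMD_EXAMPLE"


@lru_cache(maxsize=1)
def find_libreoffice() -> Optional[str]:
    """Locate a LibreOffice executable; the result is cached after the first call."""
    override = os.environ.get(LIBREOFFICE_ENV_VAR)
    candidates = [override] if override else ["soffice", "libreoffice"]
    for candidate in candidates:
        path = shutil.which(candidate)
        if path:
            return path
    return None


def filter_test_files(
    paths: Iterable[str],
    libreoffice_available: bool,
    needs_libreoffice: Iterable[str] = ("drawingml",),
) -> List[str]:
    """Drop files whose name contains a tag that requires LibreOffice when it is missing."""
    if libreoffice_available:
        return list(paths)
    tags = tuple(needs_libreoffice)
    return [p for p in paths if not any(tag in os.path.basename(p) for tag in tags)]
```

A test module could call `filter_test_files(all_docx_paths, find_libreoffice() is not None)` once at collection time, so that machines without LibreOffice skip the DrawingML samples instead of failing.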
Committed by GitHub
Parent: 3e6da2c62d
Commit: 16829939cf
13
tests/data/groundtruth/docling_v2/drawingml.docx.itxt
vendored
Normal file
@@ -0,0 +1,13 @@
item-0 at level 0: unspecified: group _root_
item-1 at level 1: section: group textbox
item-2 at level 2: text: Text 2
item-3 at level 2: text: Text 1
item-4 at level 1: picture
item-5 at level 1: text:
item-6 at level 1: text:
item-7 at level 1: text:
item-8 at level 1: text:
item-9 at level 1: text:
item-10 at level 1: text:
item-11 at level 1: text:
item-12 at level 1: picture
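
Each line of the `.itxt` ground truth above follows the pattern `item-N at level L: label[: text]`. As a hedged illustration of how such a listing could be inspected in a test, the sketch below parses it into records. The `Item` type, `parse_itxt` function, and `SAMPLE` string are hypothetical names for this example and are not part of docling.

```python
import re
from typing import List, NamedTuple


class Item(NamedTuple):
    index: int
    level: int
    label: str
    text: str


# Matches "item-N at level L: label" with an optional ": text" tail.
_LINE = re.compile(r"^\s*item-(\d+) at level (\d+): ([^:]+?)(?::\s*(.*))?$")

# A shortened excerpt of the listing above, used only for illustration.
SAMPLE = "\n".join(
    [
        "item-0 at level 0: unspecified: group _root_",
        "item-1 at level 1: section: group textbox",
        "item-2 at level 2: text: Text 2",
        "item-4 at level 1: picture",
        "item-5 at level 1: text:",
    ]
)


def parse_itxt(content: str) -> List[Item]:
    """Parse an indented-text listing into (index, level, label, text) records."""
    items = []
    for line in content.splitlines():
        match = _LINE.match(line)
        if match:
            index, level, label, text = match.groups()
            items.append(Item(int(index), int(level), label.strip(), (text or "").strip()))
    return items
```

With records like these, a test could for instance count the `picture` items to confirm that the DrawingML figures were exported.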
250
tests/data/groundtruth/docling_v2/drawingml.docx.json
vendored
Normal file
File diff suppressed because one or more lines are too long
7
tests/data/groundtruth/docling_v2/drawingml.docx.md
vendored
Normal file
@@ -0,0 +1,7 @@
Text 2

Text 1

<!-- image -->

<!-- image -->