mirror of https://github.com/DS4SD/docling.git
synced 2025-12-08 20:58:11 +00:00
feat: added excel backend (#334)
* feat: added excel backend
* first msexcel backend
* added tooling for the cli
* first working version for excel parsing of tables
* added proper typing for mypy
* added proper typing for mypy
* refactor EXCEL to XLSX
* added the unit tests
* ran poetry lock
* adding images to output [WIP]
* reformatted the code
* fixed the mypy
* updated the msexcel
* updated the msexcel (2)
* fixed the mypy
* added tests for merged cells in excel
* reformatted the code

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
committed by GitHub
parent e6f89d520f
commit 926dfd29d5
10
tests/data/groundtruth/docling_v2/test-01.xlsx.itxt
Normal file
@@ -0,0 +1,10 @@
item-0 at level 0: unspecified: group _root_
item-1 at level 1: section: group sheet: Sheet1
item-2 at level 2: table with [7x3]
item-3 at level 1: section: group sheet: Sheet2
item-4 at level 2: table with [9x4]
item-5 at level 2: table with [5x3]
item-6 at level 2: table with [5x3]
item-7 at level 1: section: group sheet: Sheet3
item-8 at level 2: table with [7x3]
item-9 at level 2: table with [7x3]
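The outline above follows a regular `item-N at level L: description` pattern, so it can be read back mechanically. The following is a minimal sketch, assuming every entry keeps that line format; `parse_outline` and `ITEM_RE` are hypothetical names for illustration and are not part of docling.

```python
import re

# Hypothetical pattern (not from docling): matches lines such as
# "item-2 at level 2: table with [7x3]".
ITEM_RE = re.compile(r"^\s*item-(\d+) at level (\d+): (.+)$")


def parse_outline(text):
    """Return (index, level, description) tuples, one per outline line."""
    items = []
    for line in text.splitlines():
        match = ITEM_RE.match(line)
        if match:
            items.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return items


sample = """\
item-0 at level 0: unspecified: group _root_
item-1 at level 1: section: group sheet: Sheet1
item-2 at level 2: table with [7x3]
"""

for index, level, description in parse_outline(sample):
    # Indent by level to make the sheet/table nesting visible.
    print("  " * level + f"{index}: {description}")
```

Indenting by level shows the nesting the groundtruth encodes: each sheet is a level-1 section and each table found on it is a level-2 child.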
3240
tests/data/groundtruth/docling_v2/test-01.xlsx.json
Normal file
File diff suppressed because it is too large
51
tests/data/groundtruth/docling_v2/test-01.xlsx.md
Normal file
@@ -0,0 +1,51 @@
| first    | second    | third   |
|----------|-----------|---------|
| 1        | 5         | 9       |
| 2        | 4         | 6       |
| 3        | 3         | 3       |
| 4        | 2         | 0       |
| 5        | 1         | -3      |
| 6        | 0         | -6      |

| col-1   | col-2   | col-3   | col-4   |
|---------|---------|---------|---------|
| 1       | 2       | 3       | 4       |
| 2       | 4       | 6       | 8       |
| 3       | 6       | 9       | 12      |
| 4       | 8       | 12      | 16      |
| 5       | 10      | 15      | 20      |
| 6       | 12      | 18      | 24      |
| 7       | 14      | 21      | 28      |
| 8       | 16      | 24      | 32      |

| col-1   | col-2   | col-3   |
|---------|---------|---------|
| 1       | 2       | 3       |
| 2       | 4       | 6       |
| 3       | 6       | 9       |
| 4       | 8       | 12      |

| col-1   | col-2   | col-3   |
|---------|---------|---------|
| 1       | 2       | 3       |
| 2       | 4       | 6       |
| 3       | 6       | 9       |
| 4       | 8       | 12      |

| first    | header   | header   |
|----------|----------|----------|
| first    | second   | third    |
| 1        | 2        | 3        |
| 3        | 4        | 5        |
| 3        | 6        | 7        |
| 8        | 9        | 9        |
| 10       | 9        | 9        |

| first (f)   | header (f)   | header (f)   |
|-------------|--------------|--------------|
| first (f)   | second       | third        |
| 1           | 2            | 3            |
| 3           | 4            | 5            |
| 3           | 6            | 7            |
| 8           | 9            | 9            |
| 10          | 9            | 9            |
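The tables above appear to line up with the bracketed sizes in the `.itxt` groundtruth when the header row is counted as a row (for example, a header plus four data rows for `table with [5x3]`). A minimal sketch of that cross-check, assuming plain pipe tables separated by blank lines; `table_shapes` is a hypothetical helper and not part of docling.

```python
def table_shapes(markdown):
    """Return (rows, cols) per pipe table, counting the header but not the separator."""
    shapes = []
    rows = cols = 0
    # The trailing empty string flushes a table that ends at the last line.
    for line in markdown.splitlines() + [""]:
        stripped = line.strip()
        if stripped.startswith("|"):
            cells = [cell.strip() for cell in stripped.strip("|").split("|")]
            # Skip the |---|---| separator row (cells made only of dashes or colons).
            if all(cell and set(cell) <= {"-", ":"} for cell in cells):
                continue
            rows += 1
            cols = len(cells)
        elif rows:
            shapes.append((rows, cols))
            rows = cols = 0
    return shapes


sample = """\
| col-1   | col-2   | col-3   |
|---------|---------|---------|
| 1       | 2       | 3       |
| 2       | 4       | 6       |
| 3       | 6       | 9       |
| 4       | 8       | 12      |
"""

print(table_shapes(sample))  # [(5, 3)]
```

A data cell such as `-3` is not mistaken for a separator, because the check requires every cell in the row to consist solely of `-` or `:` characters.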
BIN
tests/data/xlsx/test-01.xlsx
Normal file
Binary file not shown.