docling/tests/data/groundtruth/docling_v2/xlsx_04_inflated.xlsx.md
Richard (Huangrui) Chu b66624bfff fix(xlsx): speed up by detecting the true last non-empty row/column (#2404)
* Update msexcel_backend.py

Fix #2307, following the instructions in https://github.com/docling-project/docling/issues/2307#issuecomment-3327248503.

Signed-off-by: Richard (Huangrui) Chu <65276824+HuangruiChu@users.noreply.github.com>

* Update msexcel_backend.py

Fix error

Signed-off-by: Richard (Huangrui) Chu <65276824+HuangruiChu@users.noreply.github.com>

* Fix linting issues

Signed-off-by: Richard (Huangrui) Chu <65276824+HuangruiChu@users.noreply.github.com>

* Add test files and data (Signed-off-by: Huangrui Chu <huangrui.chu.1999@gmail.com>)

Signed-off-by: Richard (Huangrui) Chu <65276824+HuangruiChu@users.noreply.github.com>

* resolve conflict with test_backend_msexcel; update the boundary

Signed-off-by: Richard (Huangrui) Chu <65276824+HuangruiChu@users.noreply.github.com>

* chore(xlsx): use a dataclass to represent a bounding rectangle in worksheets

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* chore(xlsx): increase parsing speed by iterating on 'sheet._cells'

Increase the parsing speed of the spreadsheet backend by iterating on 'sheet._cells',
since its size is proportional to the number of created cells.
Rename the test file to align it with the other test files.

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
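
The idea behind the speed-up can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `DataRegion` and `find_true_data_bounds` are hypothetical, and a plain dict keyed by `(row, column)` stands in for the worksheet's sparse cell store (the commit refers to 'sheet._cells'). The point is that the scan cost depends on the number of created cells rather than on the declared sheet dimensions, which an "inflated" file can overstate.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Tuple


@dataclass(frozen=True)
class DataRegion:
    """Hypothetical bounding rectangle (1-based, inclusive) of non-empty cells."""

    min_row: int
    max_row: int
    min_col: int
    max_col: int


def find_true_data_bounds(
    cells: Mapping[Tuple[int, int], Any],
) -> Optional[DataRegion]:
    """Return the smallest rectangle containing every non-empty cell.

    `cells` is an assumed stand-in for a sparse cell store: only cells that
    were actually created appear as keys, so the loop never walks the full
    declared grid. Cells whose value is None count as empty.
    """
    bounds: Optional[DataRegion] = None
    for (row, col), value in cells.items():
        if value is None:
            continue  # created (e.g. formatted) but empty: ignore it
        if bounds is None:
            bounds = DataRegion(row, row, col, col)
        else:
            bounds = DataRegion(
                min(bounds.min_row, row),
                max(bounds.max_row, row),
                min(bounds.min_col, col),
                max(bounds.max_col, col),
            )
    return bounds


# A tiny "inflated" sheet: real data in the top-left corner plus one
# empty cell created far away, which would otherwise stretch the bounds.
example_cells = {
    (1, 1): "first", (1, 2): "second", (1, 3): "third",
    (2, 1): 1, (2, 2): 5, (2, 3): 9,
    (1048576, 16384): None,
}
example_region = find_true_data_bounds(example_cells)
```

Here `example_region` covers rows 1 to 2 and columns 1 to 3, so the far-away empty cell does not enlarge the area that has to be parsed.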

---------

Signed-off-by: Richard (Huangrui) Chu <65276824+HuangruiChu@users.noreply.github.com>
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Co-authored-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2025-10-21 08:08:20 +02:00


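| first | second | third |
|-------|--------|-------|
| 1     | 5      | 9     |
| 2     | 4      | 6     |
| 3     | 3      | 3     |
| 4     | 2      | 0     |
| 5     | 1      | -3    |
| 6     | 0      | -6    |

| col-1 | col-2 | col-3 | col-4 |
|-------|-------|-------|-------|
| 1     | 2     | 3     | 4     |
| 2     | 4     | 6     | 8     |
| 3     | 6     | 9     | 12    |
| 4     | 8     | 12    | 16    |
| 5     | 10    | 15    | 20    |
| 6     | 12    | 18    | 24    |
| 7     | 14    | 21    | 28    |
| 8     | 16    | 24    | 32    |

| col-1 | col-2 | col-3 |
|-------|-------|-------|
| 1     | 2     | 3     |
| 2     | 4     | 6     |
| 3     | 6     | 9     |
| 4     | 8     | 12    |

| col-1 | col-2 | col-3 |
|-------|-------|-------|
| 1     | 2     | 3     |
| 2     | 4     | 6     |
| 3     | 6     | 9     |
| 4     | 8     | 12    |

| first | header | header |
|-------|--------|--------|
| first | second | third  |
| 1     | 2      | 3      |
| 3     | 4      | 5      |
| 3     | 6      | 7      |
| 8     | 9      | 9      |
| 10    | 9      | 9      |

| first (f) | header (f) | header (f) |
|-----------|------------|------------|
| first (f) | second     | third      |
| 1         | 2          | 3          |
| 3         | 4          | 5          |
| 3         | 6          | 7          |
| 8         | 9          | 9          |
| 10        | 9          | 9          |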