Richard (Huangrui) Chu
|
b66624bfff
|
fix(xlsx): speed up by detecting the true last non-empty row/column (#2404)
* Update msexcel_backend.py
Fix #2307. Follow the instructions in https://github.com/docling-project/docling/issues/2307#issuecomment-3327248503.
Signed-off-by: Richard (Huangrui) Chu <65276824+HuangruiChu@users.noreply.github.com>
* Update msexcel_backend.py
Fix error
Signed-off-by: Richard (Huangrui) Chu <65276824+HuangruiChu@users.noreply.github.com>
* Fix linting issues
Signed-off-by: Richard (Huangrui) Chu <65276824+HuangruiChu@users.noreply.github.com>
* Add test files and data (Signed-off-by: Huangrui Chu <huangrui.chu.1999@gmail.com>)
Signed-off-by: Richard (Huangrui) Chu <65276824+HuangruiChu@users.noreply.github.com>
* resolve conflict with test_backend_msexcel; update the boundary
Signed-off-by: Richard (Huangrui) Chu <65276824+HuangruiChu@users.noreply.github.com>
* chore(xlsx): use a dataclass to represent a bounding rectangle in worksheets
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* chore(xlsx): increase parsing speed by iterating on 'sheet._cells'
Increase the parsing speed of the spreadsheet backend by iterating on 'sheet._cells',
since the cost of that iteration is proportional to the number of created cells rather than to the sheet's nominal dimensions.
Rename test file to align it to other test files.
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Richard (Huangrui) Chu <65276824+HuangruiChu@users.noreply.github.com>
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Co-authored-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
|
2025-10-21 08:08:20 +02:00 |
|
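The idea described in the commit message (find the true last non-empty row/column by walking only the cells that were actually created, and represent the result as a dataclass) can be sketched roughly as follows. This is a minimal illustration, not the code in msexcel_backend.py: the names `BoundingRect` and `find_data_bounds` are hypothetical, and the input is assumed to be a plain mapping from `(row, column)` to a cell value, similar in shape to what a sparse cell store such as `sheet._cells` is assumed to hold.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Tuple


@dataclass(frozen=True)
class BoundingRect:
    """Hypothetical 1-based, inclusive rectangle enclosing all non-empty cells."""

    min_row: int
    min_col: int
    max_row: int
    max_col: int


def find_data_bounds(
    cells: Mapping[Tuple[int, int], Any],
) -> Optional[BoundingRect]:
    """Return the rectangle spanned by non-empty cells, or None if there are none.

    The loop visits only the cells present in the mapping, so the cost depends
    on the number of created cells, not on the sheet's reported dimensions.
    """
    rows = []
    cols = []
    for (row, col), value in cells.items():
        # Treat None and whitespace-only strings as empty (an assumption).
        if value is None or (isinstance(value, str) and not value.strip()):
            continue
        rows.append(row)
        cols.append(col)

    if not rows:
        return None
    return BoundingRect(min(rows), min(cols), max(rows), max(cols))
```

For example, a sheet with one stray empty cell created far from the data, such as `{(1, 1): "a", (3, 2): 5, (1000, 50): None}`, yields a rectangle ending at row 3 and column 2 instead of row 1000 and column 50.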