Commit Graph

18 Commits

Author SHA1 Message Date
Christoph Auer
ba9eaf1bd7 CLI and error handling fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 15:58:39 +02:00
Christoph Auer
a66c4ee8eb Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
2024-10-15 14:58:10 +02:00
Christoph Auer
27f4ed3620 Enable mypy and fix many reported errors
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 14:58:00 +02:00
Maxim Lysak
115435a835 Fixes for lists handling in docx
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-15 14:33:37 +02:00
Christoph Auer
dac82ca7f2 Import statement updates from docling-core
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 10:11:10 +02:00
Christoph Auer
5b33b12660 renaming BaseTableData
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-14 17:01:50 +02:00
Michele Dolfi
7c8d7e222e use new PictureData
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-13 16:48:16 +02:00
Christoph Auer
6efcf0a5a5 Add image format support to PdfBackend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-11 16:47:15 +02:00
Christoph Auer
95c1f80087 Change code to use unordered/ordered list, robustifications
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-11 14:53:38 +02:00
Christoph Auer
025983f07b Backend error handling fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-11 11:18:47 +02:00
Maxim Lysak
da0700f959 Fixes for docx backend
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-09 16:52:44 +02:00
Christoph Auer
0dfbd0b6fc Update examples and test cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-09 15:20:27 +02:00
Christoph Auer
c0447206af Merge from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-08 14:42:33 +02:00
Maxim Lysak
89e58ca730 Added HTML backend implementation, few improvements for other backends
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-08 11:14:44 +02:00
Maxim Lysak
bea9fc22af Added mspowerpoint backend first implementation, improvements on msword backend
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-07 14:55:21 +02:00
Maxim Lysak
1346843301 Improved docx parsing
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-07 13:00:50 +02:00
Maxim Lysak
cefc34e8d8 Working on a first version of DOCX native backend
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-04 18:19:40 +02:00
Christoph Auer
1fa7cd9855 Fundamental refactoring for multi-format support
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-01 16:54:09 +02:00