Christoph Auer
a66c4ee8eb
Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
2024-10-15 14:58:10 +02:00
Christoph Auer
27f4ed3620
Enable mypy and fix many reported errors
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 14:58:00 +02:00
Michele Dolfi
f49d7881d0
pin docling-core and glm
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-15 14:35:02 +02:00
Maxim Lysak
115435a835
Fixes for lists handling in docx
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-15 14:33:37 +02:00
Christoph Auer
d687f93d52
Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
2024-10-15 10:52:23 +02:00
Christoph Auer
fa5d972291
Merge remaining changes from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 10:52:16 +02:00
Panos Vagenas
6b8835b234
switch convert_all output type from Iterable to Iterator (#143)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-15 10:29:45 +02:00
Christoph Auer
dac82ca7f2
Import statement updates from docling-core
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 10:11:10 +02:00
Christoph Auer
8710506072
Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
2024-10-15 09:50:18 +02:00
Christoph Auer
afafb97b87
Update CLI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 09:50:06 +02:00
Maxim Lysak
aa22fd31db
small corrections to pptx
2024-10-15 09:43:06 +02:00
Christoph Auer
5b33b12660
renaming BaseTableData
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-14 17:01:50 +02:00
Christoph Auer
b964c4bb69
Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
2024-10-14 16:54:56 +02:00
Christoph Auer
57de8ad63a
Fix generate_multimodal_pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-14 16:52:58 +02:00
Maxim Lysak
98ca58ffd0
added support for enumerated lists
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-14 16:49:19 +02:00
Christoph Auer
3f0b01702b
Update example export code
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-14 16:40:40 +02:00
Christoph Auer
a50ba57a1f
Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
2024-10-14 16:36:20 +02:00
Christoph Auer
497ddb34a8
Big refactoring for legacy_document support
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-14 16:36:11 +02:00
Maxim Lysak
e87bf9ae06
Updated pptx backend, fixes issues with lists, also added more different list cases to example
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-14 16:20:17 +02:00
Michele Dolfi
08ab628e75
use self.artifacts_path
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-14 09:03:49 +02:00
Michele Dolfi
ab8f71511b
fix artifacts_path via pipeline_options
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-14 08:57:15 +02:00
Michele Dolfi
245b6c4c01
pin picture data with molecule
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-13 18:07:43 +02:00
Michele Dolfi
ddb509628e
use do_ flag in pipeline_options
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-13 16:54:46 +02:00
Michele Dolfi
7c8d7e222e
use new PictureData
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-13 16:48:16 +02:00
Michele Dolfi
c1ed447c21
propagate raises, add enrichment model, some renaming
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-13 16:03:19 +02:00
Michele Dolfi
941b51aa3e
missing renamed files
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 18:10:45 +02:00
Michele Dolfi
7f10a546d3
Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 17:04:01 +02:00
Michele Dolfi
98f1a4597e
rename and refactor *model*
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 16:57:40 +02:00
Christoph Auer
69f0ab419c
Bump docling-core version
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-11 16:55:01 +02:00
Christoph Auer
2a259b9723
Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
2024-10-11 16:47:20 +02:00
Christoph Auer
6efcf0a5a5
Add image format support to PdfBackend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-11 16:47:15 +02:00
Michele Dolfi
6c9f869dc7
fix default _enrich_document
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 16:38:45 +02:00
Michele Dolfi
5b5c99e9da
Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
2024-10-11 16:31:28 +02:00
Michele Dolfi
ca2a96d982
initial refactor iteration
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 16:31:13 +02:00
Christoph Auer
d0fccb9342
Merge from simplify-conv-api
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-11 15:57:08 +02:00
Christoph Auer
95c1f80087
Change code to use unordered/ordered list, robustifications
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-11 14:53:38 +02:00
Panos Vagenas
136f16e85a
feat!: simplify conversion API (#139)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-11 14:52:37 +02:00
Michele Dolfi
753f67a434
fixes
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 13:06:32 +02:00
Michele Dolfi
94b5e1532d
add GlmOptions
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 13:03:38 +02:00
Michele Dolfi
786b89efd9
Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
2024-10-11 12:59:11 +02:00
Michele Dolfi
c6e1471e02
use options objects
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 12:58:59 +02:00
Christoph Auer
3ee97c42b2
Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
2024-10-11 12:57:56 +02:00
Christoph Auer
52713f0cf5
Optionally produce legacy_doc
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-11 12:57:47 +02:00
Michele Dolfi
cc9bcc424d
fix generation enabled
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 11:49:38 +02:00
Michele Dolfi
331ab36f04
Merge remote-tracking branch 'origin/main' into cau/input-format-abstraction
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 11:23:04 +02:00
Christoph Auer
025983f07b
Backend error handling fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-11 11:18:47 +02:00
github-actions[bot]
2ec39636f0
chore: bump version to 1.19.1 [skip ci]
2024-10-11 08:52:09 +00:00
Christoph Auer
304d16029a
More renaming, design enrichment interface
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-11 10:21:31 +02:00
Nikos Livathinos
dae2a3b667
fix: remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138)
* feat(OCR tests): Introduce fuzziness in the text validation of OCR tests
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* fix(TesseractOcrCliModel): Send the stderr to devnull to avoid polluting the console with messages from tesseract cmd
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
---------
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2024-10-11 10:21:19 +02:00
Michele Dolfi
051beae203
use new interface in minimal example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 08:30:09 +02:00