245 Commits

Author SHA1 Message Date
Christoph Auer
734d77c8ae Documentation updates, remove DescriptionItem in DoclingDocument init
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-16 16:05:49 +02:00
Christoph Auer
07206c5b3e Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction 2024-10-16 13:49:04 +02:00
Christoph Auer
5c862b5971 Update docling-core pinnings
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-16 13:49:01 +02:00
Maxim Lysak
a07a187150 Added and fixed origin for msword and mspowerpoint backend
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-16 13:32:50 +02:00
Christoph Auer
515ab04947 Update docling-core pinnings
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-16 13:32:15 +02:00
Michele Dolfi
d5f161d0f5 apply changes to the picture data annotations
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-16 13:24:21 +02:00
Christoph Auer
c123e5a812 Update docling-core pinnings
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-16 11:43:23 +02:00
Michele Dolfi
dd2982cce1 pin models, core and adapt example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-16 10:57:05 +02:00
Christoph Auer
8a25230240 Update v2 documentation 2024-10-16 10:28:40 +02:00
Christoph Auer
df3ff47914 Add migration instructions to doc (cont)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 19:08:45 +02:00
Michele Dolfi
cd8e3dce76 fix generation of images and adapt examples
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-15 17:43:47 +02:00
Michele Dolfi
75feef259d Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction 2024-10-15 17:10:30 +02:00
Michele Dolfi
1cb11be06f add options to generate images
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-15 17:09:54 +02:00
Christoph Auer
74e0452b6a Add migration instructions to doc (wip)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 17:08:48 +02:00
Christoph Auer
9d15f4d5bf Adjust CI examples path
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 16:36:21 +02:00
Christoph Auer
40bb84d2de Change output folder for examples.
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 16:34:45 +02:00
Panos Vagenas
c1794a79e2 add v2 docs placeholder [skip ci] (#145)
* add v2 docs placeholder [skip ci]

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* Remove conflicts

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 16:30:35 +02:00
Christoph Auer
84438bd8a8 Merge from main and remove conflicts
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 16:22:28 +02:00
Christoph Auer
ba9eaf1bd7 CLI and error handling fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 15:58:39 +02:00
Christoph Auer
a66c4ee8eb Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction 2024-10-15 14:58:10 +02:00
Christoph Auer
27f4ed3620 Enable mypy and fix many reported errors
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 14:58:00 +02:00
Michele Dolfi
f49d7881d0 pin docling-core and glm
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-15 14:35:02 +02:00
Maxim Lysak
115435a835 Fixes for lists handling in docx
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-15 14:33:37 +02:00
Christoph Auer
d687f93d52 Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction 2024-10-15 10:52:23 +02:00
Christoph Auer
fa5d972291 Merge remaining changes from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 10:52:16 +02:00
Panos Vagenas
6b8835b234 switch convert_all output type from Iterable to Iterator (#143)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-15 10:29:45 +02:00
Christoph Auer
dac82ca7f2 Import statement updates from docling-core
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 10:11:10 +02:00
Christoph Auer
8710506072 Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction 2024-10-15 09:50:18 +02:00
Christoph Auer
afafb97b87 Update CLI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 09:50:06 +02:00
Maxim Lysak
aa22fd31db small corrections to pptx 2024-10-15 09:43:06 +02:00
Christoph Auer
5b33b12660 renaming BaseTableData
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-14 17:01:50 +02:00
Christoph Auer
b964c4bb69 Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction 2024-10-14 16:54:56 +02:00
Christoph Auer
57de8ad63a Fix generate_multimodal_pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-14 16:52:58 +02:00
Maxim Lysak
98ca58ffd0 added support for enumerated lists
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-14 16:49:19 +02:00
Christoph Auer
3f0b01702b Update example export code
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-14 16:40:40 +02:00
Christoph Auer
a50ba57a1f Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction 2024-10-14 16:36:20 +02:00
Christoph Auer
497ddb34a8 Big refactoring for legacy_document support
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-14 16:36:11 +02:00
Maxim Lysak
e87bf9ae06 Updated pptx backend, fixes issues with lists, also added more different list cases to example
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-14 16:20:17 +02:00
Panos Vagenas
d504432c1e docs: introduce docs site (#141)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-14 14:13:13 +02:00
Michele Dolfi
08ab628e75 use self.artifacts_path
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-14 09:03:49 +02:00
Michele Dolfi
ab8f71511b fix artifacts_path via pipeline_options
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-14 08:57:15 +02:00
Michele Dolfi
2b1e72d327 refactor: fix type of tesseractocr options (#140)
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
2024-10-14 08:40:22 +02:00
Michele Dolfi
245b6c4c01 pin picture data with molecule
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-13 18:07:43 +02:00
Michele Dolfi
ddb509628e use do_ flag in pipeline_options
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-13 16:54:46 +02:00
Michele Dolfi
7c8d7e222e use new PictureData
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-13 16:48:16 +02:00
Michele Dolfi
c1ed447c21 propagate raises, add enrichment model, some renaming
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-13 16:03:19 +02:00
Michele Dolfi
941b51aa3e missing renamed files
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 18:10:45 +02:00
Michele Dolfi
7f10a546d3 Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 17:04:01 +02:00
Michele Dolfi
98f1a4597e rename and refactor *model*
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 16:57:40 +02:00
Christoph Auer
69f0ab419c Bump docling-core version
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-11 16:55:01 +02:00