Christoph Auer
5c862b5971
Update docling-core pinnings
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-16 13:49:01 +02:00
Christoph Auer
515ab04947
Update docling-core pinnings
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-16 13:32:15 +02:00
Michele Dolfi
d5f161d0f5
apply changes to the picture data annotations
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-16 13:24:21 +02:00
Christoph Auer
c123e5a812
Update docling-core pinnings
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-16 11:43:23 +02:00
Michele Dolfi
dd2982cce1
pin models, core and adapt example
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-16 10:57:05 +02:00
Christoph Auer
84438bd8a8
Merge from main and remove conflicts
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 16:22:28 +02:00
Michele Dolfi
f49d7881d0
pin docling-core and glm
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-15 14:35:02 +02:00
Christoph Auer
dac82ca7f2
Import statement updates from docling-core
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-15 10:11:10 +02:00
Christoph Auer
5b33b12660
renaming BaseTableData
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-14 17:01:50 +02:00
Panos Vagenas
d504432c1e
docs: introduce docs site (#141)
...
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-14 14:13:13 +02:00
Michele Dolfi
245b6c4c01
pin picture data with molecule
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-13 18:07:43 +02:00
Michele Dolfi
7c8d7e222e
use new PictureData
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-13 16:48:16 +02:00
Christoph Auer
69f0ab419c
Bump docling-core version
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-11 16:55:01 +02:00
Christoph Auer
5e4944f15f
feat: new experimental docling-parse v2 backend (#131)
...
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 15:12:49 +02:00
Michele Dolfi
331ab36f04
Merge remote-tracking branch 'origin/main' into cau/input-format-abstraction
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-11 11:23:04 +02:00
Christoph Auer
7aad3dc946
Update test cases for v2
2024-10-10 18:51:19 +02:00
Michele Dolfi
1bcad334f2
pin docling-parse release
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-10 18:30:09 +02:00
Michele Dolfi
bde8186700
update pinning
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-10 17:54:05 +02:00
Michele Dolfi
50c05b262a
pin updates compatible with each other
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-10 17:40:32 +02:00
Christoph Auer
7cad290ceb
Refactor test data, legacy usage and more
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-10 13:54:44 +02:00
Panos Vagenas
5f1bd9e9c8
docs: simplify LlamaIndex example using Docling extension (#135)
...
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-09 22:17:56 +02:00
Christoph Auer
b5a27386c1
Merge from main, update OCR model and test cases
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-09 16:04:19 +02:00
Panos Vagenas
6924999f1f
chore: explicitly manage pandas dependency (#134)
...
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-10-09 14:50:39 +02:00
Michele Dolfi
f96ea86a00
feat: add options for choosing OCR engines (#118)
...
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
2024-10-08 19:07:08 +02:00
Christoph Auer
c0447206af
Merge from main
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-10-08 14:42:33 +02:00
Maxim Lysak
89e58ca730
Added HTML backend implementation, few improvements for other backends
...
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-08 11:14:44 +02:00
Maxim Lysak
bea9fc22af
Added mspowerpoint backend first implementation, improvements on msword backend
...
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-07 14:55:21 +02:00
Maxim Lysak
2422f706a1
feat: new torch-based docling models (#120)
...
---------
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
2024-10-03 18:42:33 +02:00
Michele Dolfi
d44c62d7ce
feat: windows support (#122)
...
* feat: windows support
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add Windows in README
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-10-03 14:23:47 +02:00
Christoph Auer
0a86529afb
Repinning
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-30 13:47:22 +02:00
Michele Dolfi
34bd887a7f
fix: allow usage of opencv 4.6.x (#110)
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-27 15:51:43 +02:00
Christoph Auer
d6df76f90b
feat: Support tableformer model choice (#90)
...
* Support tableformer model choice
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update datamodel structure
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update docs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add test unit for table options
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Ensure import backwards-compatibility for PipelineOptions
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update README
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Adjust parameters on custom_convert
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
* Update Dockerfile
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
2024-09-26 21:37:08 +02:00
Christoph Auer
9ffd1dc396
Merge from main
2024-09-26 18:06:08 +02:00
Christoph Auer
0ee82a5e78
Bump deepsearch-glm
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-25 16:05:54 +02:00
Christoph Auer
ad2bd714d4
Update GT test files for pages
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-25 15:54:55 +02:00
Panos Vagenas
39977b5631
chore: move examples extras to respective group (#103)
...
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-09-25 15:47:48 +02:00
Christoph Auer
3efc2bbbf4
Apply renamings to DocItemLabel
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-25 12:22:02 +02:00
Michele Dolfi
6a03c208ec
feat: add figure in markdown (#98)
...
* feat: add figures in markdown
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update to new docling-core and update test results with figures
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update with improved docling-core
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-24 17:28:23 +02:00
Christoph Auer
850a521195
Update lockfile
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-24 16:26:22 +02:00
Christoph Auer
33373ac0dd
Switch everything to use label enum, and more
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-24 16:00:39 +02:00
Christoph Auer
867e06f9f2
Merge from main
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-24 12:05:17 +02:00
Peter W. J. Staar
4794ce460a
fix: updated the render_as_doctags with the new arguments from docling-core (#93)
...
* updated the render_as_doctags with the new arguments from docling-core
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* ensuring that docling-core is >1.5.0 to accommodate the latest export-to-doctags parameters
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the doctags tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the README
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fix poetry lock
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* Fix formatting problems
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* fixed the doctag export in docling/utils/export.py
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* propagate xsize and ysize
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-23 20:12:18 +02:00
Christoph Auer
d7907310e5
Add exporter methods to new types
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-23 15:04:20 +02:00
Christoph Auer
ac51a09065
Put stub for experimental format export
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-20 11:08:30 +02:00
Christoph Auer
abb6dddea8
Reorganize imports from docling-core
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-20 10:53:52 +02:00
Michele Dolfi
f19bd43798
feat: add table exports (#86)
...
* feat: expose docling-core table exporters and add examples
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove temp internal implementation of html export
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* pin latest docling-core 1.4.0 with table exports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-18 08:44:13 +02:00
Peter W. J. Staar
442443a102
fix: bumped the glm version and adjusted the tests (#83)
...
* bumped the glm version and adjusted the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated the poetry lock
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fix hooks
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fixed the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the tests for tables
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-18 07:43:49 +02:00
Nikos Livathinos
fa9699fa3c
fix(tests): Adjust the test data to match the new version of LayoutPredictor (#82)
...
* fix(tests): Adjust the test data to match the new version of LayoutPredictor from docling-ibm-models
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
* chore: Update poetry to use `docling-ibm-models` at version `v1.2.0`
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
---------
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2024-09-17 15:50:35 +02:00
Peter W. J. Staar
98990784df
feat: add docling cli (#75)
...
* chore: add simple convert script
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted all
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted all
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added default arg
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* use typer for the docling CLI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* describe output when saving
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add tests for CLI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add export options
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-13 14:03:09 +02:00
Michele Dolfi
8aa476ccd3
test: improve typing definitions (part 1) (#72)
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-12 15:56:29 +02:00