Commit Graph

187 Commits

Author SHA1 Message Date
Christoph Auer
cd06d89c2a Merge branch 'cau/experimental-format' of github.com:DS4SD/docling into cau/input-format-abstraction 2024-09-30 13:47:57 +02:00
Christoph Auer
0a86529afb Repinning
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-30 13:47:22 +02:00
github-actions[bot]
cde671cf34 chore: bump version to 1.16.1 [skip ci] 2024-09-27 14:36:40 +00:00
Michele Dolfi
34bd887a7f fix: allow usage of opencv 4.6.x (#110)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-27 15:51:43 +02:00
Christoph Auer
91ab382129 Renaming changes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-27 15:20:01 +02:00
Panos Vagenas
c05b692d69 docs: document chunking (#111)
[skip ci]

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-09-27 11:16:04 +02:00
Christoph Auer
2461b56b84 Import rewrites, adapt to changes in docling-core
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-27 09:21:15 +02:00
github-actions[bot]
6760571fe1 chore: bump version to 1.16.0 [skip ci] 2024-09-27 06:21:15 +00:00
Christoph Auer
d6df76f90b feat: Support tableformer model choice (#90)
* Support tableformer model choice

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update datamodel structure

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update docs

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Cleanup

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add test unit for table options

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Ensure import backwards-compatibility for PipelineOptions

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update README

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Adjust parameters on custom_convert

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Update Dockerfile

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
2024-09-26 21:37:08 +02:00
Christoph Auer
9ffd1dc396 Merge from main 2024-09-26 18:06:08 +02:00
Christoph Auer
0ee82a5e78 Bump deepsearch-glm
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-25 16:05:54 +02:00
Christoph Auer
ba9d115f64 Examples: Don't export experimental output by default
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-25 15:56:29 +02:00
Christoph Auer
ad2bd714d4 Update GT test files for pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-25 15:54:55 +02:00
Panos Vagenas
39977b5631 chore: move examples extras to respective group (#103)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-09-25 15:47:48 +02:00
Christoph Auer
48d8b7bf70 Sync test data from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-25 12:26:12 +02:00
Christoph Auer
3efc2bbbf4 Apply renamings to DocItemLabel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-25 12:22:02 +02:00
Christoph Auer
95c539579d [WIP] introducing extra backend abstraction and input formats
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-25 11:17:49 +02:00
github-actions[bot]
3dfd02a7e9 chore: bump version to 1.15.0 [skip ci] 2024-09-24 15:58:16 +00:00
Michele Dolfi
6a03c208ec feat: add figure in markdown (#98)
* feat: add figures in markdown

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update to new docling-core and update test results with figures

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update with improved docling-core

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-24 17:28:23 +02:00
Christoph Auer
850a521195 Update lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-24 16:26:22 +02:00
Christoph Auer
33373ac0dd Switch everything to use label enum, and more
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-24 16:00:39 +02:00
github-actions[bot]
001d214a13 chore: bump version to 1.14.0 [skip ci] 2024-09-24 13:38:23 +00:00
Panos Vagenas
d96b96c848 fix: fix OCR setting for pypdfium, minor refactor (#102)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-09-24 14:36:00 +02:00
Christoph Auer
867e06f9f2 Merge from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-24 12:05:17 +02:00
Christoph Auer
b54956cce6 Add experimental output in glm_model
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-24 11:59:33 +02:00
Panos Vagenas
f8f2303348 docs: document CLI, minor README revamp (#100)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-09-24 09:21:28 +02:00
Panos Vagenas
f555815343 chore: add RAG notebook titles (#101)
[skip ci]

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-09-24 09:17:46 +02:00
Panos Vagenas
3c46e4266c feat: add URL support to CLI (#99)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-09-24 08:47:53 +02:00
github-actions[bot]
c65a01c9b7 chore: bump version to 1.13.1 [skip ci] 2024-09-23 19:04:01 +00:00
Peter W. J. Staar
4794ce460a fix: updated the render_as_doctags with the new arguments from docling-core (#93)
* updated the render_as_doctags with the new arguments from docling-core

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* ensuring that docling-core is >1.5.0 to accommodate the latest export-to-doctags parameters

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the doctags tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the README

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fix poetry lock

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Fix formatting problems

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fixed the doctag export in docling/utils/export.py

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* propagate xsize and ysize

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-23 20:12:18 +02:00
Maxim Lysak
dce9934a0f Updated to new, clean vector logo, svg and rendered png are provided (#96)
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
2024-09-23 15:31:21 +02:00
Christoph Auer
d7907310e5 Add exporter methods to new types
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-23 15:04:20 +02:00
Christoph Auer
257d44a84b Merge branch 'main' of github.com:DS4SD/docling into cau/experimental-format 2024-09-23 14:11:45 +02:00
Christoph Auer
12477c8cac Lots of import refactoring
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-23 12:22:49 +02:00
Michele Dolfi
1f4b224ab6 chore: switch to gh apps user (#92)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-20 17:02:27 +02:00
Christoph Auer
ac51a09065 Put stub for experimental format export
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-20 11:08:30 +02:00
Christoph Auer
abb6dddea8 Reorganize imports from docling-core
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-09-20 10:53:52 +02:00
github-actions[bot]
6dd1e91c4a chore: bump version to 1.13.0 [skip ci] 2024-09-18 09:26:03 +00:00
Maxim Lysak
0da7519896 docs: updated Docling logo.png with transparent background (#88)
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
2024-09-18 10:39:11 +02:00
Michele Dolfi
f19bd43798 feat: add table exports (#86)
* feat: expose docling-core table exporters and add examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove temp internal implementation of html export

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* pin latest docling-core 1.4.0 with table exports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-18 08:44:13 +02:00
Peter W. J. Staar
442443a102 fix: bumped the glm version and adjusted the tests (#83)
* bumped the glm version and adjusted the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the poetry lock

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fix hooks

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fixed the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the tests for tables

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-18 07:43:49 +02:00
github-actions[bot]
8242bce4fa chore: bump version to 1.12.2 [skip ci] 2024-09-17 16:01:34 +00:00
Nikos Livathinos
fa9699fa3c fix(tests): Adjust the test data to match the new version of LayoutPredictor (#82)
* fix(tests): Adjust the test data to match the new version of LayoutPredictor from docling-ibm-models

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

* chore: Update poetry to use `docling-ibm-models` at version `v1.2.0`

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
2024-09-17 15:50:35 +02:00
Michele Dolfi
30a0ef69b4 chore: Add PR template (#81)
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
2024-09-16 18:36:26 +02:00
github-actions[bot]
f1932fd8c5 chore: bump version to 1.12.1 [skip ci] 2024-09-16 10:58:09 +00:00
Michele Dolfi
2870fdc857 fix: CLI compatibility with python 3.10 and 3.11 (#79)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-16 12:32:45 +02:00
github-actions[bot]
34b2772a2e chore: bump version to 1.12.0 [skip ci] 2024-09-13 12:34:15 +00:00
Peter W. J. Staar
98990784df feat: add docling cli (#75)
* chore: add simple convert script

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted all

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted all

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added default arg

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* use typer for the docling CLI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* describe output when saving

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add tests for CLI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add export options

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-13 14:03:09 +02:00
Michele Dolfi
8aa476ccd3 test: improve typing definitions (part 1) (#72)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-09-12 15:56:29 +02:00
Panos Vagenas
53569a1023 docs: showcase RAG with LlamaIndex and LangChain (#71)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-09-11 15:07:08 +02:00