Commit Graph

151 Commits

Author SHA1 Message Date
Panos Vagenas
8322c2ea9b docs: fix examples rendering (#2281)
fix examples rendering

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-09-17 20:50:50 -04:00
Christoph Auer
17afb664d0 feat: Add granite-docling model (#2272)
* adding granite-docling preview

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated the model specs

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* typo

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use granite-docling and add to the model downloader

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update docs and README

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update final repo_ids for GraniteDocling

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update final repo_ids for GraniteDocling

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix model name in CLI usage example

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Fix VLM model name in README.md

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-09-17 15:15:49 +02:00
Mingxuan Zhao
ff351fd40c docs: Describe examples (#2262)
* Update .py examples with clearer guidance,
update out of date imports and calls

Signed-off-by: Mingxuan Zhao <43148277+mingxzhao@users.noreply.github.com>

* Fix minimal.py string error, fix ruff format error

Signed-off-by: Mingxuan Zhao <43148277+mingxzhao@users.noreply.github.com>

* fix more CI issues

Signed-off-by: Mingxuan Zhao <43148277+mingxzhao@users.noreply.github.com>

---------

Signed-off-by: Mingxuan Zhao <43148277+mingxzhao@users.noreply.github.com>
2025-09-16 16:00:38 +02:00
Michele Dolfi
2c9123419f feat: enrichment steps on all convert pipelines (incl docx, html, etc) (#2251)
* allow enrichment on all convert pipelines

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* set options in CLI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-09-11 15:09:00 +02:00
Cesar Berrospi Ramis
f8cc545bab docs: add an example of RAG with OpenSearch (#2238)
* docs: add an example of RAG with OpeanSearch

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* chore: pin latest docling-core and update uv.lock

Pin latest version release of docling-core in pyproject.toml
Update the dependencies in uv.lock file
Run the notebook rag_opensearch.ipynb to pick up changes from docling-core

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2025-09-10 14:37:22 +02:00
Roy Derks
e5cd7020bd docs: Add instructions for using Docling with MCP to README (#2219)
* docs: Add instructions for using Docling with MCP to README

* DCO Remediation Commit for Roy Derks <10717410+royderks@users.noreply.github.com>

Signed-off-by: Roy Derks <roy.derks@ibm.com>

* DCO Remediation Commit for Roy Derks <10717410+royderks@users.noreply.github.com>

I, Roy Derks <10717410+royderks@users.noreply.github.com>, hereby add my Signed-off-by to this commit: 4b9ba1d0ef

Signed-off-by: Roy Derks <roy.derks@ibm.com>

* docs: reorganize documentation on MCP server

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* docs: align README with documentation index page

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Roy Derks <roy.derks@ibm.com>
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Co-authored-by: Roy Derks <roy.derks@ibm.com>
Co-authored-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
2025-09-10 10:02:28 +02:00
Tamás Bitai
55f5f3752f docs: Document VLM support requirement in extraction example (#2231)
* docs: Document VLM support requirement in extraction example

* DCO Remediation Commit for Tamás Bitai <bitai.tamas@gmail.com>

I, Tamás Bitai <bitai.tamas@gmail.com>, hereby add my Signed-off-by to this commit: b90defdb77

Signed-off-by: Tamás Bitai <bitai.tamas@gmail.com>

---------

Signed-off-by: Tamás Bitai <bitai.tamas@gmail.com>
2025-09-09 13:45:55 +02:00
Panos Vagenas
a9f41b088e docs: add information extraction example (#2199)
* docs: add information exctraction example

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* update README

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* minor typo

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* update README

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-09-05 11:27:09 +02:00
Shikhar Bhardwaj
9f0286bcac fix: translation example (#2166)
* fix: translation example

Signed-off-by: shikharbhardwaj <8502456+shikharbhardwaj@users.noreply.github.com>

* Fix translation example formatting

Signed-off-by: shikharbhardwaj <8502456+shikharbhardwaj@users.noreply.github.com>

---------

Signed-off-by: shikharbhardwaj <8502456+shikharbhardwaj@users.noreply.github.com>
2025-09-01 11:04:46 +02:00
Panos Vagenas
96cab6b536 docs: enrich landing pages (#2165)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-08-29 17:19:05 +02:00
geoHeil
3f60a0fa78 feat: Upgrade to RapidOCR 3.x (#2088)
* feat: exploring new version

* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>

I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 5815c8f81b0e5ce400332597b6795e5a97ecf775

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* chore: autoformat

DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>

I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 5815c8f81b0e5ce400332597b6795e5a97ecf775

* feat: enable configurable runtime for rapidocr and handle new result better;

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* chore: fix linter

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* chore: use new server model

* chore:  change default engine type to onnx

* chore: tests update for new rapidocr

* fix: rebase from main and fix clashes

* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>

I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 5815c8f81b0e5ce400332597b6795e5a97ecf775
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 02f9db85f562e5cdfda40c52fee55cfd4030d70a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a7bcb205faedb881f94a89b3bbd29cb31ccd54f0
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a39482a98cbcff7a825c8321134732af0c65930a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 63e9d717fa26951566b02761f3fdfc752c31f805
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: ef12a6ec1ea2846a8a8e2e776eeaa59c2a0c4dfe

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>

I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 2222d2340387f8d9d66f3ca9d8e21a0945a44e7a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: bc6a1dc507d7f146ec4797a2d3840414f46ac64d
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 56e0d67da7c57d4b5caf8eaef8dff7056c3efd32
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 871ca21271412006c76acf3c19426140efed3d50
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 7b1b77159da729d483a581a86c7309acba1712a7
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a792a714a43e19a91b2b782f54621c1c5efda632

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>

I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: d1fed26323ff829b716bc667fe69532839363e45
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 346ec1cad943765f886e5d17fb0a54221124689c
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 4d0bbe5bd6e9f7261b97362ff8823af244267089
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 34a5ad53892a7064a6bf35f890d344d464c78b2f
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 9151959db3ad53535011d1cfdcf9181fdf936bb1
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 8ef5536f2c098826c6c0a05190f8a80614c3f3cb

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>

I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 7e18637a35
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 63fb8ff599
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 0cb9444fb8
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 38940d9978
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: b6d461ac42
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: ee55eb3408

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

---------

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
2025-08-25 12:10:33 +02:00
VIktor Kuropiantnyk
cdf079dd06 feat(CLI): Option to download arbitrary HuggingFace model (#2123)
* Added option to docling-tools to download arbitrary HuggingFace model

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>

* Added note in documentation

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>

* Removed note on custom artifact path usage from HF download option

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>

* Fixed typo

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>

---------

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
2025-08-22 15:23:29 +02:00
Maroun Touma
e76298c40d docs: DPK pipeline example using docling library (#2112)
* Notebook showing example on how to use docling transforms in DPK

Signed-off-by: Maroun Touma <touma@us.ibm.com>

* fix HF Token name

Signed-off-by: Maroun Touma <touma@us.ibm.com>

* use %pip instead of pip install jupyter lab

Signed-off-by: Maroun Touma <touma@us.ibm.com>

* run formatter

Signed-off-by: Maroun Touma <touma@us.ibm.com>

* add example to mkdocs and fix typo

Signed-off-by: Maroun Touma <touma@us.ibm.com>

---------

Signed-off-by: Maroun Touma <touma@us.ibm.com>
2025-08-21 10:14:36 +02:00
Panos Vagenas
8996d612aa docs: add Getting Started page (#2113)
* docs: add Getting Started page

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* refactor usage

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* minor renaming

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-08-21 08:44:53 +02:00
Eric Deandrea
76c1fbd6e8 docs: Add docling Quarkus integration (#2083)
* Add docling Quarkus integration

* DCO Remediation Commit for Eric Deandrea <eric.deandrea@ibm.com>

I, Eric Deandrea <eric.deandrea@ibm.com>, hereby add my Signed-off-by to this commit: 86aa0b80f4

Signed-off-by: Eric Deandrea <eric.deandrea@ibm.com>

---------

Signed-off-by: Eric Deandrea <eric.deandrea@ibm.com>
2025-08-18 06:55:51 +02:00
Shkarupa Alex
5f050f94e1 feat(vlm): Ability to preprocess VLM response (#1907)
* Add ability to preprocess VLM response

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

* Move response decoding to vlm options (requires inheritance to override). Per-page prompt formulation also moved to vlm options to keep api consistent.

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

---------

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
2025-08-12 15:20:24 +02:00
Panos Vagenas
e2cca931be docs: add Langflow integration (#2068)
* docs: add langflow integration

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* fix link

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-08-11 16:03:29 +02:00
Thomas Vitale
bfda6d34d8 docs: Add Arconia integration (#2061)
Signed-off-by: Thomas Vitale <ThomasVitale@users.noreply.github.com>
2025-08-08 09:35:47 +02:00
Michele Dolfi
7b5f86098d docs: add chat with dosu (#1984)
add chat with dosu

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-07-24 11:07:36 +02:00
Michele Dolfi
90a7cc4bdd docs: enrich existing DoclingDocument (#1969)
add example for enriching an existing doclingdocument

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-07-22 16:20:15 +02:00
Fabiano Franz
5d98bcea1b docs: add documentation for confidence scores (#1912)
* docs: add documentation for confidence scores

Signed-off-by: Fabiano Franz <contact@fabianofranz.com>

* Increase focus on confidence grades, scores are informational only

Signed-off-by: Fabiano Franz <contact@fabianofranz.com>

* Update confidence_scores.md

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

---------

Signed-off-by: Fabiano Franz <contact@fabianofranz.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
2025-07-21 10:16:17 +02:00
stephencox-ict
d6d2dbe2f9 docs: Fix typos (#1943)
Fix typos

Signed-off-by: stephencox-ict <scox@ict.co>
2025-07-15 09:51:56 +02:00
Copilot
c5fb353f10 fix: Change granite vision model URL from preview to stable version (#1925)
* Initial plan

* Fix granite vision model URL from preview to stable version

Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com>

* Update to granite vision 3.3

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Update to granite vision 3.3 (2)

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

---------

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com>
2025-07-11 08:46:03 +02:00
geoHeil
a07ba863c4 feat: add image-text-to-text models in transformers (#1772)
* feat(dolphin): add dolphin support

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* rename

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* reformat

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* fix mypy

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>

* add prompt style and examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-07-08 05:54:57 +02:00
VIktor Kuropiantnyk
e25873d557 fix: docs are missing osd packages for tesseract on RHEL (#1905)
Fixed missing packages in the docs on tesseract

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
2025-07-07 17:06:26 +02:00
Shkarupa Alex
b8813eea80 feat(vlm): Dynamic prompts (#1808)
* Unify temperature options for Vlm models

* Dynamic prompt support with example

* DCO Remediation Commit for Shkarupa Alex <shkarupa.alex@gmail.com>

I, Shkarupa Alex <shkarupa.alex@gmail.com>, hereby add my Signed-off-by to this commit: 34d446cb98
I, Shkarupa Alex <shkarupa.alex@gmail.com>, hereby add my Signed-off-by to this commit: 9c595d574f

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

* Replace Page with SegmentedPage

* Fix example HF repo link 

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Sign-off

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

* DCO Remediation Commit for Shkarupa Alex <shkarupa.alex@gmail.com>

I, Shkarupa Alex <shkarupa.alex@gmail.com>, hereby add my Signed-off-by to this commit: 1a162066dd

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

* Use lmstudio-community model

Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

* Swap inference engine to LM Studio

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>

---------

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
2025-07-07 16:58:42 +02:00
Peter W. J. Staar
f3ae3029b8 docs: update readme and add ASR example (#1836)
* updated the README

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added minimal_asr_pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Updated README and added ASR example

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* Updated docs.index.md

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* updated CI and mkdocs

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added link tp existing audio file

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added link tp existing audio file

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatting

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2025-06-23 18:55:16 +02:00
Michele Dolfi
64ac043786 docs: support running examples from root or subfolder (#1816)
support running examples from root or subfolder

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-19 11:10:40 +02:00
Michele Dolfi
0432a31b2f docs: update vlm models api examples with LM Studio (#1759)
update vlm models api examples

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-12 12:58:44 +02:00
Michele Dolfi
49b10e7419 docs: add open webui (#1734)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-10 09:35:20 +02:00
Michele Dolfi
be42b03f9b docs: flash-attn usage and install (#1706)
* docs: flash-attn usage and install

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix link

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-04 11:09:54 +02:00
Michele Dolfi
cdd401847a feat: simplify dependencies, switch to uv (#1700)
* refactor with uv

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* constraints for onnxruntime

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* more constraints

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-03 15:18:54 +02:00
Peter W. J. Staar
cfdf4cea25 feat: new vlm-models support (#1570)
* feat: adding new vlm-models support

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the transformers

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* got microsoft/Phi-4-multimodal-instruct to work

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* working on vlm's

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the VLM part

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* all working, now serious refacgtoring necessary

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring the download_model

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the formulate_prompt

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* pixtral 12b runs via MLX and native transformers

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the VlmPredictionToken

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* refactoring minimal_vlm_pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the MyPy

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added pipeline_model_specializations file

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* need to get Phi4 working again ...

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* finalising last points for vlms support

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the pipeline for Phi4

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* streamlining all code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* reformatted the code

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixing the tests

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* added the html backend to the VLM pipeline

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* fixed the static load_from_doctags

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* restore stable imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use AutoModelForVision2Seq for Pixtral and review example (including rename)

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove unused value

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* refactor instances of VLM models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* skip compare example in CI

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use lowercase and uppercase only

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename pipeline_vlm_model_spec

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move more argument to options and simplify model init

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add supported_devices

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove not-needed function

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* exclude minimal_vlm

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* missing file

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add message for transformers version

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename to specs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use module import and remove MLX from non-darwin

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove hf_vlm_model and add extra_generation_args

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use single HF VLM model class

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove torch type

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add docs for vision models

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-06-02 17:01:06 +02:00
Edgar Hipp
11ca4f7a7b docs: fix typo in index.md (#1676)
Signed-off-by: Edgar Hipp <hipp.edg@gmail.com>
2025-06-02 12:35:59 +02:00
Panos Vagenas
7c4c356e76 chore: fix chunking example data link (#1596)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-05-16 08:44:47 +02:00
Panos Vagenas
9f28abf061 docs: add advanced chunking & serialization example (#1589)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-05-14 14:35:07 +02:00
Elwin
12dab0a1e8 feat: support image/webp file type (#1415)
* support image/webp file type

Signed-off-by: Elwin <61868295+hzhaoy@users.noreply.github.com>
Signed-off-by: Elwin <hzywong@gmail.com>

* docs: add webp image format in supported_formats.md

Signed-off-by: Elwin <61868295+hzhaoy@users.noreply.github.com>
Signed-off-by: Elwin <hzywong@gmail.com>

* test: add a test case for `image/webp` file

Signed-off-by: Elwin <hzywong@gmail.com>

* style: apply styling

Signed-off-by: Elwin <hzywong@gmail.com>

* test: update test case of converting `image/webp` file with more ocr engines

Signed-off-by: Elwin <hzywong@gmail.com>

* style: apply styling

Signed-off-by: Elwin <hzywong@gmail.com>

* rename test file

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Elwin <61868295+hzhaoy@users.noreply.github.com>
Signed-off-by: Elwin <hzywong@gmail.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-05-14 09:47:28 +02:00
Oleg Lavrovsky
844babb390 docs: update links in data_prep_kit (#1559)
Update data_prep_kit.md

The links were broken, since the repository was renamed. I also noticed that PDF2Parquet is now referred to as Docling2Parquet.

Signed-off-by: Oleg Lavrovsky <31819+loleg@users.noreply.github.com>
2025-05-11 20:38:25 +02:00
Panos Vagenas
3220a592e7 docs: add serialization docs, update chunking docs (#1556)
* docs: add serializers docs, update chunking docs

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* update notebook to improve MD table rendering

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-05-08 21:43:01 +02:00
nkh0472
a097ccd8d5 chore: typo fix (#1465)
* typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

* chore: typo fix

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>

---------

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>
2025-04-28 08:52:09 +02:00
Emmanuel Ferdman
3afbe6c969 docs: update supported formats guide (#1463)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-04-28 08:51:54 +02:00
Ryan Lin
a2fbbba9f7 feat: add tutorial using Milvus and Docling for RAG pipeline (#1449)
* feat: add milvus rag with docling tutorial

Signed-off-by: Ryan Lin <linjinhong@yandex.com>

* chore: run pre-commit

Signed-off-by: Ryan Lin <linjinhong@yandex.com>

* feat: add RAG with Milvus example to mkdocs

Signed-off-by: Ryan Lin <linjinhong@yandex.com>

---------

Signed-off-by: Ryan Lin <linjinhong@yandex.com>
2025-04-25 09:12:35 +02:00
nkh0472
c2470ed216 docs: Fix wrong output format in example code (#1427)
fix: wrong output format

Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>
2025-04-22 12:32:55 +02:00
Michele Dolfi
64918a81ac docs: Add OpenSSF Best Practices badge (#1430)
* docs: add openssf badge

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add badge to docs

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-04-22 11:23:28 +02:00
Ben Cox
995b3b0ab1 docs: Typo fixes in docling_document.md (#1400)
Signed-off-by: Ben Cox <1038350+ind1go@users.noreply.github.com>
2025-04-22 08:49:08 +02:00
Leandro Rosas
88948b0bba docs: Updated the [Usage] link in architecture.md (#1416)
Fixed the [Usage] link in architecture.md

Changed the usage link in the tip box from "../usage.md#adjust-pipeline-features" to "../usage/index.md#adjust-pipeline-features"  as the previous link is not valid.

Signed-off-by: Leandro Rosas <36343022+leandrosas101@users.noreply.github.com>
2025-04-19 10:20:52 +02:00
Felix Dittrich
a7dd59c5cb docs(ocr): Add docs entry for OnnxTR OCR plugin (#1382)
feat(ocr): Add docs entry for OnnxTR OCR plugin

Signed-off-by: felix <felixdittrich92@gmail.com>
2025-04-15 09:46:59 +02:00
Michele Dolfi
5458a88464 ci: add coverage and ruff (#1383)
* add coverage calculation and push

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* new codecov version and usage of token

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* enable ruff formatter instead of black and isort

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* apply ruff lint fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* apply ruff unsafe fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add removed imports

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* runs 1 on linter issues

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* finalize linter fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Update pyproject.toml

Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-04-14 18:01:26 +02:00
Juil Park
a026b4e84b docs: Add Notes for Installing in Intel macOS (#1377)
docs: Add Notes for Intel macOS

Signed-off-by: Juil Park <park@juil.dev>
2025-04-14 10:21:13 +02:00
Peter W. J. Staar
c0ba88edf1 feat(cli): add option for html with split-page mode (#1355)
* updated the cli to output html in split-page mode

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* add pin for new docling-core with html split argument

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* relock with fixed html export in docling-core

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update test results

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update more tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update example

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update lock with docling-core fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update test results

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add again chunking extras

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-04-14 08:41:50 +02:00