* docs: add an example of RAG with OpenSearch
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* chore: pin latest docling-core and update uv.lock
Pin the latest release of docling-core in pyproject.toml
Update the dependencies in the uv.lock file
Run the notebook rag_opensearch.ipynb to pick up changes from docling-core
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
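As a rough, hedged sketch of the flow the rag_opensearch.ipynb notebook covers (not the notebook itself): convert a document with Docling, chunk it, index the chunks into OpenSearch, and query them. The index name, field name, query, and local OpenSearch endpoint below are illustrative assumptions.

```python
from docling.chunking import HybridChunker  # requires docling's chunking extra
from docling.document_converter import DocumentConverter
from opensearchpy import OpenSearch

# Convert and chunk a document.
doc = DocumentConverter().convert("https://arxiv.org/pdf/2408.09869").document
chunks = list(HybridChunker().chunk(doc))

# Index chunks into a local OpenSearch instance (endpoint and index name are examples).
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
for i, chunk in enumerate(chunks):
    client.index(index="docling-rag", id=i, body={"text": chunk.text})

# Simple lexical retrieval; the notebook may use embeddings and kNN instead.
hits = client.search(index="docling-rag", body={"query": {"match": {"text": "docling"}}})
print(hits["hits"]["hits"][0]["_source"]["text"])
```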
* feat: exploring new version
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 5815c8f81b0e5ce400332597b6795e5a97ecf775
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* chore: autoformat
DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 5815c8f81b0e5ce400332597b6795e5a97ecf775
* feat: enable configurable runtime for rapidocr and handle new result better;
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* chore: fix linter
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* chore: use new server model
* chore: change default engine type to onnx
* chore: tests update for new rapidocr
* fix: rebase from main and fix clashes
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 5815c8f81b0e5ce400332597b6795e5a97ecf775
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 02f9db85f562e5cdfda40c52fee55cfd4030d70a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a7bcb205faedb881f94a89b3bbd29cb31ccd54f0
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a39482a98cbcff7a825c8321134732af0c65930a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 63e9d717fa26951566b02761f3fdfc752c31f805
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: ef12a6ec1ea2846a8a8e2e776eeaa59c2a0c4dfe
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 2222d2340387f8d9d66f3ca9d8e21a0945a44e7a
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: bc6a1dc507d7f146ec4797a2d3840414f46ac64d
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 56e0d67da7c57d4b5caf8eaef8dff7056c3efd32
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 871ca21271412006c76acf3c19426140efed3d50
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 7b1b77159da729d483a581a86c7309acba1712a7
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: a792a714a43e19a91b2b782f54621c1c5efda632
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: d1fed26323ff829b716bc667fe69532839363e45
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 346ec1cad943765f886e5d17fb0a54221124689c
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 4d0bbe5bd6e9f7261b97362ff8823af244267089
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 34a5ad53892a7064a6bf35f890d344d464c78b2f
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 9151959db3ad53535011d1cfdcf9181fdf936bb1
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 8ef5536f2c098826c6c0a05190f8a80614c3f3cb
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
* DCO Remediation Commit for Georg Heiler <georg.kf.heiler@gmail.com>
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 7e18637a35
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 63fb8ff599
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 0cb9444fb8
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: 38940d9978
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: b6d461ac42
I, Georg Heiler <georg.kf.heiler@gmail.com>, hereby add my Signed-off-by to this commit: ee55eb3408
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
---------
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com>
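A minimal sketch of selecting RapidOCR in the PDF pipeline, relating to the configurable-runtime and ONNX-default commits above. The `backend="onnxruntime"` field is an assumption about how the new runtime option is exposed; the exact field name may differ in the released API.

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions, RapidOcrOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
# `backend` is an assumed field name for the configurable RapidOCR runtime.
pipeline_options.ocr_options = RapidOcrOptions(backend="onnxruntime")

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
result = converter.convert("scanned.pdf")  # placeholder input
print(result.document.export_to_markdown())
```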
* Added an option to docling-tools to download an arbitrary Hugging Face model
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* Added note in documentation
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* Removed note on custom artifact path usage from HF download option
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
* Fixed typo
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
---------
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
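The docling-tools CLI flag itself is not reproduced here; as a hedged illustration only, downloading an arbitrary Hugging Face model amounts to fetching a repo snapshot into a local artifacts directory that Docling can be pointed at. The repo id and target path are examples, and this is not the tool's implementation.

```python
from pathlib import Path

from huggingface_hub import snapshot_download

# Illustrative target directory; docling-tools manages its own model store.
artifacts_path = Path.home() / ".cache" / "docling" / "models"

# Any Hugging Face repo id can be passed here; the repo below is an example.
snapshot_download(
    repo_id="ds4sd/docling-models",
    local_dir=artifacts_path / "ds4sd--docling-models",
)
```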
* Notebook showing an example of how to use Docling transforms in DPK
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* fix HF Token name
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* use %pip install instead of pip install in the JupyterLab notebook
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* run formatter
Signed-off-by: Maroun Touma <touma@us.ibm.com>
* add example to mkdocs and fix typo
Signed-off-by: Maroun Touma <touma@us.ibm.com>
---------
Signed-off-by: Maroun Touma <touma@us.ibm.com>
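The notebook relies on Data Prep Kit's Docling transform; the DPK API is not reproduced here. As a hedged illustration of what such a transform produces, the sketch below converts a folder of PDFs with Docling and writes one row per document to Parquet (pandas plus a Parquet engine such as pyarrow are assumed installed; paths are examples).

```python
from pathlib import Path

import pandas as pd

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
rows = []
for pdf in Path("input_pdfs").glob("*.pdf"):
    result = converter.convert(pdf)
    rows.append(
        {"filename": pdf.name, "contents": result.document.export_to_markdown()}
    )

# One row per converted document; requires a Parquet engine such as pyarrow.
pd.DataFrame(rows).to_parquet("docling_output.parquet")
```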
* Add ability to preprocess VLM response
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
* Move response decoding to the VLM options (requires inheritance to override). Per-page prompt formulation also moved to the VLM options to keep the API consistent.
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
---------
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
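A hedged sketch of the hook described above: response decoding sits on the VLM options and is customized through inheritance. The method name `decode_response` is an assumption about the released API, and the sketch subclasses the spec's concrete class so it does not hard-code a class name that may have changed.

```python
from docling.datamodel import vlm_model_specs
from docling.datamodel.pipeline_options import VlmPipelineOptions


# Subclass whatever concrete options class the chosen spec uses, so the sketch
# does not depend on that class's (version-dependent) name.
class CleanedVlmOptions(type(vlm_model_specs.SMOLDOCLING_TRANSFORMERS)):
    def decode_response(self, text: str) -> str:  # assumed hook name
        # e.g. strip a chat-template preamble before the output is parsed
        return text.removeprefix("Assistant:").strip()


pipeline_options = VlmPipelineOptions(
    vlm_options=CleanedVlmOptions(
        **vlm_model_specs.SMOLDOCLING_TRANSFORMERS.model_dump()
    )
)
```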
* updated the README
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added minimal_asr_pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Updated README and added ASR example
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* Updated docs.index.md
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* updated CI and mkdocs
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added link to existing audio file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added link to existing audio file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatting
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
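A minimal sketch of the ASR pipeline added above, along the lines of the minimal_asr_pipeline example; the module, class, and spec names (asr_model_specs, AsrPipeline, WHISPER_TURBO) are recalled from docling's examples and should be treated as assumptions, and the audio path is a placeholder.

```python
from docling.datamodel import asr_model_specs
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import AsrPipelineOptions
from docling.document_converter import AudioFormatOption, DocumentConverter
from docling.pipeline.asr_pipeline import AsrPipeline

# Transcribe an audio file into a DoclingDocument using a Whisper spec.
pipeline_options = AsrPipelineOptions(asr_options=asr_model_specs.WHISPER_TURBO)
converter = DocumentConverter(
    format_options={
        InputFormat.AUDIO: AudioFormatOption(
            pipeline_cls=AsrPipeline,
            pipeline_options=pipeline_options,
        )
    }
)
result = converter.convert("sample.wav")
print(result.document.export_to_markdown())
```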
* feat: adding new vlm-models support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* got microsoft/Phi-4-multimodal-instruct to work
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* working on vlm's
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the VLM part
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* all working, now serious refactoring necessary
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring the download_model
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the formulate_prompt
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* pixtral 12b runs via MLX and native transformers
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the VlmPredictionToken
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* refactoring minimal_vlm_pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the MyPy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added pipeline_model_specializations file
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* need to get Phi4 working again ...
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* finalising last points for vlms support
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the pipeline for Phi4
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* streamlining all code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixing the tests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* added the html backend to the VLM pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* fixed the static load_from_doctags
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
* restore stable imports
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use AutoModelForVision2Seq for Pixtral and review example (including rename)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove unused value
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* refactor instances of VLM models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* skip compare example in CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use lowercase and uppercase only
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add new minimal_vlm example and refactor pipeline_options_vlm_model for cleaner import
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename pipeline_vlm_model_spec
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* move more arguments to options and simplify model init
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add supported_devices
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove not-needed function
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* exclude minimal_vlm
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* missing file
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add message for transformers version
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename to specs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use module import and remove MLX from non-darwin
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove hf_vlm_model and add extra_generation_args
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* use single HF VLM model class
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* remove torch type
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add docs for vision models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
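A minimal sketch of running the refactored VLM pipeline with one of the model specs introduced above; spec names and import paths (vlm_model_specs, VlmPipelineOptions) are assumptions based on these commits and may differ slightly per version, and the input PDF is a placeholder.

```python
from docling.datamodel import vlm_model_specs
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline

# Pick a predefined VLM spec; MLX variants are only available on Apple Silicon.
pipeline_options = VlmPipelineOptions(
    vlm_options=vlm_model_specs.SMOLDOCLING_TRANSFORMERS,
)
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=pipeline_options,
        )
    }
)
print(converter.convert("report.pdf").document.export_to_markdown())
```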
Update data_prep_kit.md
The links were broken, since the repository was renamed. I also noticed that PDF2Parquet is now referred to as Docling2Parquet.
Signed-off-by: Oleg Lavrovsky <31819+loleg@users.noreply.github.com>
Fixed the [Usage] link in architecture.md
Changed the usage link in the tip box from "../usage.md#adjust-pipeline-features" to "../usage/index.md#adjust-pipeline-features" as the previous link is not valid.
Signed-off-by: Leandro Rosas <36343022+leandrosas101@users.noreply.github.com>
* build: Add ollama sdk dependency
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Add option plumbing for OllamaVlmOptions in pipeline_options
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Full implementation of OllamaVlmModel
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Connect "granite_vision_ollama" pipeline option to CLI
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* Revert "build: Add ollama sdk dependency"
After consideration, we're going to use the generic OpenAI API instead
of the Ollama-specific API to avoid duplicate work.
This reverts commit bc6b366468cdd66b52540aac9c7d8b584ab48ad0.
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* refactor: Move OpenAI API call logic into utils.utils
This will allow reuse of this logic in a generic VLM model
NOTE: There is a subtle change here in the ordering of the text prompt and
the image in the call to the OpenAI API. When run against Ollama, this
ordering makes a big difference: if the prompt comes before the image, the
result is terse and not usable, whereas the prompt coming after the image
works as expected and matches the non-OpenAI chat API.
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* refactor: Refactor from Ollama SDK to generic OpenAI API
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Linting, formatting, and bug fixes
The one bug fix was in the timeout arg to openai_image_request. Otherwise,
this is all style changes to get MyPy and black passing cleanly.
Branch: OllamaVlmModel
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* remove model from download enum
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* generalize input args for other API providers
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename and refactor
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* require flag for remote services
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* disable example from CI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add examples to docs
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
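A hedged sketch of driving the VLM pipeline through an OpenAI-compatible endpoint such as Ollama, as the commits above describe. The API options class, its import path, and the model name are assumptions and may differ by version; the explicit enable_remote_services opt-in corresponds to the "require flag for remote services" commit.

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmPipelineOptions
from docling.datamodel.pipeline_options_vlm_model import ApiVlmOptions, ResponseFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline

pipeline_options = VlmPipelineOptions(
    enable_remote_services=True,  # explicit opt-in required for remote APIs
    vlm_options=ApiVlmOptions(
        # Ollama exposes an OpenAI-compatible chat completions endpoint.
        url="http://localhost:11434/v1/chat/completions",
        params={"model": "granite3.2-vision:2b"},  # example model name
        prompt="Convert this page to markdown.",
        timeout=90,
        response_format=ResponseFormat.MARKDOWN,
    ),
)
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=pipeline_options,
        )
    }
)
print(converter.convert("page.pdf").document.export_to_markdown())
```

As the refactor commit notes, the request sends the image before the text prompt, which matters for output quality when running against Ollama.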