Peter W. J. Staar
3e6da2c62d
docs: Example on PII obfuscation ( #2459 )
...
* added example on PII obfuscation
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* reformatting code
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* add in index and fix heading formatting
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* add GLINER to PII
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* final commit
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com >
2025-10-14 15:39:16 +02:00
Christoph Auer
cd7f7ba145
fix: Use proper page concatenation in VLM pipeline MD/HTML conversion ( #2458 )
...
* Use proper page concatenation in VLM pipeline MD/HTML conversion
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-10-14 14:12:26 +02:00
github-actions[bot]
3687d865f8
chore: bump version to 2.56.1 [skip ci]
v2.56.1
2025-10-13 16:30:04 +00:00
Michele Dolfi
688a7dfd38
fix: avoid downloading easyocr models by default ( #2454 )
...
avoid downloading easyocr models
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-10-13 17:58:06 +02:00
github-actions[bot]
10165dda8a
chore: bump version to 2.56.0 [skip ci]
v2.56.0
2025-10-13 09:19:06 +00:00
Animesh
db985bb159
fix(asr): Implement robust status check in AsrPipeline ( #2442 )
...
* test: Add failing test case for silent audio file
* fix: Implement robust status check in AsrPipeline
* DCO Remediation Commit for mastermaxx03 <srivastavaanimesh22@gmail.com >
I, mastermaxx03 <srivastavaanimesh22@gmail.com >, hereby add my Signed-off-by to this commit: 5fc4d512b330bb0cd347da4cbcca0fbe9687898a
I, mastermaxx03 <srivastavaanimesh22@gmail.com >, hereby add my Signed-off-by to this commit: 31a4e9a5f1
Signed-off-by: mastermaxx03 <srivastavaanimesh22@gmail.com >
* DCO Remediation Commit for mastermaxx03 <srivastavaanimesh22@gmail.com >
I, mastermaxx03 <srivastavaanimesh22@gmail.com >, hereby add my Signed-off-by to this commit: 5fc4d512b3
I, mastermaxx03 <srivastavaanimesh22@gmail.com >, hereby add my Signed-off-by to this commit: 31a4e9a5f1
Signed-off-by: mastermaxx03 <srivastavaanimesh22@gmail.com >
---------
Signed-off-by: mastermaxx03 <srivastavaanimesh22@gmail.com >
2025-10-13 09:51:31 +02:00
Jeremy Chen
90200443bc
docs: Remove deprecated call in custom_convert.py ( #2447 )
...
Update custom_convert.py
export_to_document_tokens is deprecated so change it to export_to_doctags
Signed-off-by: Jeremy Chen <github@jeremychen.email >
2025-10-13 09:30:02 +02:00
Imad Saddik
2a0f56390a
docs: fixed a few typos ( #2441 )
...
Signed-off-by: Imad Saddik <79410781+ImadSaddik@users.noreply.github.com >
2025-10-13 09:04:50 +02:00
Michele Dolfi
f7244a4333
feat: AutoOCR model selecting the best OCR model available and deprecating the usage of EasyOCR ( #2391 )
...
* add auto ocr model
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* Apply suggestions from code review
Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com >
* add final log warning
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* propagate default options
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* allow rapidocr models download
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* remove modelscope
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com >
Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
2025-10-10 16:11:39 +02:00
Cesar Berrospi Ramis
cce18b2ff7
fix: deal with chartsheets in workbooks ( #2433 )
...
* fix(xlsx): deal with chartsheets in workbooks
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com >
* tests(xlsx): align test file names
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com >
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com >
2025-10-10 15:06:38 +02:00
Bruno Pio
f11f8c0a81
feat: Add Tesseract PSM options support ( #2411 )
...
* feat: Add Tesseract PSM options support
Signed-off-by: Bruno Pio <913963+blap@users.noreply.github.com >
* apply formatting
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* add tesseract_cli in checks
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
---------
Signed-off-by: Bruno Pio <913963+blap@users.noreply.github.com >
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com >
2025-10-10 14:44:30 +02:00
Victor Moreli
ee5501320e
fix: skip temporary docx files ( #2413 )
...
fix: CLI detects docx temporary files and breaks
Signed-off-by: Victor Moreli <victormoreli64@gmail.com >
2025-10-10 09:39:26 +02:00
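A minimal sketch of the idea behind skipping temporary docx files, assuming the temporary files in question are the owner/lock files that Word writes next to an open document with a `~$` name prefix. The function names `is_word_temp_file` and `filter_inputs` are hypothetical and are not taken from the docling code base.

```python
from pathlib import Path


def is_word_temp_file(path: Path) -> bool:
    """Return True for Word owner/lock files such as '~$report.docx'."""
    # Assumption: temporary files are identified by the '~$' name prefix.
    return path.name.startswith("~$")


def filter_inputs(paths):
    """Drop temporary Word files before handing the paths to a converter."""
    return [p for p in paths if not is_word_temp_file(Path(p))]


candidates = [
    "report.docx",
    "~$report.docx",
    "notes/~$draft.docx",
    "notes/draft.docx",
]
kept = filter_inputs(candidates)
print(kept)
```

Filtering on the file name alone keeps the check cheap, since no file needs to be opened to decide whether it should be skipped.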
pixiake
b5f7fef29b
fix: AsrPipeline to handle absolute paths and BytesIO streams correctly ( #2407 )
...
Fix AsrPipeline to handle absolute paths and BytesIO streams correctly
Signed-off-by: pixiake <guofeng@spader-ai.com >
Co-authored-by: pixiake <guofeng@spader-ai.com >
2025-10-10 09:37:15 +02:00
Utsav Talwar
f2854b2e1d
docs: Add MongoDB + VoyageAI ( #2382 )
...
Signed-off-by: Utsav Talwar <114057324+utsavMongoDB@users.noreply.github.com >
Co-authored-by: Utsav Talwar <114057324+utsavMongoDB@users.noreply.github.com >
2025-10-07 14:36:19 -04:00
Michele Dolfi
0610d01afa
fix: enrichment of documents without pages metadata (pptx and xlsx) ( #2401 )
...
fix logic for pptx and xlsx
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-10-07 18:28:51 +02:00
Maxim Lysak
9705f4020c
fix: Proper heading support in rich tables for HTML backend ( #2394 )
...
* Fix for proper header support in rich tables in HTML
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* cleaning up
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Compatibility with older Python versions
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Fixing Furniture before the first heading rule
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Added minimalistic test case
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* added html for the test
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
---------
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com >
2025-10-07 15:57:32 +02:00
Utsav Talwar
8a4b946a1a
docs: add RAG example with MongoDB Atlas Vector Search and VoyageAI embeddings ( #2341 )
...
* Add MongoDB RAG example
* Update MongoDB RAG Example
* Update MongoDB RAG Example
* Update MongoDB RAG Example
* DCO Remediation Commit for utsavMongoDB <utsav.talwar@mongodb.com >
I, utsavMongoDB <utsav.talwar@mongodb.com >, hereby add my Signed-off-by to this commit: fbdbf53aa8
I, utsavMongoDB <utsav.talwar@mongodb.com >, hereby add my Signed-off-by to this commit: 9b3065ba2b
I, utsavMongoDB <utsav.talwar@mongodb.com >, hereby add my Signed-off-by to this commit: 1983f9db35
I, utsavMongoDB <utsav.talwar@mongodb.com >, hereby add my Signed-off-by to this commit: 0522aa105d
I, utsavMongoDB <utsav.talwar@mongodb.com >, hereby add my Signed-off-by to this commit: f5a67e8012
Signed-off-by: utsavMongoDB <utsav.talwar@mongodb.com >
* docs: Add example with MongoDB
* DCO Remediation Commit for utsavMongoDB <utsav.talwar@mongodb.com >
I, utsavMongoDB <utsav.talwar@mongodb.com >, hereby add my Signed-off-by to this commit: bb245a31ed
I, utsavMongoDB <utsav.talwar@mongodb.com >, hereby add my Signed-off-by to this commit: 25436e543c
Signed-off-by: utsavMongoDB <utsav.talwar@mongodb.com >
---------
Signed-off-by: utsavMongoDB <utsav.talwar@mongodb.com >
Signed-off-by: Utsav Talwar <114057324+utsavMongoDB@users.noreply.github.com >
2025-10-03 13:29:43 +02:00
github-actions[bot]
22515b546a
chore: bump version to 2.55.1 [skip ci]
v2.55.1
2025-10-03 10:26:26 +00:00
Rui Dias Gomes
68230fe7e5
ci: split workflow to speedup CI runtime ( #2313 )
...
* split workflow
Signed-off-by: rmdg88 <rmdg88@gmail.com >
* split workflow
Signed-off-by: rmdg88 <rmdg88@gmail.com >
* enable test_e2e_pdfs_conversions
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com >
Signed-off-by: Rui Dias Gomes <66125272+rmdg88@users.noreply.github.com >
* split workflow
Signed-off-by: rmdg88 <rmdg88@gmail.com >
* split workflow
Signed-off-by: rmdg88 <rmdg88@gmail.com >
* split workflow
Signed-off-by: rmdg88 <rmdg88@gmail.com >
* split workflow
Signed-off-by: rmdg88 <rmdg88@gmail.com >
* split workflow
Signed-off-by: rmdg88 <rmdg88@gmail.com >
* fix conflict files
Signed-off-by: rmdg88 <rmdg88@gmail.com >
---------
Signed-off-by: rmdg88 <rmdg88@gmail.com >
Signed-off-by: Rui Dias Gomes <66125272+rmdg88@users.noreply.github.com >
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com >
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com >
2025-10-03 11:16:38 +02:00
Matvei Smirnov
ee73ffae15
fix(markdown): Setext heading support ( #2359 )
...
Signed-off-by: Matvei Smirnov <vdalekesmirnov@gmail.com >
Co-authored-by: Matvei Smirnov <matvei.smirnov@vkteam.ru >
2025-10-03 10:32:53 +02:00
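For readers unfamiliar with Setext headings: in Markdown, a line of text underlined with `=` characters is a level-1 heading, and one underlined with `-` characters is a level-2 heading. The sketch below is a simplified, standalone detector and is not the markdown backend's implementation. The name `setext_headings` is hypothetical, and the sketch ignores multi-line heading text, lists, and code blocks.

```python
import re

# Underline: up to three leading spaces, then only '=' or only '-' characters.
_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")


def setext_headings(markdown: str):
    """Return a list of (level, text) for Setext headings in `markdown`."""
    lines = markdown.splitlines()
    found = []
    # Look at each line together with the line that follows it.
    for prev, line in zip(lines, lines[1:]):
        match = _UNDERLINE.match(line)
        if match and prev.strip():
            level = 1 if match.group(1)[0] == "=" else 2
            found.append((level, prev.strip()))
    return found


sample = "Title\n=====\n\nSection\n---\n\nBody text\n"
print(setext_headings(sample))
```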
Hakeem Abbas
246de77d8c
fix(docs): fixed the color scheme ( #2371 )
...
* fix(docs): fixed the color scheme
Signed-off-by: Hakeem Abbas <hakeemsyd@gmail.com >
* fix(docs): colors background
Signed-off-by: Hakeem Abbas <hakeemsyd@gmail.com >
---------
Signed-off-by: Hakeem Abbas <hakeemsyd@gmail.com >
2025-10-03 10:20:44 +02:00
Michele Dolfi
a975a790c9
docs: example using Hashicorp Vault PII transform ( #2373 )
...
docs: add example using Hashicorp Vault PII transform
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-10-03 09:53:29 +02:00
Michele Dolfi
9505202e38
ci: update docling-parse and remove pages.json ( #2372 )
...
* update docling-parse and remove pages.json
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* ocr gt
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-10-03 09:53:13 +02:00
Christoph Auer
ca2be7ff3a
fix: Empty table handling ( #2365 )
...
* add table raw cells when no table structure model was used
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* Add RichTableCell instance for tables with missing structure.
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Update test GT
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* update test results
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com >
2025-10-02 19:35:16 +02:00
Lucas Morin
e6c3b05e63
docs: Jobkit and connectors ( #2357 )
...
* feat: create documentation for docling-jobkit
Signed-off-by: Lucas Morin <lucas.morin222@gmail.com >
* small text fixes
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
---------
Signed-off-by: Lucas Morin <lucas.morin222@gmail.com >
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com >
2025-10-02 13:46:56 +02:00
Michele Dolfi
4f295ed051
fix: add table raw content when no table structure model is used ( #1815 )
...
* add table raw cells when no table structure model was used
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* Add RichTableCell instance for tables with missing structure.
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Update test GT
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
Co-authored-by: Christoph Auer <cau@zurich.ibm.com >
2025-10-02 13:46:42 +02:00
github-actions[bot]
f0b630e24e
chore: bump version to 2.55.0 [skip ci]
v2.55.0
2025-09-30 14:50:42 +00:00
Christoph Auer
1e9dc43b72
feat: Repetition-based StoppingCriteria for GraniteDocling ( #2323 )
...
* Experimental code for repetition detection, VLLM Streaming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Update VLLM Streaming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Update VLLM inference code, CLI and VLM specs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Fix generation and decoder args for HF model
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Fix vllm device args
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Bugfixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Remove streaming VLLM for the moment
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Add repetition StoppingCriteria for GraniteDocling/SmolDocling
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Make GenerationStopper base class and port for MLX
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Add streaming support and custom GenerationStopper support for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Fixes for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Fixes for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Fix api_image_request_streaming when GenerationStopper triggers.
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Move DocTagsRepetitionStopper to utility unit, update examples
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-09-30 15:26:09 +02:00
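The general idea of a repetition-based stopping criterion can be sketched as follows: generation is halted once the most recent tokens form the same block several times in a row. This is an illustration only and not the logic of `DocTagsRepetitionStopper`. The function names, the `window` and `min_repeats` parameters, and the plain-list token stream are all assumptions.

```python
def has_repeated_tail(tokens, window, min_repeats):
    """Return True if the last `window` tokens occur `min_repeats` times in a row."""
    needed = window * min_repeats
    if window <= 0 or min_repeats < 2 or len(tokens) < needed:
        return False
    end = len(tokens)
    tail = tokens[end - window:]
    # Compare each preceding block of `window` tokens with the tail block.
    return all(
        tokens[end - (i + 1) * window: end - i * window] == tail
        for i in range(min_repeats)
    )


def generate_with_stopper(stream, window=2, min_repeats=3):
    """Collect tokens from `stream`, stopping early when the tail repeats."""
    out = []
    for tok in stream:
        out.append(tok)
        if has_repeated_tail(out, window, min_repeats):
            break
    return out


looping = iter(["a", "b", "c", "b", "c", "b", "c", "b", "c"])
print(generate_with_stopper(looping))
```

Checking only the tail keeps the per-token cost proportional to `window * min_repeats` rather than to the full length of the generated sequence.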
Michele Dolfi
68ae7ccf3c
fix: pin wider range of typer ( #2309 )
...
* pin larger range of typer
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* update docling-core
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* update deps
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* update tests
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* lock docling-parse 4.5.0
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* update results with docling-parse=4.4.0
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-09-30 08:42:23 +02:00
Christoph Auer
654c70f990
fix: Update Transformers & VLLM inference code, CLI and VLM specs ( #2322 )
...
* Update VLLM inference code, CLI and VLM specs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Fix generation and decoder args for HF model
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Fix vllm device args
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Bugfixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-09-29 21:06:54 +02:00
Maxim Lysak
c803abed9a
feat: Rich tables support for HTML backend ( #2324 )
...
* Rich tables support for HTML backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Decoupling JATS backend from HTML backend, ways of creating tables changed significantly
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* updated and added tests
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Refactored parse_table_data in html_backend into a few smaller functions
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Changing scope of a few functions in html_backend.py, making them static when possible
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Fix for HTML tables that have tbody and/or thead; these tables are now properly supported
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
---------
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com >
2025-09-29 18:12:16 +02:00
Hakeem Abbas
325877aee9
docs(styling): update color scheme ( #2154 )
...
* update the colors scheme
* update mkdocs.yml
* DCO Remediation Commit for Hakeem Abbas <hakeemsyd@gmail.com >
I, Hakeem Abbas <hakeemsyd@gmail.com >, hereby add my Signed-off-by to this commit: 861cb8ce6e
I, Hakeem Abbas <hakeemsyd@gmail.com >, hereby add my Signed-off-by to this commit: 72539fe5c0
Signed-off-by: Hakeem Abbas <hakeemsyd@gmail.com >
* update image
* DCO Remediation Commit for Hakeem Abbas <hakeemsyd@gmail.com >
I, Hakeem Abbas <hakeemsyd@gmail.com >, hereby add my Signed-off-by to this commit: 861cb8ce6e
I, Hakeem Abbas <hakeemsyd@gmail.com >, hereby add my Signed-off-by to this commit: 72539fe5c0
I, Hakeem Abbas <hakeemsyd@gmail.com >, hereby add my Signed-off-by to this commit: 1be2646643
Signed-off-by: Hakeem Abbas <hakeemsyd@gmail.com >
* undo image change
Signed-off-by: Hakeem Abbas <hakeemsyd@gmail.com >
---------
Signed-off-by: Hakeem Abbas <hakeemsyd@gmail.com >
2025-09-29 11:44:40 +02:00
Luis
a873200c9d
docs(vlm): Update SmolDocling to GraniteDocling references ( #2315 )
...
Update minimal_vlm_pipeline.py
Signed-off-by: Luis <luis.rojas@ibm.com >
2025-09-25 11:07:39 +02:00
Lucas Morin
9d67bb9ed6
fix: support escaped characters in markdown backend ( #2304 )
...
fix: improve markdown backend to support input documents with escaped characters
Signed-off-by: Lucas Morin <lucas.morin222@gmail.com >
2025-09-23 18:00:16 +02:00
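As background for escaped characters in Markdown: a backslash in front of an ASCII punctuation character makes that character literal, so `1\.` is the text `1.` rather than the start of a list. The sketch below removes such backslashes with a regular expression. It is a simplified illustration, not the markdown backend's code. The name `unescape_markdown` is hypothetical, and the sketch does not treat code spans or code blocks specially, where backslashes are literal.

```python
import re
import string

# A backslash followed by any ASCII punctuation character.
_ESCAPED = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")


def unescape_markdown(text: str) -> str:
    """Replace backslash-escaped punctuation with the bare character."""
    return _ESCAPED.sub(r"\1", text)


print(unescape_markdown(r"1\. not a list, \*literal stars\*"))
```

A backslash in front of any other character, such as a letter, is left untouched.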
github-actions[bot]
d599177547
chore: bump version to 2.54.0 [skip ci]
v2.54.0
2025-09-22 15:28:30 +00:00
Maxim Lysak
e2482a2ada
feat: Rich tables for MSWord backend ( #2291 )
...
* Adding support of rich table cells to MSWord backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Fixes for properly accounting for lists, pictures and headers in rich table cells
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Cleaned up msword backend, re-generated docx tests
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Added detection of simple table cells in word backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
* Cleaned up
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
---------
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com >
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com >
2025-09-22 16:41:59 +02:00
Cesar Berrospi Ramis
46efaaefee
feat: add a backend parser for WebVTT files ( #2288 )
...
* feat: add a backend parser for WebVTT files
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com >
* docs: update README with VTT support
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com >
* docs: add description to supported formats
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com >
* chore: upgrade docling-core to unescape WebVTT in markdown
Pin the new release of docling-core 2.48.2.
Do not escape HTML reserved characters when exporting WebVTT documents to markdown.
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com >
* test: add missing copyright notice
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com >
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com >
2025-09-22 15:24:34 +02:00
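To give a sense of the input a WebVTT parser deals with, here is a minimal sketch that extracts cue start times, end times, and text from a WebVTT string. It is not the docling backend. The names `to_seconds` and `parse_cues` are hypothetical, and the sketch ignores cue settings, NOTE and STYLE blocks, and inline tags.

```python
import re

# Cue timing line: optional hours, then mm:ss.ttt on both sides of '-->'.
_TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)


def to_seconds(stamp: str) -> float:
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    parts = stamp.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def parse_cues(vtt: str):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    # Blocks are separated by blank lines.
    for block in vtt.replace("\r\n", "\n").split("\n\n"):
        lines = [ln for ln in block.split("\n") if ln.strip()]
        for i, line in enumerate(lines):
            match = _TIMING.match(line)
            if match:
                text = "\n".join(lines[i + 1:])
                cues.append((to_seconds(match.group(1)), to_seconds(match.group(2)), text))
                break
    return cues


sample = (
    "WEBVTT\n\n"
    "1\n00:00:01.000 --> 00:00:04.500\nHello there\n\n"
    "00:05.000 --> 00:07.250 align:start\nSecond cue\nline two\n"
)
print(parse_cues(sample))
```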
manuflexor
b5628f1227
fix: correct y-axis scaling in draw_table_cells ( #2287 )
...
* Fix y axis
* DCO Remediation Commit for manuflexor <imanuel@flexor.ai >
I, manuflexor <imanuel@flexor.ai >, hereby add my Signed-off-by to this commit: cd56622d4f
Signed-off-by: manuflexor <imanuel@flexor.ai >
---------
Signed-off-by: manuflexor <imanuel@flexor.ai >
2025-09-19 13:42:29 +02:00
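The commit message does not spell out the details of the fix. As a generic illustration of per-axis scaling when drawing boxes on a resized page image, the sketch below multiplies horizontal coordinates by one factor and vertical coordinates by another. The function `scale_bbox` and its `(left, top, right, bottom)` tuple layout are assumptions and are not taken from `draw_table_cells`.

```python
def scale_bbox(bbox, scale_x, scale_y):
    """Scale a (left, top, right, bottom) box with independent x and y factors."""
    left, top, right, bottom = bbox
    # Horizontal coordinates use scale_x, vertical coordinates use scale_y.
    return (left * scale_x, top * scale_y, right * scale_x, bottom * scale_y)


print(scale_bbox((10, 20, 30, 40), scale_x=2.0, scale_y=0.5))
```

When the image is resized non-uniformly, using a single factor for both axes would shift or stretch the drawn cells along one axis.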
Christoph Auer
8b7e83a8c7
docs: Update API VLM example with granite-docling ( #2294 )
...
chore: Update API VLM example with granite-docling
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-09-19 12:23:53 +02:00
Panos Vagenas
8322c2ea9b
docs: fix examples rendering ( #2281 )
...
fix examples rendering
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
2025-09-17 20:50:50 -04:00
github-actions[bot]
f1687fb09b
chore: bump version to 2.53.0 [skip ci]
v2.53.0
2025-09-17 13:59:33 +00:00
Christoph Auer
17afb664d0
feat: Add granite-docling model ( #2272 )
...
* adding granite-docling preview
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* updated the model specs
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* typo
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* use granite-docling and add to the model downloader
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* update docs and README
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* Update final repo_ids for GraniteDocling
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Update final repo_ids for GraniteDocling
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Fix model name in CLI usage example
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
* Fix VLM model name in README.md
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
Co-authored-by: Peter Staar <taa@zurich.ibm.com >
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com >
2025-09-17 15:15:49 +02:00
Mingxuan Zhao
ff351fd40c
docs: Describe examples ( #2262 )
...
* Update .py examples with clearer guidance, update out-of-date imports and calls
Signed-off-by: Mingxuan Zhao <43148277+mingxzhao@users.noreply.github.com >
* Fix minimal.py string error, fix ruff format error
Signed-off-by: Mingxuan Zhao <43148277+mingxzhao@users.noreply.github.com >
* fix more CI issues
Signed-off-by: Mingxuan Zhao <43148277+mingxzhao@users.noreply.github.com >
---------
Signed-off-by: Mingxuan Zhao <43148277+mingxzhao@users.noreply.github.com >
2025-09-16 16:00:38 +02:00
dmorady1
0e95171dd6
feat(RapidOcr): Support generic extra arguments for RapidOcr ( #2266 )
...
* feat: add support for additional parameters in RapidOcrOptions and fix RapidOcr font_path
* DCO Remediation Commit for David Morady <29502285+dmorady1@users.noreply.github.com >
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: 133d989060
Signed-off-by: David Morady <29502285+dmorady1@users.noreply.github.com >
* fix: RapidOcr ensure backwards compatibility and add deprecation note
* add warning log for rec_font_path
* DCO Remediation Commit for David Morady <29502285+dmorady1@users.noreply.github.com >
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: 133d989060
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: 0a65eed28a
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: ac96f1483f
Signed-off-by: David Morady <29502285+dmorady1@users.noreply.github.com >
* add tests for code coverage for rapidocr
* DCO Remediation Commit for David Morady <29502285+dmorady1@users.noreply.github.com >
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: 133d989060
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: 0a65eed28a
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: ac96f1483f
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: af5df4bb30
Signed-off-by: David Morady <29502285+dmorady1@users.noreply.github.com >
* add small comment for test
* DCO Remediation Commit for David Morady <29502285+dmorady1@users.noreply.github.com >
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: 133d989060
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: 0a65eed28a
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: ac96f1483f
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: af5df4bb30
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: ab893b637f
Signed-off-by: David Morady <29502285+dmorady1@users.noreply.github.com >
* fix test comment
* DCO Remediation Commit for David Morady <29502285+dmorady1@users.noreply.github.com >
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: 133d989060
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: 0a65eed28a
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: ac96f1483f
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: af5df4bb30
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: ab893b637f
I, David Morady <29502285+dmorady1@users.noreply.github.com >, hereby add my Signed-off-by to this commit: 028e332aa9
Signed-off-by: David Morady <29502285+dmorady1@users.noreply.github.com >
---------
Signed-off-by: David Morady <29502285+dmorady1@users.noreply.github.com >
2025-09-16 07:26:10 +02:00
Michele Dolfi
ad2f738231
chore: update lock ( #2265 )
...
* update lock
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* update changes from docling-core update
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-09-15 11:19:15 +02:00
Yuie.
609d902eef
fix: handle empty result from RapidOCR to avoid crash ( #2264 )
...
Signed-off-by: Junehyuk Park <yuie@evonit.net >
2025-09-15 10:04:33 +02:00
github-actions[bot]
10bb0aee2d
chore: bump version to 2.52.0 [skip ci]
v2.52.0
2025-09-11 16:11:20 +00:00
Christoph Auer
0700af212c
fix: Add missing features in ThreadedStandardPdfPipeline ( #2252 )
...
Add missing features in ThreadedStandardPdfPipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-09-11 16:26:02 +02:00
Michele Dolfi
2c9123419f
feat: enrichment steps on all convert pipelines (incl docx, html, etc) ( #2251 )
...
* allow enrichment on all convert pipelines
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* set options in CLI
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-09-11 15:09:00 +02:00
Michele Dolfi
c6965495a2
fix: address deprecation warnings of dependencies ( #2237 )
...
* switch to dtype instead of torch_dtype
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* set __check_model__ to avoid deprecation warnings
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* remove dataloaders warnings in easyocr
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* suppress with option
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-09-10 14:38:34 +02:00