Copilot
8d50a59d48
fix: multi-page image support (tiff) ( #1928 )
...
* Initial plan
* Fix multi-page TIFF image support
Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com >
* add RGB conversion
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Remove pointless test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Add multi-page TIFF test data and verification tests
Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com >
* Revert "Add multi-page TIFF test data and verification tests"
This reverts commit 130a10e2d9 .
* Proper test for 2 page tiff file
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* DCO Remediation Commit for copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
I, copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >, hereby add my Signed-off-by to this commit: 420df478f3
I, copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >, hereby add my Signed-off-by to this commit: c1d722725f
I, Christoph Auer <cau@zurich.ibm.com >, hereby add my Signed-off-by to this commit: 6aa85cc933
I, copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >, hereby add my Signed-off-by to this commit: 130a10e2d9
I, Christoph Auer <cau@zurich.ibm.com >, hereby add my Signed-off-by to this commit: d571f36299
I, Christoph Auer <cau@zurich.ibm.com >, hereby add my Signed-off-by to this commit: 2aab66288b
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Proper test for 2 page tiff file (2)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com >
Co-authored-by: Christoph Auer <cau@zurich.ibm.com >
2025-07-23 09:55:40 +02:00
github-actions[bot]
ec971bbe68
chore: bump version to 2.42.1 [skip ci]
v2.42.1
2025-07-22 16:45:48 +00:00
Christoph Auer
67441ca418
fix: Keep formula clusters also when empty ( #1970 )
...
Keep formula clusters also when empty
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-07-22 17:02:12 +02:00
Michele Dolfi
90a7cc4bdd
docs: enrich existing DoclingDocument ( #1969 )
...
add example for enriching an existing doclingdocument
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-07-22 16:20:15 +02:00
Cesar Berrospi Ramis
a069b1175b
refactor(HTML): handle text from styled html ( #1960 )
...
* A new HTML backend that handles styled html (ignores it) as well as images.
Images are parsed as placeholders with a caption, if it exists.
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com >
Co-authored-by: vaaale <2428222+vaaale@users.noreply.github.com >
Signed-off-by: Alexander Vaagan <alexander.vaagan@gmail.com >
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com >
Signed-off-by: vaaale <2428222+vaaale@users.noreply.github.com >
* tests(HTML): re-enable test_ordered_lists
Re-enable test_ordered_lists regression test for the HTML backend since
docling-core now supports ordered lists with custom start value.
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com >
---------
Signed-off-by: Alexander Vaagan <alexander.vaagan@gmail.com >
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com >
Signed-off-by: vaaale <2428222+vaaale@users.noreply.github.com >
Co-authored-by: Alexander Vaagan <2428222+vaaale@users.noreply.github.com >
2025-07-22 13:16:31 +02:00
Fabiano Franz
5d98bcea1b
docs: add documentation for confidence scores ( #1912 )
...
* docs: add documentation for confidence scores
Signed-off-by: Fabiano Franz <contact@fabianofranz.com >
* Increase focus on confidence grades, scores are informational only
Signed-off-by: Fabiano Franz <contact@fabianofranz.com >
* Update confidence_scores.md
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
---------
Signed-off-by: Fabiano Franz <contact@fabianofranz.com >
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
2025-07-21 10:16:17 +02:00
github-actions[bot]
7561be537a
chore: bump version to 2.42.0 [skip ci]
v2.42.0
2025-07-18 15:34:59 +00:00
Christoph Auer
cca05c45ea
fix: Safe pipeline init, use device_map in transformers models ( #1917 )
...
* Use device_map for transformer models
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Add accelerate
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Relax accelerate min version
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Make pipeline cache+init thread-safe
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-07-18 15:14:36 +02:00
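Note on "Make pipeline cache+init thread-safe": the log does not show the actual change. As a minimal sketch of one common way to do this, assuming a hypothetical `get_pipeline` helper and a module-level dict guarded by a `threading.Lock` (all names are illustrative, not docling's API):

```python
import threading

# Hypothetical cache and lock; illustrative names only, not docling's code.
_cache = {}
_cache_lock = threading.Lock()


def get_pipeline(key, factory):
    """Return the cached object for `key`, calling `factory` at most once."""
    # Holding the lock across both the lookup and the construction prevents
    # two threads from initializing the same entry concurrently.
    with _cache_lock:
        if key not in _cache:
            _cache[key] = factory()
        return _cache[key]
```

Holding one lock during construction serializes all initializations, which trades some concurrency for simplicity; a per-key lock would be an alternative if initialization of unrelated entries must overlap.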
Cesar Berrospi Ramis
e1e3053695
fix: fix HTML table parser and JATS backend bugs ( #1948 )
...
Fix a bug in parsing HTML tables in HTML backend.
Fix a bug in test file that prevented JATS backend tests.
Ensure that the JATS backend creates headings with the right level.
Remove unnecessary data files for testing JATS backend.
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com >
2025-07-16 10:49:24 +02:00
stephencox-ict
d6d2dbe2f9
docs: Fix typos ( #1943 )
...
Fix typos
Signed-off-by: stephencox-ict <scox@ict.co >
2025-07-15 09:51:56 +02:00
Christoph Auer
a436be7367
feat: Add option to control empty clusters in layout postprocessing ( #1940 )
...
Add option to control empty clusters in layout postprocessing
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-07-14 18:32:01 +02:00
Copilot
95e70962f1
fix: KeyError: 'fPr' when processing latex fractions in DOCX files ( #1926 )
...
* Initial plan
* Initial analysis and fix for KeyError: 'fPr' in OMML fraction processing
Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com >
* Add comprehensive test for OMML fraction fPr fix
Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com >
* Use debug logging, remove unnecessary test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com >
Co-authored-by: Christoph Auer <cau@zurich.ibm.com >
2025-07-11 09:52:14 +02:00
Copilot
c5fb353f10
fix: Change granite vision model URL from preview to stable version ( #1925 )
...
* Initial plan
* Fix granite vision model URL from preview to stable version
Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com >
* Update to granite vision 3.3
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
* Update to granite vision 3.3 (2)
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
---------
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: cau-git <60343111+cau-git@users.noreply.github.com >
2025-07-11 08:46:03 +02:00
github-actions[bot]
6c4bf9d087
chore: bump version to 2.41.0 [skip ci]
v2.41.0
2025-07-10 14:25:05 +00:00
Christoph Auer
cc6193b3b9
test: Update tests to use default PDF backend (DPv4) ( #1923 )
...
* Update tests to use default PDF backend (DPv4)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* OCR tests use DPv1 until rotation bugs are fixed
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-07-10 15:16:56 +02:00
Christoph Auer
2b8616d6d5
feat: Layout model specification and multiple choices ( #1910 )
...
* Establish layout_model spec and example instantiations
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Updated naming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Back to uppercase constants
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* fix deps issue with openai-whisper>numba>llvmlite
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Pull v1 changed test GT from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-07-10 06:37:27 +02:00
Panos Vagenas
ec588df971
feat: enable precision control in float serialization ( #1914 )
...
* chore: propagate precision control in float serialization
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
* parametrize float serialization, propagate core updates
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
* update test float precision
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
* repin docling-core
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
---------
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
2025-07-09 16:39:17 +02:00
Clément Doumouro
931eb55b88
fix(ocr-utils): unit test and fix the rotate_bounding_box function ( #1897 )
...
Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com >
2025-07-08 18:03:29 +02:00
geoHeil
a07ba863c4
feat: add image-text-to-text models in transformers ( #1772 )
...
* feat(dolphin): add dolphin support
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com >
* rename
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com >
* reformat
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com >
* fix mypy
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com >
* add prompt style and examples
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
---------
Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com >
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com >
2025-07-08 05:54:57 +02:00
Viktor Kuropiatnyk
e25873d557
fix: docs are missing osd packages for tesseract on RHEL ( #1905 )
...
Fixed missing packages in the docs on tesseract
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com >
2025-07-07 17:06:26 +02:00
Shkarupa Alex
b8813eea80
feat(vlm): Dynamic prompts ( #1808 )
...
* Unify temperature options for Vlm models
* Dynamic prompt support with example
* DCO Remediation Commit for Shkarupa Alex <shkarupa.alex@gmail.com >
I, Shkarupa Alex <shkarupa.alex@gmail.com >, hereby add my Signed-off-by to this commit: 34d446cb98
I, Shkarupa Alex <shkarupa.alex@gmail.com >, hereby add my Signed-off-by to this commit: 9c595d574f
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com >
* Replace Page with SegmentedPage
* Fix example HF repo link
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
* Sign-off
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com >
* DCO Remediation Commit for Shkarupa Alex <shkarupa.alex@gmail.com >
I, Shkarupa Alex <shkarupa.alex@gmail.com >, hereby add my Signed-off-by to this commit: 1a162066dd
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com >
* Use lmstudio-community model
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
* Swap inference engine to LM Studio
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com >
---------
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com >
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com >
2025-07-07 16:58:42 +02:00
Michele Dolfi
edd4356aac
fix: use only backend for picture classifier ( #1904 )
...
use backend for picture classifier
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-07-07 16:23:16 +02:00
Michele Dolfi
dd8fde7f19
fix: typo in asr options ( #1902 )
...
fix typo
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-07-07 08:59:14 +02:00
github-actions[bot]
f4a1c06937
chore: bump version to 2.40.0 [skip ci]
v2.40.0
2025-07-04 15:31:36 +00:00
Christoph Auer
ec6cf6f7e8
feat: Introduce LayoutOptions to control layout postprocessing behaviour ( #1870 )
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-07-04 15:36:13 +02:00
Christoph Auer
598c9c53d4
fix: Secure torch model inits with global locks ( #1884 )
...
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-07-04 07:27:26 +02:00
Qiefan Jiang
13865c06f5
perf(msexcel): _find_table_bounds use iter_rows/iter_cols instead of Worksheet.cell ( #1875 )
...
* perf(msexcel): _find_table_bounds use iter_rows/iter_cols instead of sheet.cell
* DCO Remediation Commit for Qiefan Jiang <jiangqiefan@bytedance.com >
I, Qiefan Jiang <jiangqiefan@bytedance.com >, hereby add my Signed-off-by to this commit: 274102a8d4
Signed-off-by: Qiefan Jiang <jiangqiefan@bytedance.com >
* fix lint
* DCO Remediation Commit for Qiefan Jiang <jiangqiefan@bytedance.com >
I, Qiefan Jiang <jiangqiefan@bytedance.com >, hereby add my Signed-off-by to this commit: b6b5b090a9
Signed-off-by: Qiefan Jiang <jiangqiefan@bytedance.com >
---------
Signed-off-by: Qiefan Jiang <jiangqiefan@bytedance.com >
2025-07-03 13:12:06 +02:00
William Easton
3089cf2d26
perf: Move expensive imports closer to usage ( #1863 )
...
* Move expensive imports closer to usage
Signed-off-by: William Easton <bill.easton@elastic.co >
* DCO Remediation Commit for William Easton <bill.easton@elastic.co >
I, William Easton <bill.easton@elastic.co >, hereby add my Signed-off-by to this commit: 8a7412ce5bb131a01bb6403067aeb948c9093b0b
Signed-off-by: William Easton <bill.easton@elastic.co >
* formatting fixes
Signed-off-by: William Easton <bill.easton@elastic.co >
* DCO Remediation Commit for William Easton <bill.easton@elastic.co >
I, William Easton <bill.easton@elastic.co >, hereby add my Signed-off-by to this commit: 8a7412ce5bb131a01bb6403067aeb948c9093b0b
I, William Easton <bill.easton@elastic.co >, hereby add my Signed-off-by to this commit: 963e34325071db5e844841f10c27b396a054a0a1
Signed-off-by: William Easton <bill.easton@elastic.co >
* Fix baseocrmodel test issue
Signed-off-by: William Easton <bill.easton@elastic.co >
---------
Signed-off-by: William Easton <bill.easton@elastic.co >
2025-07-01 22:27:17 +02:00
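Note on "Move expensive imports closer to usage": the general technique is to defer an import into the function that needs it, so that importing the enclosing module stays cheap. A minimal sketch, assuming a hypothetical `parse_payload` function and using the standard-library `json` module as a stand-in for a heavy dependency (not the modules touched by this commit):

```python
def parse_payload(text):
    """Parse a JSON string, importing the parser only when first called."""
    # Deferred import: the cost is paid on the first call rather than at
    # module import time. Later calls reuse the module cached in sys.modules.
    import json  # stand-in for an expensive dependency

    return json.loads(text)
```

The trade-off is that an import error surfaces at call time instead of at startup.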
Christoph Auer
56a0e104f7
feat: Integrate ListItemMarkerProcessor into document assembly ( #1825 )
...
* Integrate ListItemMarkerProcessor into document assembly
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Update to final version
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Update all test cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Upgrade deps
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-07-01 10:04:58 +02:00
Christoph Auer
bdfee4e2d0
chore: Safer unloading of DPv4 backend ( #1867 )
...
fix: Safer unloading of DPv4 backend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-06-30 14:41:21 +02:00
Nikos Livathinos
ae39a9411a
fix: Ensure that TesseractOcrModel does not crash in case OSD is not installed ( #1866 )
...
fix: Ensure that TesseractOcrModel does not crash if tesseract OSD is not installed
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com >
2025-06-30 10:55:56 +02:00
github-actions[bot]
bb99be6c24
chore: bump version to 2.39.0 [skip ci]
v2.39.0
2025-06-27 15:37:53 +00:00
Panos Vagenas
0533da1923
feat: leverage new list modeling, capture default markers ( #1856 )
...
* chore: update docling-core & regenerate test data
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
* update backends to leverage new list modeling
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
* repin docling-core
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
* ensure availability of latest docling-core API
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
---------
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
2025-06-27 16:37:15 +02:00
Michael Honaker
e79e4f0ab6
fix(markdown): make parsing of rich table cells valid ( #1821 )
...
* fix: update md table classification
Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com >
* Fix ground truth header changes
Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com >
* Fix merge issues
Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com >
* Fix minor ground truth errors
Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com >
---------
Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com >
2025-06-26 19:50:45 +02:00
github-actions[bot]
ee4781075a
chore: bump version to 2.38.1 [skip ci]
v2.38.1
2025-06-25 16:27:46 +00:00
pranaymiri
d337825b8e
fix: updated granite vision model version for picture description ( #1852 )
...
* updated granite model version
* DCO Remediation Commit for Miriyala Pranay <miriyalapranay146@gmail.com >
I, Miriyala Pranay <miriyalapranay146@gmail.com >, hereby add my Signed-off-by to this commit: 5de0d5034c
Signed-off-by: Miriyala Pranay <miriyalapranay146@gmail.com >
---------
Signed-off-by: Miriyala Pranay <miriyalapranay146@gmail.com >
2025-06-25 17:49:56 +02:00
Panos Vagenas
7c5614a37a
fix(markdown): fix single-formatted headings & list items ( #1820 )
...
* fix(markdown): fix formatting & inline edge cases (show behavior before change)
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
* add change and updated test data
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
* update lock
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
* improve test case
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
---------
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
2025-06-25 13:05:06 +02:00
Michele Dolfi
41e8cae26b
fix: fix response type of ollama ( #1850 )
...
fix response type of ollama
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-06-25 11:33:09 +02:00
Allen N.
4002de1f92
fix: Handle missing runs to avoid out of range exception ( #1844 )
...
Fixes #1681 on upstream
Signed-off-by: Allen Nikka <allennikka@gmail.com >
2025-06-25 07:55:27 +02:00
github-actions[bot]
1dc63d0aa9
chore: bump version to 2.38.0 [skip ci]
v2.38.0
2025-06-23 18:14:24 +00:00
Peter W. J. Staar
f3ae3029b8
docs: update readme and add ASR example ( #1836 )
...
* updated the README
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* added minimal_asr_pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* Updated README and added ASR example
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* Updated docs.index.md
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* updated CI and mkdocs
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* added link to existing audio file
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* added link to existing audio file
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* reformatting
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
2025-06-23 18:55:16 +02:00
Peter W. J. Staar
1557e7ce3e
feat: Support audio input ( #1763 )
...
* scaffolding in place
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* doing scaffolding for audio pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* WIP: got first transcription working
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* all working, time to start cleaning up
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* first working ASR pipeline
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* added openai-whisper as a first transcription model
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* updating with asr_options
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* finalised the first working ASR pipeline with Whisper
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* use whisper from the latest git commit
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
* Update docling/datamodel/pipeline_options.py
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com >
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com >
* Update docling/datamodel/pipeline_options.py
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com >
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com >
* updated comment
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
* AudioBackend -> DummyBackend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* file rename
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Rename to NoOpBackend, add test for ASR pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Support every format in NoOpBackend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Add missing audio file and test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
* Install ffmpeg system dependency for ASR test
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
---------
Signed-off-by: Peter Staar <taa@zurich.ibm.com >
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com >
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com >
Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com >
Co-authored-by: Christoph Auer <cau@zurich.ibm.com >
2025-06-23 14:47:26 +02:00
Cesar Berrospi Ramis
d26dac61a8
fix(docx): ensure list items have a list parent ( #1827 )
...
Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com >
2025-06-20 14:47:25 +02:00
mkrssg
1350a8d3e5
fix(msword_backend): Identify text in the same line after an image #1425 ( #1610 )
...
* fix(msword_backend): Identify text in the same line after an image / image anchor #1425
Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com >
* test: add test file and case for fix(msword_backend): Identify text in the same line after an image / image anchor #1425
Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com >
* test: added groundtruth test files for fix(msword_backend): Identify text in the same line after an image / image anchor #1425
Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com >
* fix: extraneous empty paragraphs for test files
Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com >
---------
Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com >
Co-authored-by: Michael Krissgau <michael.krissgau@ibm.com >
2025-06-20 10:55:30 +02:00
Michele Dolfi
64ac043786
docs: support running examples from root or subfolder ( #1816 )
...
support running examples from root or subfolder
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-06-19 11:10:40 +02:00
Christoph Auer
dd7f64ff28
fix: Ensure uninitialized pages are removed before assembling document ( #1812 )
...
Ensure uninitialized pages are removed before assembling document
Signed-off-by: Christoph Auer <cau@zurich.ibm.com >
2025-06-19 07:33:25 +02:00
Panos Vagenas
861abcdcb0
feat(markdown): add formatting & improve inline support ( #1804 )
...
feat(markdown): support formatting & hyperlinks
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com >
2025-06-18 15:57:57 +02:00
Shkarupa Alex
215b540f6c
feat: Maximum image size for Vlm models ( #1802 )
...
* Image scale moved to base vlm options.
Added max_size image limit (options and vlm models).
* DCO Remediation Commit for Shkarupa Alex <shkarupa.alex@gmail.com >
I, Shkarupa Alex <shkarupa.alex@gmail.com >, hereby add my Signed-off-by to this commit: e93602a0d0
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com >
---------
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com >
2025-06-18 12:57:37 +02:00
Mahafuzur Rahman
dbab30e92c
fix: formula conversion with page_range param set ( #1791 )
...
When the page_range param is used for formula conversion,
the system throws a "list index out of range" error.
Included tests to validate that the fix works.
Signed-off-by: Masum <masumsofts@yahoo.com >
2025-06-17 13:58:45 +02:00
Michele Dolfi
c2ef69718a
chore: dco advisor ( #1795 )
...
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com >
2025-06-17 09:45:56 +02:00