docling

mirror of https://github.com/DS4SD/docling.git synced 2025-12-08 12:48:28 +00:00

Author	SHA1	Message	Date
github-actions[bot]	6c4bf9d087	chore: bump version to 2.41.0 [skip ci] v2.41.0	2025-07-10 14:25:05 +00:00
Christoph Auer	cc6193b3b9	test: Update tests to use default PDF backend (DPv4) (#1923 ) * Update tests to use default PDF backend (DPv4) Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * OCR tests use DPv1 until rotation bugs are fixed Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-07-10 15:16:56 +02:00
Christoph Auer	2b8616d6d5	feat: Layout model specification and multiple choices (#1910 ) * Establish layout_model spec and example instantiations Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Updated naming Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Back to uppercase constants Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * fix deps issue with openai-whisper>numba>llvmlite Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Pull v1 changed test GT from main Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-07-10 06:37:27 +02:00
Panos Vagenas	ec588df971	feat: enable precision control in float serialization (#1914 ) * chore: propagate precision control in float serialization Signed-off-by: Panos Vagenas <pva@zurich.ibm.com> * parametrize float serialization, propagate core updates Signed-off-by: Panos Vagenas <pva@zurich.ibm.com> * update test float precision Signed-off-by: Panos Vagenas <pva@zurich.ibm.com> * repin docling-core Signed-off-by: Panos Vagenas <pva@zurich.ibm.com> --------- Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>	2025-07-09 16:39:17 +02:00
Clément Doumouro	931eb55b88	fix(ocr-utils): unit test and fix the `rotate_bounding_box` function (#1897 ) Signed-off-by: Clément Doumouro <clement.doumouro@gmail.com>	2025-07-08 18:03:29 +02:00
geoHeil	a07ba863c4	feat: add image-text-to-text models in transformers (#1772 ) * feat(dolphin): add dolphin support Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * rename Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * reformat Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * fix mypy Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> * add prompt style and examples Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> --------- Signed-off-by: Georg Heiler <georg.kf.heiler@gmail.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>	2025-07-08 05:54:57 +02:00
VIktor Kuropiantnyk	e25873d557	fix: docs are missing osd packages for tesseract on RHEL (#1905 ) Fixed missing packages in the docs on tesseract Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>	2025-07-07 17:06:26 +02:00
Shkarupa Alex	b8813eea80	feat(vlm): Dynamic prompts (#1808 ) * Unify temperature options for Vlm models * Dynamic prompt support with example * DCO Remediation Commit for Shkarupa Alex <shkarupa.alex@gmail.com> I, Shkarupa Alex <shkarupa.alex@gmail.com>, hereby add my Signed-off-by to this commit: `34d446cb98` I, Shkarupa Alex <shkarupa.alex@gmail.com>, hereby add my Signed-off-by to this commit: `9c595d574f` Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com> * Replace Page with SegmentedPage * Fix example HF repo link Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com> * Sign-off Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com> * DCO Remediation Commit for Shkarupa Alex <shkarupa.alex@gmail.com> I, Shkarupa Alex <shkarupa.alex@gmail.com>, hereby add my Signed-off-by to this commit: `1a162066dd` Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com> Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com> * Use lmstudio-community model Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com> * Swap inference engine to LM Studio Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com> --------- Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com> Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com> Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>	2025-07-07 16:58:42 +02:00
Michele Dolfi	edd4356aac	fix: use only backend for picture classifier (#1904 ) use backend for picture classifier Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-07-07 16:23:16 +02:00
Michele Dolfi	dd8fde7f19	fix: typo in asr options (#1902 ) fix typo Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-07-07 08:59:14 +02:00
github-actions[bot]	f4a1c06937	chore: bump version to 2.40.0 [skip ci] v2.40.0	2025-07-04 15:31:36 +00:00
Christoph Auer	ec6cf6f7e8	feat: Introduce LayoutOptions to control layout postprocessing behaviour (#1870 ) Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-07-04 15:36:13 +02:00
Christoph Auer	598c9c53d4	fix: Secure torch model inits with global locks (#1884 ) Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-07-04 07:27:26 +02:00
Qiefan Jiang	13865c06f5	perf(msexcel): _find_table_bounds use iter_rows/iter_cols instead of Worksheet.cell (#1875 ) * perf(msexcel): _find_table_bounds use iter_rows/iter_cols instead of sheet.cell * DCO Remediation Commit for Qiefan Jiang <jiangqiefan@bytedance.com> I, Qiefan Jiang <jiangqiefan@bytedance.com>, hereby add my Signed-off-by to this commit: `274102a8d4` Signed-off-by: Qiefan Jiang <jiangqiefan@bytedance.com> * fix lint * DCO Remediation Commit for Qiefan Jiang <jiangqiefan@bytedance.com> I, Qiefan Jiang <jiangqiefan@bytedance.com>, hereby add my Signed-off-by to this commit: `b6b5b090a9` Signed-off-by: Qiefan Jiang <jiangqiefan@bytedance.com> --------- Signed-off-by: Qiefan Jiang <jiangqiefan@bytedance.com>	2025-07-03 13:12:06 +02:00
William Easton	3089cf2d26	perf: Move expensive imports closer to usage (#1863 ) * Move expensive imports closer to usage Signed-off-by: William Easton <bill.easton@elastic.co> * DCO Remediation Commit for William Easton <bill.easton@elastic.co> I, William Easton <bill.easton@elastic.co>, hereby add my Signed-off-by to this commit: 8a7412ce5bb131a01bb6403067aeb948c9093b0b Signed-off-by: William Easton <bill.easton@elastic.co> * formatting fixes Signed-off-by: William Easton <bill.easton@elastic.co> * DCO Remediation Commit for William Easton <bill.easton@elastic.co> I, William Easton <bill.easton@elastic.co>, hereby add my Signed-off-by to this commit: 8a7412ce5bb131a01bb6403067aeb948c9093b0b I, William Easton <bill.easton@elastic.co>, hereby add my Signed-off-by to this commit: 963e34325071db5e844841f10c27b396a054a0a1 Signed-off-by: William Easton <bill.easton@elastic.co> * Fix baseocrmodel test issue Signed-off-by: William Easton <bill.easton@elastic.co> --------- Signed-off-by: William Easton <bill.easton@elastic.co>	2025-07-01 22:27:17 +02:00
Christoph Auer	56a0e104f7	feat: Integrate ListItemMarkerProcessor into document assembly (#1825 ) * Integrate ListItemMarkerProcessor into document assembly Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Update to final version Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Update all test cases Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Upgrade deps Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-07-01 10:04:58 +02:00
Christoph Auer	bdfee4e2d0	chore: Safer unloading of DPv4 backend (#1867 ) fix: Safer unloading of DPv4 backend Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-06-30 14:41:21 +02:00
Nikos Livathinos	ae39a9411a	fix: Ensure that TesseractOcrModel does not crash in case OSD is not installed (#1866 ) fix: Ensure that TesseractOcrModel does not crash if tesseract OSD is not installed Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>	2025-06-30 10:55:56 +02:00
github-actions[bot]	bb99be6c24	chore: bump version to 2.39.0 [skip ci] v2.39.0	2025-06-27 15:37:53 +00:00
Panos Vagenas	0533da1923	feat: leverage new list modeling, capture default markers (#1856 ) * chore: update docling-core & regenerate test data Signed-off-by: Panos Vagenas <pva@zurich.ibm.com> * update backends to leverage new list modeling Signed-off-by: Panos Vagenas <pva@zurich.ibm.com> * repin docling-core Signed-off-by: Panos Vagenas <pva@zurich.ibm.com> * ensure availability of latest docling-core API Signed-off-by: Panos Vagenas <pva@zurich.ibm.com> --------- Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>	2025-06-27 16:37:15 +02:00
Michael Honaker	e79e4f0ab6	fix(markdown): make parsing of rich table cells valid (#1821 ) * fix: update md table classification Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com> * Fix ground truth header changes Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com> * Fix merge issues Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com> * Fix minor ground truth errors Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com> --------- Signed-off-by: Michael Honaker <Michael.Honaker@ibm.com>	2025-06-26 19:50:45 +02:00
github-actions[bot]	ee4781075a	chore: bump version to 2.38.1 [skip ci] v2.38.1	2025-06-25 16:27:46 +00:00
pranaymiri	d337825b8e	fix: updated granite vision model version for picture description (#1852 ) * updated granite model version * DCO Remediation Commit for Miriyala Pranay <miriyalapranay146@gmail.com> I, Miriyala Pranay <miriyalapranay146@gmail.com>, hereby add my Signed-off-by to this commit: `5de0d5034c` Signed-off-by: Miriyala Pranay <miriyalapranay146@gmail.com> --------- Signed-off-by: Miriyala Pranay <miriyalapranay146@gmail.com>	2025-06-25 17:49:56 +02:00
Panos Vagenas	7c5614a37a	fix(markdown): fix single-formatted headings & list items (#1820 ) * fix(markdown): fix formatting & inline edge cases (show behavior before change) Signed-off-by: Panos Vagenas <pva@zurich.ibm.com> * add change and updated test data Signed-off-by: Panos Vagenas <pva@zurich.ibm.com> * update lock Signed-off-by: Panos Vagenas <pva@zurich.ibm.com> * improve test case Signed-off-by: Panos Vagenas <pva@zurich.ibm.com> --------- Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>	2025-06-25 13:05:06 +02:00
Michele Dolfi	41e8cae26b	fix: fix response type of ollama (#1850 ) fix response type of ollama Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-25 11:33:09 +02:00
Allen N.	4002de1f92	fix: Handle missing runs to avoid out of range exception (#1844 ) Fixes #1681 on upstream Signed-off-by: Allen Nikka <allennikka@gmail.com>	2025-06-25 07:55:27 +02:00
github-actions[bot]	1dc63d0aa9	chore: bump version to 2.38.0 [skip ci] v2.38.0	2025-06-23 18:14:24 +00:00
Peter W. J. Staar	f3ae3029b8	docs: update readme and add ASR example (#1836 ) * updated the README Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added minimal_asr_pipeline Signed-off-by: Peter Staar <taa@zurich.ibm.com> * Updated README and added ASR example Signed-off-by: Peter Staar <taa@zurich.ibm.com> * Updated docs.index.md Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updated CI and mkdocs Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added link to existing audio file Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added link to existing audio file Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatting Signed-off-by: Peter Staar <taa@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com>	2025-06-23 18:55:16 +02:00
Peter W. J. Staar	1557e7ce3e	feat: Support audio input (#1763 ) * scaffolding in place Signed-off-by: Peter Staar <taa@zurich.ibm.com> * doing scaffolding for audio pipeline Signed-off-by: Peter Staar <taa@zurich.ibm.com> * WIP: got first transcription working Signed-off-by: Peter Staar <taa@zurich.ibm.com> * all working, time to start cleaning up Signed-off-by: Peter Staar <taa@zurich.ibm.com> * first working ASR pipeline Signed-off-by: Peter Staar <taa@zurich.ibm.com> * added openai-whisper as a first transcription model Signed-off-by: Peter Staar <taa@zurich.ibm.com> * updating with asr_options Signed-off-by: Peter Staar <taa@zurich.ibm.com> * finalised the first working ASR pipeline with Whisper Signed-off-by: Peter Staar <taa@zurich.ibm.com> * use whisper from the latest git commit Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> * Update docling/datamodel/pipeline_options.py Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com> * Update docling/datamodel/pipeline_options.py Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com> * updated comment Signed-off-by: Peter Staar <taa@zurich.ibm.com> * AudioBackend -> DummyBackend Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * file rename Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Rename to NoOpBackend, add test for ASR pipeline Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Support every format in NoOpBackend Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add missing audio file and test Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Install ffmpeg system dependency for ASR test Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Signed-off-by: Peter W. J. Staar <91719829+PeterStaar-IBM@users.noreply.github.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com>	2025-06-23 14:47:26 +02:00
Cesar Berrospi Ramis	d26dac61a8	fix(docx): ensure list items have a list parent (#1827 ) Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-06-20 14:47:25 +02:00
mkrssg	1350a8d3e5	fix(msword_backend): Identify text in the same line after an image #1425 (#1610 ) * fix(msword_backend): Identify text in the same line after an image / image anchor #1425 Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com> * test: add test file and case for fix(msword_backend): Identify text in the same line after an image / image anchor #1425 Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com> * test: added groundtruth test files for fix(msword_backend): Identify text in the same line after an image / image anchor #1425 Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com> * fix: extraneous empty paragraphs for test files Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com> --------- Signed-off-by: Michael Krissgau <michael.krissgau@ibm.com> Co-authored-by: Michael Krissgau <michael.krissgau@ibm.com>	2025-06-20 10:55:30 +02:00
Michele Dolfi	64ac043786	docs: support running examples from root or subfolder (#1816 ) support running examples from root or subfolder Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-19 11:10:40 +02:00
Christoph Auer	dd7f64ff28	fix: Ensure uninitialized pages are removed before assembling document (#1812 ) Ensure uninitialized pages are removed before assembling document Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-06-19 07:33:25 +02:00
Panos Vagenas	861abcdcb0	feat(markdown): add formatting & improve inline support (#1804 ) feat(markdown): support formatting & hyperlinks Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>	2025-06-18 15:57:57 +02:00
Shkarupa Alex	215b540f6c	feat: Maximum image size for Vlm models (#1802 ) * Image scale moved to base vlm options. Added max_size image limit (options and vlm models). * DCO Remediation Commit for Shkarupa Alex <shkarupa.alex@gmail.com> I, Shkarupa Alex <shkarupa.alex@gmail.com>, hereby add my Signed-off-by to this commit: `e93602a0d0` Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com> --------- Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>	2025-06-18 12:57:37 +02:00
Mahafuzur Rahman	dbab30e92c	fix: formula conversion with page_range param set (#1791 ) When page_range param is used for formula conversion, the system throws list index out of range error. Included tests to validate that the fix works. Signed-off-by: Masum <masumsofts@yahoo.com>	2025-06-17 13:58:45 +02:00
Michele Dolfi	c2ef69718a	chore: dco advisor (#1795 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-17 09:45:56 +02:00
github-actions[bot]	7bae3b6c06	chore: bump version to 2.37.0 [skip ci] v2.37.0	2025-06-16 11:02:54 +00:00
Martin Wind	f28d23cf03	fix: pptx line break and space handling (#1664 ) Signed-off-by: Martin Wind <martin.wind@im-c.at>	2025-06-16 10:44:30 +02:00
Cesar Berrospi Ramis	b886e4df31	fix(asciidoc): set default size when missing in image directive (#1769 ) The AsciiDoc backend should not create an ImageRef with Size equal to None, instead use default size values. Refactor static methods as such and add the staticmethod decorator. Extend the regression test for this fix. Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-06-16 10:38:46 +02:00
Christoph Auer	7d3302cb48	feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745 ) * Keep page.parsed_page.textline_cells and page.cells in sync, including OCR Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Make page.parsed_page the only source of truth for text cells Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Small fix Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Correctly compute PDF boxes from pymupdf Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Use different OCR engine order Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add type hints and fix mypy Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * One more test fix Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Remove with pypdfium2_lock from caller sites Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix typing Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>	2025-06-13 19:01:55 +02:00
Michele Dolfi	0432a31b2f	docs: update vlm models api examples with LM Studio (#1759 ) update vlm models api examples Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-12 12:58:44 +02:00
Bruno Rigal	7a275c7637	fix: Handle NoneType error in MsPowerpointDocumentBackend (#1747 ) fix:nonetyperror in pptx backend Signed-off-by: Bruno Rigal <bruno.rigal@probayes.com> Co-authored-by: Bruno Rigal <bruno.rigal@probayes.com>	2025-06-10 19:43:20 +02:00
Ayraf	df140227c3	feat: support xlsm files (#1520 ) * code for xlsm support * updated support for xlsm * updated code for xlsm support * Update docling_parse_v4_backend.py Signed-off-by: ShiroYasha18 <85089952+ShiroYasha18@users.noreply.github.com> * Update docling_parse_v4_backend.py Signed-off-by: ShiroYasha18 <85089952+ShiroYasha18@users.noreply.github.com> * Update test_backend_msexcel_xlsm.py updated the tests/test_backend_msexcel_xlsm.py: have a function starting with test removed all print statements ** To add an explicit assert {test}=={pred} Signed-off-by: ShiroYasha18 <85089952+ShiroYasha18@users.noreply.github.com> * Update base_models.py Signed-off-by: ShiroYasha18 <85089952+ShiroYasha18@users.noreply.github.com> * Update test_backend_msexcel.py Signed-off-by: ShiroYasha18 <85089952+ShiroYasha18@users.noreply.github.com> * Update test_backend_msexcel_xlsm.py Signed-off-by: ShiroYasha18 <85089952+ShiroYasha18@users.noreply.github.com> * Update document_converter.py Signed-off-by: ShiroYasha18 <85089952+ShiroYasha18@users.noreply.github.com> * Delete tests/test_backend_msexcel_xlsm.py Signed-off-by: ShiroYasha18 <85089952+ShiroYasha18@users.noreply.github.com> * xlsm file Signed-off-by: ShiroYasha18 <85089952+ShiroYasha18@users.noreply.github.com> * run tests * ran tests * Fix tests, upgrade XSLM example to a valid file Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: ShiroYasha18 <85089952+ShiroYasha18@users.noreply.github.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com>	2025-06-10 16:55:59 +02:00
Peter W. J. Staar	6613b9e98b	fix: prov for merged-elems (#1728 ) * fix: prov for merged-elems Signed-off-by: Peter Staar <taa@zurich.ibm.com> * reformatted the code Signed-off-by: Peter Staar <taa@zurich.ibm.com> * Reset pyproject.toml Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Fix tests Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Peter Staar <taa@zurich.ibm.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com>	2025-06-10 11:22:42 +02:00
Maras Ioannis	e979750ce9	fix(tesseract): initialize df_osd to avoid uninitialized variable error (#1718 ) * fix: initialize df_osd to avoid uninitialized variable error Signed-off-by: IoannisMaras <maras2002@gmail.com> * Fix formatting Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com> * Satisfy mypy, regenerate OCR tests Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: IoannisMaras <maras2002@gmail.com> Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com> Signed-off-by: Christoph Auer <cau@zurich.ibm.com> Co-authored-by: Christoph Auer <60343111+cau-git@users.noreply.github.com> Co-authored-by: Christoph Auer <cau@zurich.ibm.com>	2025-06-10 10:57:45 +02:00
Michele Dolfi	f7f31137f1	fix: allow custom torch_dtype in vlm models (#1735 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-10 10:52:15 +02:00
Michele Dolfi	49b10e7419	docs: add open webui (#1734 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-10 09:35:20 +02:00
AndrewTsai0406	9dbcb3d7d4	fix: Improve extraction from textboxes in Word docs (#1701 ) * fix/docx_text_box_extraction Signed-off-by: JiunAn Tsai <andrew@JiunAns-Mac-mini.local> * fix/docx_text_box_extraction Signed-off-by: JiunAn Tsai <andrew@JiunAns-Mac-mini.local> --------- Signed-off-by: JiunAn Tsai <andrew@JiunAns-Mac-mini.local> Co-authored-by: JiunAn Tsai <andrew@JiunAns-Mac-mini.local>	2025-06-06 11:37:46 +02:00
Eugene	a2b83fe4ae	fix: Add WEBP to the list of image file extensions (#1711 ) feat: Add WEBP to the list of image file extensions Signed-off-by: Eugene <fogaprod@gmail.com>	2025-06-05 09:09:27 +02:00

570 Commits