Commit Graph

  • 033f504a82 Update all test cases again (2) Christoph Auer 2025-06-20 16:57:30 +0200
  • a6efb2eb3d Merge branch 'main' of github.com:DS4SD/docling into cau/dp4-test-diff Christoph Auer 2025-06-20 16:51:35 +0200
  • 6158a2e784 Update all test cases again Christoph Auer 2025-06-20 14:56:46 +0200
  • d26dac61a8 fix(docx): ensure list items have a list parent (#1827) Cesar Berrospi Ramis 2025-06-20 14:47:25 +0200
  • 7d96f68988 fix(docx): ensure list items have a list parent Cesar Berrospi Ramis 2025-06-20 11:30:45 +0200
  • c146c8f309 Update all test cases Christoph Auer 2025-06-20 13:31:28 +0200
  • 926e32037d Update to final version Christoph Auer 2025-06-20 11:42:35 +0200
  • 1350a8d3e5 fix(msword_backend): Identify text in the same line after an image #1425 (#1610) mkrssg 2025-06-20 10:55:30 +0200
  • 48ee8a1291 Integrate ListItemMarkerProcessor into document assembly Christoph Auer 2025-06-20 10:28:59 +0200
  • 90da15f611 initial reference to granite-doclong dev/add-granite-docling-preview Peter Staar 2025-06-20 07:47:12 +0200
  • 408b03ebbc updated comment Peter Staar 2025-06-20 05:06:37 +0200
  • a866392158 fix: extraneous empty paragraphs for test files Michael Krissgau 2025-06-19 21:36:39 +0200
  • 6a34f6f5c5 fix: update md table classification Michael Honaker 2025-06-18 18:12:24 -0700
  • 69b271d09c Update docling/datamodel/pipeline_options.py Peter W. J. Staar 2025-06-19 17:25:21 +0200
  • d7d5512e1b Update docling/datamodel/pipeline_options.py Peter W. J. Staar 2025-06-19 17:25:10 +0200
  • 0e63cb09e6 Remove pages.json from diff Christoph Auer 2025-06-19 16:08:07 +0200
  • 64ac043786 docs: support running examples from root or subfolder (#1816) Michele Dolfi 2025-06-19 04:10:40 -0500
  • fb1009e2eb support running examples from root or subfolder Michele Dolfi 2025-06-19 09:51:49 +0200
  • 4e332500a8 add table raw cells when no table structure model was used fix-print-raw-table Michele Dolfi 2025-06-19 09:12:08 +0200
  • dd7f64ff28 fix: Ensure uninitialized pages are removed before assembling document (#1812) Christoph Auer 2025-06-19 07:33:25 +0200
  • 4bd0ddbaf5 Ensure uninitialized pages are removed before assembling document Christoph Auer 2025-06-18 16:56:44 +0200
  • 2eb019e14c Fix: Hard‑coded fallback (add_level = 1) and Off‑by‑one for numbered headings in msword_backend.py Artus Krohn-Grimberghe 2025-06-18 16:05:58 +0200
  • 861abcdcb0 feat(markdown): add formatting & improve inline support (#1804) Panos Vagenas 2025-06-18 15:57:57 +0200
  • 5501dc5725 Fix: inconsistencies between file format backends Artus Krohn-Grimberghe 2025-06-18 15:49:25 +0200
  • 42af299fa2 Fix: inconsistency in how heading levels are calculated in the msword_backend.py file compared to the AsciiDoc, HTML backends Artus Krohn-Grimberghe 2025-06-18 15:37:14 +0200
  • aa33bbe1a2 use whisper from the latest git commit Michele Dolfi 2025-06-18 14:27:38 +0200
  • 3829e9d9ce DCO Remediation Commit for Shkarupa Alex <shkarupa.alex@gmail.com> Shkarupa Alex 2025-06-18 14:58:56 +0300
  • 9c595d574f Dynamic prompt support with example Shkarupa Alex 2025-06-18 14:48:32 +0300
  • 34d446cb98 Unify temperature options for Vlm models Shkarupa Alex 2025-06-18 14:01:25 +0300
  • 215b540f6c feat: Maximum image size for Vlm models (#1802) Shkarupa Alex 2025-06-18 13:57:37 +0300
  • b7f8ce029b feat(markdown): support formatting & hyperlinks Panos Vagenas 2025-06-18 10:04:09 +0200
  • 43239ff712 finalised the first working ASR pipeline with Whisper Peter Staar 2025-06-18 06:50:10 +0200
  • ed10d09936 updating with asr_options Peter Staar 2025-06-18 06:23:33 +0200
  • e16c7d9d74 DCO Remediation Commit for Shkarupa Alex <shkarupa.alex@gmail.com> Shkarupa Alex 2025-06-17 23:55:40 +0300
  • e93602a0d0 Image scale moved to base vlm options. Added max_size image limit (options and vlm models). Shkarupa Alex 2025-06-17 23:42:52 +0300
  • e5fd579861 added openai-whisper as a first transcription model Peter Staar 2025-06-17 16:52:34 +0200
  • dbab30e92c fix: formula conversion with page_range param set (#1791) Mahafuzur Rahman 2025-06-17 17:58:45 +0600
  • c2ef69718a chore: dco advisor (#1795) Michele Dolfi 2025-06-17 02:45:56 -0500
  • edc0535b89 chore: dco advisor Michele Dolfi 2025-06-17 08:40:34 +0200
  • 448c932fd2 reformat Georg Heiler 2025-06-17 08:23:39 +0200
  • 352f261163 rename Georg Heiler 2025-06-17 00:03:24 +0200
  • 1c3699eaf7 feat(dolphin): add dolphin support Georg Heiler 2025-06-14 09:00:11 +0200
  • a0f4805097 fix: formula conversion with page_range param set Masum 2025-06-16 23:47:40 +0600
  • cbd2e535db first working ASR pipeline Peter Staar 2025-06-16 19:06:47 +0200
  • 7bae3b6c06 chore: bump version to 2.37.0 [skip ci] v2.37.0 github-actions[bot] 2025-06-16 11:02:54 +0000
  • 8d4ac70f61 docs: add links to other language versions of README neo 2025-06-16 17:33:44 +0800
  • f28d23cf03 fix: pptx line break and space handling (#1664) Martin Wind 2025-06-16 10:44:30 +0200
  • a835d10526 Merge c75b75e8af into b886e4df31 Martin Wind 2025-06-16 10:39:05 +0200
  • b886e4df31 fix(asciidoc): set default size when missing in image directive (#1769) Cesar Berrospi Ramis 2025-06-16 10:38:46 +0200
  • 7d3302cb48 feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) Christoph Auer 2025-06-13 19:01:55 +0200
  • f8b8cc9654 fix(asciidoc): set default size when missing in image directive Cesar Berrospi Ramis 2025-06-13 18:02:43 +0200
  • bf1599b2c3 Fix typing Christoph Auer 2025-06-13 16:16:12 +0200
  • 5ec6de3ae4 Remove with pypdfium2_lock from caller sites Christoph Auer 2025-06-13 16:11:34 +0200
  • 1c39dc93ab One more test fix Christoph Auer 2025-06-13 16:01:26 +0200
  • 05b8485dfb all working, time to start cleaning up Peter Staar 2025-06-13 11:10:32 +0200
  • d99080e036 Merge branch 'main' of github.com:DS4SD/docling into cau/ocr-cells-in-segmented-page Christoph Auer 2025-06-13 11:10:00 +0200
  • e3cefd0e71 Add type hints and fix mypy Christoph Auer 2025-06-13 11:08:55 +0200
  • 6dead88464 WIP: got first transcription working Peter Staar 2025-06-13 10:43:23 +0200
  • 1d4008ac7c doing scaffolding for audio pipeline Peter Staar 2025-06-12 18:24:13 +0200
  • 5c606c2574 scaffolding in place Peter Staar 2025-06-12 17:57:29 +0200
  • 0432a31b2f docs: update vlm models api examples with LM Studio (#1759) Michele Dolfi 2025-06-12 05:58:44 -0500
  • b1fb7c3b08 update vlm models api examples Michele Dolfi 2025-06-11 23:14:51 -0400
  • 9469280802 Use different OCR engine order Christoph Auer 2025-06-11 15:59:47 +0200
  • 9752e824fb Correctly compute PDF boxes from pymupdf Christoph Auer 2025-06-11 14:19:55 +0200
  • 06b408fa41 Small fix Christoph Auer 2025-06-10 20:01:01 +0200
  • d73c9a2995 Make page.parsed_page the only source of truth for text cells Christoph Auer 2025-06-10 19:55:49 +0200
  • 7a275c7637 fix: Handle NoneType error in MsPowerpointDocumentBackend (#1747) Bruno Rigal 2025-06-10 19:43:20 +0200
  • df140227c3 feat: support xlsm files (#1520) Ayraf 2025-06-10 20:25:59 +0530
  • 3dd909b39a fix:nonetyperror in pptx backend Bruno Rigal 2025-06-10 14:18:26 +0000
  • dba8317ce5 Merge branch 'bri/debug_pptx_nonetype_in_notes_slide' of https://github.com/brunorigal/docling into bri/debug_pptx_nonetype_in_notes_slide Bruno Rigal 2025-06-10 14:08:19 +0000
  • 5a0173c385 linting Bruno Rigal 2025-06-10 14:05:33 +0000
  • 1b401a4730 handle none notes_text_frame Bruno Rigal 2025-06-10 13:13:46 +0000
  • 8f105be7d0 handle none notes_text_frame Bruno Rigal 2025-06-10 13:13:46 +0000
  • e310c5cff3 Keep page.parsed_page.textline_cells and page.cells in sync, including OCR Christoph Auer 2025-06-10 15:05:56 +0200
  • 3c922b4105 Fix tests, upgrade XSLM example to a valid file Christoph Auer 2025-06-10 13:14:32 +0200
  • 000e7aa1ca Merge branch 'main' of github.com:docling-project/docling Christoph Auer 2025-06-10 12:55:38 +0200
  • 6613b9e98b fix: prov for merged-elems (#1728) Peter W. J. Staar 2025-06-10 11:22:42 +0200
  • e979750ce9 fix(tesseract): initialize df_osd to avoid uninitialized variable error (#1718) Maras Ioannis 2025-06-10 11:57:45 +0300
  • f7f31137f1 fix: allow custom torch_dtype in vlm models (#1735) Michele Dolfi 2025-06-10 03:52:15 -0500
  • 40059a1734 Fix tests Christoph Auer 2025-06-10 10:23:21 +0200
  • d9e632f426 Merge branch 'main' of github.com:DS4SD/docling into fix/fix-prov-for-merged-elems Christoph Auer 2025-06-10 10:14:42 +0200
  • 5b92e23697 Satisfy mypy, regenerate OCR tests Christoph Auer 2025-06-10 10:12:55 +0200
  • b559576f2e Merge branch 'main' of github.com:docling-project/docling Christoph Auer 2025-06-10 10:10:52 +0200
  • 3a76433b83 Update test files dev/fix_msword_backend_identify_text_after_image Christoph Auer 2025-06-10 09:52:31 +0200
  • 5fac357995 Merge branch 'main' of github.com:docling-project/docling into dev/fix_msword_backend_identify_text_after_image Christoph Auer 2025-06-10 09:52:15 +0200
  • 49b10e7419 docs: add open webui (#1734) Michele Dolfi 2025-06-10 02:35:20 -0500
  • 2c4fd158b4 Reset pyproject.toml Christoph Auer 2025-06-10 09:33:17 +0200
  • 5db127b2e8 fix: allow custom torch_dtype in vlm models Michele Dolfi 2025-06-08 16:50:57 +0200
  • 4fea44fedf docs: add open webui Michele Dolfi 2025-06-08 16:11:04 +0200
  • b105ecc9d5 Merge branch 'docling-project:main' into main Ayraf 2025-06-07 17:46:13 +0530
  • 20dd65fb94 reformatted the code Peter Staar 2025-06-07 07:40:35 +0200
  • 6a02ec0f02 fix: prov for merged-elems Peter Staar 2025-06-07 07:31:54 +0200
  • 52b8b9163f Merge branch 'main' of https://github.com/docling-project/docling into dev/fix_msword_backend_identify_text_after_image Michael Krissgau 2025-06-06 20:53:40 +0200
  • 50cbea4483 Update md_backend.py Leonid Fedotov 2025-06-06 16:59:05 +0300
  • 04137faf22 default-groups expects a sequence, not a string Felix Hofmann 2025-06-06 13:41:44 +0200
  • ca154ac8ae docs: update export table function args Giannis Manousaridis 2025-06-06 13:43:36 +0300
  • 9dbcb3d7d4 fix: Improve extraction from textboxes in Word docs (#1701) AndrewTsai0406 2025-06-06 17:37:46 +0800
  • 9eb05b54a8 Fix formatting Christoph Auer 2025-06-06 11:36:49 +0200
  • 2bc564ccef Merge branch 'main' of https://github.com/docling-project/docling into dev/fix_msword_backend_identify_text_after_image Michael Krissgau 2025-06-05 22:20:09 +0200
  • d8d21489e0 Merge branch 'bug_1242/drawing_blip_fix' of https://github.com/benichou/docling into bug_1242/drawing_blip_fix Merging remote to local Benichou 2025-06-05 11:29:09 -0400