Commit Graph

  • 3089cf2d26 perf: Move expensive imports closer to usage (#1863) William Easton 2025-07-01 15:27:17 -0500
  • bab9c25c0c Fix baseocrmodel test issue William Easton 2025-07-01 10:43:30 -0500
  • a2eafdbac3 DCO Remediation Commit for William Easton <bill.easton@elastic.co> William Easton 2025-06-30 15:25:25 -0500
  • 66f486dfd1 formatting fixes William Easton 2025-06-30 08:32:40 -0500
  • c792447e9c DCO Remediation Commit for William Easton <bill.easton@elastic.co> William Easton 2025-06-28 22:18:35 -0500
  • 08e0bdd6ab Move expensive imports closer to usage William Easton 2025-06-28 22:14:21 -0500
  • d90442488c DCO Remediation Commit for Qiefan Jiang <jiangqiefan@bytedance.com> Qiefan Jiang 2025-07-01 19:45:43 +0800
  • ca391f4908 feat(msexcel): ignore invisible sheet Qiefan Jiang 2025-07-01 19:38:29 +0800
  • 37a0bf8e4c DCO Remediation Commit for Qiefan Jiang <jiangqiefan@bytedance.com> Qiefan Jiang 2025-07-01 19:18:32 +0800
  • 274102a8d4 perf(msexcel): _find_table_bounds use iter_rows/iter_cols instead of sheet.cell Qiefan Jiang 2025-07-01 18:49:52 +0800
  • 8498d37b6b Merge pull request #3 from Mirza-Samad-Ahmed-Baig/main MirzaSamad20 2025-07-01 15:29:02 +0500
  • e0435833fd DCO Remediation Commit for mirza-samad-ahmed-baig <mirzasamadahmedbaig@gmail.com> mirza-samad-ahmed-baig 2025-07-01 15:24:02 +0500
  • dc182a1e0c Merge pull request #1 from Mirza-Samad-Ahmed-Baig/main MirzaSamad20 2025-07-01 14:56:59 +0500
  • 9ba627b40a Refactor: Address minor code quality issues and remove deprecated features mirza-samad-ahmed-baig 2025-07-01 14:13:36 +0500
  • 56a0e104f7 feat: Integrate ListItemMarkerProcessor into document assembly (#1825) Christoph Auer 2025-07-01 10:04:58 +0200
  • 5ce2892940 fix mypy Georg Heiler 2025-06-30 17:32:20 +0200
  • e37080220c feat: Introduce LayoutOptions to control layout postprocessing behaviour Christoph Auer 2025-06-30 16:45:57 +0200
  • 678eed2057 Merge from main Christoph Auer 2025-06-30 14:49:33 +0200
  • 92eb1517b6 Upgrade deps Christoph Auer 2025-06-30 14:43:42 +0200
  • bdfee4e2d0 chore: Safer unloading of DPv4 backend (#1867) Christoph Auer 2025-06-30 14:41:21 +0200
  • 4cfb2cd0a9 Merge from main Christoph Auer 2025-06-30 14:39:17 +0200
  • ae39a9411a fix: Ensure that TesseractOcrModel does not crash in case OSD is not installed (#1866) Nikos Livathinos 2025-06-30 10:55:56 +0200
  • 531448cc7a Merge from main Christoph Auer 2025-06-30 10:48:55 +0200
  • 19b61d4fc6 fix: Safer unloading of DPv4 backend Christoph Auer 2025-06-30 10:47:03 +0200
  • 2b6ea0380f Merge branch 'main' into nli/fix_tesseract_ocr_model Nikos Livathinos 2025-06-30 08:19:28 +0200
  • 53cff14e83 fix: Ensure that TesseractOcrModel does not crash if tesseract OSD is not installed Nikos Livathinos 2025-06-27 18:25:09 +0200
  • bb99be6c24 chore: bump version to 2.39.0 [skip ci] (tag: v2.39.0) github-actions[bot] 2025-06-27 15:37:53 +0000
  • 0533da1923 feat: leverage new list modeling, capture default markers (#1856) Panos Vagenas 2025-06-27 16:37:15 +0200
  • b8163d0b57 ensure availability of latest docling-core API Panos Vagenas 2025-06-27 15:37:55 +0200
  • 448690e561 repin docling-core Panos Vagenas 2025-06-27 15:07:00 +0200
  • aa472b17bd update backends to leverage new list modeling Panos Vagenas 2025-06-27 10:21:57 +0200
  • 0be26f4c52 chore: update docling-core & regenerate test data Panos Vagenas 2025-06-27 06:52:31 +0200
  • 6beec77788 update backends to leverage new list modeling (ref: remodel-lists-2) Panos Vagenas 2025-06-27 10:21:57 +0200
  • 23dc50ee8f chore: update docling-core & regenerate test data Panos Vagenas 2025-06-27 06:52:31 +0200
  • 2e9bf6862f Merge 08beb406d9 into e79e4f0ab6 krrome 2025-06-26 17:24:40 -0400
  • e79e4f0ab6 fix(markdown): make parsing of rich table cells valid (#1821) Michael Honaker 2025-06-26 13:50:45 -0400
  • 6406defb1f Fix minor ground truth errors Michael Honaker 2025-06-26 11:36:00 -0400
  • bd042526a6 Update base_pipeline.py jane-temcious 2025-06-26 19:50:13 +0530
  • 1fdfb2bc50 Update base_pipeline.py jane-temcious 2025-06-26 06:49:07 +0530
  • 5ab8792697 Create gpu_utils.py jane-temcious 2025-06-26 06:42:29 +0530
  • ee4781075a chore: bump version to 2.38.1 [skip ci] (tag: v2.38.1) github-actions[bot] 2025-06-25 16:27:46 +0000
  • bca8dd34b9 Fix merge issues Michael Honaker 2025-06-25 12:12:07 -0400
  • 1f47908bc7 Merge remote-tracking branch 'origin/main' into md_table_improvements Michael Honaker 2025-06-25 12:09:38 -0400
  • d337825b8e fix: updated granite vision model version for picture description (#1852) pranaymiri 2025-06-25 21:19:56 +0530
  • 8bb0d747f4 Merge branch 'docling-project:main' into main pranaymiri 2025-06-25 20:07:35 +0530
  • 47e2adcd9a DCO Remediation Commit for Miriyala Pranay <miriyalapranay146@gmail.com> I, Miriyala Pranay <miriyalapranay146@gmail.com>, hereby add my Signed-off-by to this commit: 5de0d5034c Miriyala Pranay 2025-06-25 19:56:29 +0530
  • 7c5614a37a fix(markdown): fix single-formatted headings & list items (#1820) Panos Vagenas 2025-06-25 13:05:06 +0200
  • 41e8cae26b fix: fix response type of ollama (#1850) Michele Dolfi 2025-06-25 04:33:09 -0500
  • 050d47f8ba empty commit launching tests with latest main Michele Dolfi 2025-06-25 10:34:54 +0200
  • f6674016b9 fix response type of ollama Michele Dolfi 2025-06-25 09:44:52 +0200
  • 4002de1f92 fix: Handle missing runs to avoid out of range exception (#1844) Allen N. 2025-06-24 22:55:27 -0700
  • 5de0d5034c updated granite model version Miriyala Pranay 2025-06-24 23:28:16 +0530
  • 57bb8750ad improve test case Panos Vagenas 2025-06-24 09:26:08 +0200
  • 1badf58e59 update lock Panos Vagenas 2025-06-23 16:46:47 +0200
  • 9368329973 add change and updated test data Panos Vagenas 2025-06-19 16:40:05 +0200
  • 39401f5157 fix(markdown): fix formatting & inline edge cases (show behavior before change) Panos Vagenas 2025-06-19 16:39:12 +0200
  • 6ef95c4a4e Fixes #1681 on upstream Allen Nikka 2025-06-23 18:31:47 -0700
  • 1dc63d0aa9 chore: bump version to 2.38.0 [skip ci] (tag: v2.38.0) github-actions[bot] 2025-06-23 18:14:24 +0000
  • f3ae3029b8 docs: update readme and add ASR example (#1836) Peter W. J. Staar 2025-06-23 18:55:16 +0200
  • 638820d889 reformatting Peter Staar 2025-06-23 18:20:03 +0200
  • 3845e170f3 added link tp existing audio file Peter Staar 2025-06-23 18:17:10 +0200
  • 452094f6d8 added link tp existing audio file Peter Staar 2025-06-23 18:13:51 +0200
  • 3666263510 updated CI and mkdocs Peter Staar 2025-06-23 18:08:43 +0200
  • d1023b57e6 fix:missing elements in markdown lists Bruno Rigal 2025-06-23 14:31:48 +0000
  • 0fdbf72f98 Updated docs.index.md Peter Staar 2025-06-23 15:50:24 +0200
  • 687241e8a6 Updated README and added ASR example Peter Staar 2025-06-23 15:46:06 +0200
  • 9afc2c7673 added minimal_asr_pipeline Peter Staar 2025-06-23 15:44:14 +0200
  • 602eaf9682 updated the README Peter Staar 2025-06-23 15:38:11 +0200
  • 1557e7ce3e feat: Support audio input (#1763) Peter W. J. Staar 2025-06-23 14:47:26 +0200
  • b4053afe60 Install ffmpeg system dependency for ASR test Christoph Auer 2025-06-23 14:06:27 +0200
  • fccf4583ad Add missing audio file and test Christoph Auer 2025-06-23 13:26:16 +0200
  • 26addcf946 Support every format in NoOpBackend Christoph Auer 2025-06-23 13:19:25 +0200
  • d54cea02b9 Rename to NoOpBackend, add test for ASR pipeline Christoph Auer 2025-06-23 13:03:44 +0200
  • 01706beea4 file rename Christoph Auer 2025-06-23 11:00:25 +0200
  • b43aef2eb5 AudioBackend -> DummyBackend Christoph Auer 2025-06-23 09:55:59 +0200
  • caf18e634b Merge branch 'main' of github.com:DS4SD/docling into dev/add-asr-pipeline-v2 Christoph Auer 2025-06-23 09:08:58 +0200
  • 117add0396 fix/ran poetry run pre-commit run --all-files to format the file Signed-off-by: Franck Benichou franck.benichou@sciencespo.fr Benichou 2025-05-14 15:35:50 -0400
  • 2feb4b0c28 fix/removed generate=True in test_backend_pptx.py in verify_export method to not conflict with main branch Signed-off-by: Franck Benichou franck.benichou@sciencespo.fr Benichou 2025-05-13 20:46:08 -0400
  • 4ee7fd7747 DCO Remediation Commit for Benichou <fbenichou@deloitte.ca> Benichou 2025-06-20 17:00:43 -0400
  • 6cf9fd1008 fix/implementing the capture of pptx_image with the same method from docx backend by extracting the drawing blip Benichou 2025-04-08 11:33:52 -0400
  • eb7980af0b fix/adding a commit with a signature Signed-off-by: Franck Benichou franck.benichou@sciencespo.fr Benichou 2025-04-08 01:00:12 -0400
  • d9f07040f3 DCO Remediation Commit for Benichou <fbenichou@deloitte.ca> Benichou 2025-06-20 16:53:05 -0400
  • 89dc98bd6f DCO Remediation Commit for Benichou <fbenichou@deloitte.ca> Benichou 2025-06-20 16:31:25 -0400
  • ed56086a65 fix/poetry_check Signed-off-by: Benichou <fbenichou@deloitte.ca> Benichou 2025-06-20 16:24:48 -0400
  • 4420c38936 fix/ran poetry run pre-commit run --all-files to format the file Signed-off-by: Franck Benichou franck.benichou@sciencespo.fr Benichou 2025-05-14 15:35:50 -0400
  • 22bf211acf fix/removed generate=True in test_backend_pptx.py in verify_export method to not conflict with main branch Signed-off-by: Franck Benichou franck.benichou@sciencespo.fr Benichou 2025-05-13 20:46:08 -0400
  • 2e3c4e10cb fix/adding the missing slide size argument in the handle pictures in the mspowerpoint_backend.py file and adding generate=True in the verify export method in the pytest for pptx to ensure the pytest passes appropriately Signed-off-by: Franck Benichou franck.benichou@sciencespo.fr Benichou 2025-05-13 20:34:56 -0400
  • a35d9bb8b8 fix: run poetry pre-commit all files to black format changes Signed-off-by: Franck Benichou franck.benichou@sciencespo.fr Benichou 2025-04-14 22:43:44 -0400
  • 82a9d27c96 fix/implementing the capture of pptx_image with the same method from docx backend by extracting the drawing blip Benichou 2025-04-08 11:33:52 -0400
  • dda339397b fix/adding a commit with a signature Signed-off-by: Franck Benichou franck.benichou@sciencespo.fr Benichou 2025-04-08 01:00:12 -0400
  • b0553e8812 fix/implementing the capture of pptx_image with the same method from docx backend by extracting the drawing blip Benichou 2025-04-08 00:53:24 -0400
  • 95e49705e8 chore: update lock file (#1315) Michele Dolfi 2025-04-07 17:47:51 +0200
  • 46fa6e5eb0 fix(pptx): check if picture shape has an image attached (#1316) Maxim Lysak 2025-04-07 17:36:56 +0200
  • 78dab32819 feat(docx): add text formatting and hyperlink support (#630) Simon Jégou 2025-04-03 15:11:50 +0200
  • e652f134ee docs: add visual grounding example (#1270) Panos Vagenas 2025-04-02 14:03:19 +0200
  • 7af290e482 fix(docx): Improve text parsing (#1268) Rafael Teixeira de Lima 2025-04-02 12:56:44 +0200
  • 4c741b53fa fix: Tesseract OCR CLI can't process images composed with numbers only (#1201) Guilhem VERMOREL 2025-03-31 10:53:49 +0200
  • 020f79a5ee fix/poetry_check Signed-off-by: Benichou <fbenichou@deloitte.ca> Benichou 2025-06-20 16:16:03 -0400
  • 28cdf76e93 Fix ground truth header changes Michael Honaker 2025-06-20 08:48:11 -0700
  • acfd1dab86 Update all test cases again (2) Christoph Auer 2025-06-20 17:35:56 +0200