Commit Graph

  • a3716b1961 refactoring minimal_vlm_pipeline Peter Staar 2025-05-14 13:57:32 +0200
  • 2efb7a7c06 fix(settings): fix nested settings load via environment variables (#1551) Alex Sokolov 2025-05-14 14:42:10 +0300
  • 65f33de002 docs: add advanced chunking & serialization example Panos Vagenas 2025-05-14 12:58:36 +0200
  • 7c97b494ec added the VlmPredictionToken Peter Staar 2025-05-14 12:23:46 +0200
  • 3781022407 fix(settings): fix nested settings load via environment variables Alexander Sokolov 2025-05-08 14:35:04 +0300
  • 41e3dc898b add parent Michele Dolfi 2025-05-14 10:12:47 +0200
  • d5d42ed4ed make the example files location more generic Michele Dolfi 2025-05-14 09:51:00 +0200
  • 12dab0a1e8 feat: support image/webp file type (#1415) Elwin 2025-05-14 15:47:28 +0800
  • 8d9fbff1a7 feat: add textbox content extraction in msword_backend Andrew 2025-05-14 14:59:38 +0800
  • 6a9ef10d0c rename test file Michele Dolfi 2025-05-14 07:31:05 +0100
  • 482483e74a Merge remote-tracking branch 'origin/main' into support-webp-filetype Michele Dolfi 2025-05-14 07:17:02 +0100
  • f159075b67 pixtral 12b runs via MLX and native transformers Peter Staar 2025-05-14 07:39:20 +0200
  • 054e01d8b3 added the formulate_prompt Peter Staar 2025-05-14 06:26:16 +0200
  • 4c0bc61e54 refactoring the download_model Peter Staar 2025-05-14 05:31:54 +0200
  • 2077e51033 fix/removed generate=True in test_backend_pptx.py in verify_export method to not conflict with main branch Signed-off-by: Franck Benichou franck.benichou@sciencespo.fr Benichou 2025-05-13 20:46:08 -0400
  • 4e8bf2c4d3 fix/adding the missing slide size argument in the handle pictures in the mspowerpoint_backend.py file and adding generate=True in the verify export method in the pytest for pptx to ensure the pytest passes appropriately Signed-off-by: Franck Benichou franck.benichou@sciencespo.fr Benichou 2025-05-13 20:34:56 -0400
  • d7922ab31d feat: Picture description using context with surrounding text Rafael T. C. Soares 2025-04-24 18:27:16 -0500
  • 3407955a47 all working, now serious refacgtoring necessary Peter Staar 2025-05-13 18:23:55 +0200
  • a8a119c93b Update test_backend_msexcel_xlsm.py ShiroYasha18 2025-05-13 21:31:15 +0530
  • 23238c241f chore: bump version to 2.31.2 [skip ci] (tag: v2.31.2) github-actions[bot] 2025-05-13 10:09:19 +0000
  • 498fc79392 feat: add textbox content extraction in msword_backend Andrew 2025-05-13 17:11:43 +0800
  • b09fd45a46 feat: add textbox content extraction in msword_backend Andrew 2025-05-07 12:25:43 +0800
  • 4046d0b2f3 fix: AsciiDoc header identification (#1562) (#1563) Marco Fargetta 2025-05-13 11:17:26 +0200
  • e9ad875f9d Merge branch 'docling-project:main' into fix/fix-issue-with-detecting-docx-files MoheyElDin Badr 2025-05-13 12:15:07 +0300
  • 069378aefb Update docling_parse_v4_backend.py ShiroYasha18 2025-05-13 14:31:33 +0530
  • b5b1d5c10c Merge branch 'docling-project:main' into main ShiroYasha18 2025-05-13 14:28:58 +0530
  • 8baa85a49d fix: restrict click version and update lock file (#1582) Michele Dolfi 2025-05-13 10:40:08 +0200
  • 96862bd326 refactoring the VLM part Peter Staar 2025-05-13 10:01:37 +0200
  • 57511220ed Update test GT Christoph Auer 2025-05-13 09:49:40 +0200
  • 9b41b10608 fix click dependency and update lock file Michele Dolfi 2025-05-13 07:26:08 +0100
  • ee01e3cff0 Merge branch 'main' into dev/add-other-vlm-models Peter Staar 2025-05-13 06:08:26 +0200
  • 7fbe021359 working on vlm's Peter Staar 2025-05-13 06:07:11 +0200
  • 77eb21b235 got microsoft/Phi-4-multimodal-instruct to work Peter Staar 2025-05-12 13:37:03 +0200
  • 68747e3cad fixed the transformers Peter Staar 2025-05-12 13:08:33 +0200
  • 0d0fa6cbe3 chore: bump version to 2.31.1 [skip ci] (tag: v2.31.1) github-actions[bot] 2025-05-12 09:44:26 +0000
  • 127e38646f fix: add smoldocling in download utils (#1577) Michele Dolfi 2025-05-12 10:48:07 +0200
  • 67de4f263e add smoldocling in download utils Michele Dolfi 2025-05-12 10:01:29 +0200
  • bd2d01f0ac Merge branch 'main' into dev/add-other-vlm-models Peter Staar 2025-05-12 08:52:52 +0200
  • 76501331d2 need to fix ruff linter dev/add-asr-pipeline Peter Staar 2025-05-12 07:34:24 +0200
  • 32ad65cb9f work in progress: slowly adding ASR pipeline and its derivatives Peter Staar 2025-05-12 07:33:38 +0200
  • 844babb390 docs: update links in data_prep_kit (#1559) Oleg Lavrovsky 2025-05-11 20:38:25 +0200
  • 18e1ec4df2 feat: adding new vlm-models support Peter Staar 2025-05-11 09:30:10 +0200
  • 880de0379b Added example GUI Oleg Lavrovsky 2025-05-10 11:44:58 +0200
  • 7cd1d59e0e fix: AsciiDoc header identification (#1562) Marco Fargetta 2025-05-09 19:25:04 +0200
  • c7966e80f0 Merge branch 'docling-project:main' into main ShiroYasha18 2025-05-09 21:52:48 +0530
  • 4378be1480 undo for test folder Matthias Günter 2025-05-09 16:54:04 +0200
  • a194392d1a Merge branch 'main' of https://github.com/ue71603/docling Matthias Günter 2025-05-09 16:39:50 +0200
  • 46d6cf078e point towards the real places of the test data Matthias Günter 2025-05-09 16:39:31 +0200
  • bc28b21c6a Update data_prep_kit.md Oleg Lavrovsky 2025-05-09 16:00:17 +0200
  • 776e7ecf9a fix(HTML): handle row spans in header rows (#1536) Cesar Berrospi Ramis 2025-05-09 15:14:32 +0200
  • 6e956dc551 Merge branch 'main' into nli/layoutmodel_improvements nli/layoutmodel_improvements Nikos Livathinos 2025-05-09 14:47:44 +0200
  • 3220a592e7 docs: add serialization docs, update chunking docs (#1556) Panos Vagenas 2025-05-08 21:43:01 +0200
  • 1295c85985 update notebook to improve MD table rendering Panos Vagenas 2025-05-08 20:38:19 +0200
  • 3031302208 docs: add serializers docs, update chunking docs Panos Vagenas 2025-05-08 20:09:27 +0200
  • 8186bdcd4c not do amateur hour stuff Vinay Damodaran 2025-05-07 21:42:16 -0700
  • 45c5a9445a Use yield from correctly? Vinay Damodaran 2025-05-07 20:50:52 -0700
  • e82bcaae61 Provide the option to make remote services call concurrent Vinay Damodaran 2025-05-07 20:48:05 -0700
  • 54d2422ad3 Merge branch 'docling-project:main' into main ShiroYasha18 2025-05-08 04:18:18 +0530
  • f6c601da03 Update docling_parse_v4_backend.py ShiroYasha18 2025-05-08 03:59:10 +0530
  • dd6cb20562 Add other types Mohey El-Din Badr 2025-05-07 12:27:29 +0300
  • ec6bd87ab9 Merge branch 'main' into fix/fix-issue-with-detecting-docx-files Mohey El-Din Badr 2025-05-07 12:02:47 +0300
  • fc5a9492a3 feat: add textbox content extraction in msword_backend Andrew 2025-05-06 17:35:55 +0800
  • 80832d9f39 feat: add textbox content extraction in msword_backend Andrew 2025-05-06 17:35:55 +0800
  • 6395495824 fix(HTML): handle row headers like in pivot tables Cesar Berrospi Ramis 2025-05-06 17:32:30 +0200
  • 006ebc32e4 Add confidence test Christoph Auer 2025-05-06 17:35:46 +0200
  • 2a6537289b Introduce mean_score and low_score, consistent aggregate computations Christoph Auer 2025-05-06 16:40:02 +0200
  • f9496e4a91 Move grade to page Christoph Auer 2025-05-06 15:33:25 +0200
  • a8a8b8e0f9 Fix garbage regex Christoph Auer 2025-05-06 15:30:16 +0200
  • 95d2f5fd92 Add grading scheme Christoph Auer 2025-05-06 14:54:47 +0200
  • e0b77e3173 chore(HTML): log the stacktrace of errors Cesar Berrospi Ramis 2025-05-06 10:27:05 +0200
  • f1658edbad fix: mime error in document streams (#1523) DavidLee 2025-05-06 15:30:46 +0800
  • bcb29caf96 Update document.py MoheyElDin Badr 2025-05-06 09:40:13 +0300
  • 3ca4fce650 Added Docling Enrichments for Tables & Figures. Nikhil Khandelwal 2025-05-06 11:31:11 +0530
  • 7a50f04c9c Update document.py DavidLee 2025-05-06 09:51:56 +0800
  • 7955903e9b Merge branch 'main' into main ka-weihe 2025-05-04 22:26:23 +0200
  • 0b0c6b985b Create example_08.html ka-weihe 2025-05-04 22:24:51 +0200
  • 7401685e4f updated code for xlsm support ShiroYasha18 2025-05-04 01:43:08 +0530
  • 2d28b6c296 updated support for xlsm ShiroYasha18 2025-05-04 01:30:59 +0530
  • e6a070234f code for xlsm support ShiroYasha18 2025-05-04 01:07:27 +0530
  • 7c705739f9 fix: usage of hashlib for FIPS (#1512) Michele Dolfi 2025-05-02 15:03:29 +0200
  • 99d8572f6d chore: propagate docling-core fixes propagate-core-fixes-20250502 Panos Vagenas 2025-05-02 14:47:21 +0200
  • f7326b5e14 fix usage of hashlib for FIPS Michele Dolfi 2025-05-02 14:25:18 +0200
  • de56523974 chore: format JSON test files to enable comparison (#1511) Panos Vagenas 2025-05-02 11:52:18 +0300
  • b250d48afb chore: format JSON test files Panos Vagenas 2025-05-02 09:49:12 +0200
  • 46700e9f29 Update picture_description_vlm_model.py jane-temcious 2025-05-01 09:37:27 +0530
  • b147331f2a chore: restore typing hint for self.script_readers (#1500) Ihar Hrachyshka 2025-04-30 14:33:27 -0400
  • 669be8e9f6 chore: restore typing hint for self.script_readers Ihar Hrachyshka 2025-04-30 12:33:58 -0400
  • 4ab7e9ddfb fix: Guard against attribute errors in TesseractOcrModel __del__ (#1494) Ben Browning 2025-04-30 11:51:33 -0400
  • f35fcab632 fix: Guard against attribute errors in TesseractOcrModel __del__ Ben Browning 2025-04-29 13:41:30 -0400
  • cc453961a9 fix: enable cuda_use_flash_attention2 for PictureDescriptionVlmModel (#1496) Zach Cox 2025-04-30 02:02:52 -0400
  • 0961cda5fb fix: enable use_cuda_flash_attention2 for PictureDescriptionVlmModel Zach Cox 2025-04-29 19:33:58 -0400
  • 50c108c6d3 add test file Manuel030 2025-04-29 16:18:09 +0200
  • 2acce04305 adding doctr ocr to pipeline omar 2025-04-29 12:25:31 +0000
  • 976e92e289 fix: updated the time-recorder label for reading order (#1490) Peter W. J. Staar 2025-04-29 13:02:53 +0200
  • eee4afe8cf reformatted code Peter Staar 2025-04-29 10:38:58 +0200
  • 2f66248605 fix: updated the time-recorder label for reading order Peter Staar 2025-04-29 10:21:29 +0200
  • 387dd659c1 fix: find paragraphs in elements with images in docx Manuel030 2025-04-28 13:46:08 +0200
  • d8959c6b19 chore: update dependencies in lock file (#1458) Michele Dolfi 2025-04-28 08:52:46 +0200
  • a097ccd8d5 chore: typo fix (#1465) nkh0472 2025-04-28 14:52:09 +0800
  • 3afbe6c969 docs: update supported formats guide (#1463) Emmanuel Ferdman 2025-04-28 09:51:54 +0300