Commit Graph

  • 8895fb546f update test results Michele Dolfi 2025-04-11 16:35:15 +0200
  • 8b571366e3 Update repository URL in CITATION.cff Simon Leiß 2025-04-11 16:27:13 +0200
  • ae0bb86ac7 relock with fixed html export in docling-core Michele Dolfi 2025-04-11 15:53:13 +0200
  • 0063f15dcc add pin for new docling-core with html split argument Michele Dolfi 2025-04-11 13:10:18 +0200
  • e803967831 Merge remote-tracking branch 'origin/main' into dev/add-split-page-html-export Michele Dolfi 2025-04-11 13:06:00 +0200
  • 415b877984 fix(docx): declare image_data variable when handling pictures (#1359) Cesar Berrospi Ramis 2025-04-11 13:04:00 +0200
  • 250399948d fix: Implement PictureDescriptionApiOptions.bitmap_area_threshold (#1248) Rowan Skewes 2025-04-11 19:14:05 +1000
  • 4b0e3ef104 fix(docx): declare image_data variable when handling pictures Cesar Berrospi Ramis 2025-04-11 10:58:15 +0200
  • eef2bdea77 feat(xlsx): create a page for each worksheet in XLSX backend (#1332) Cesar Berrospi Ramis 2025-04-11 10:29:53 +0200
  • f33fe7dbf0 feat: add a new PictureDescription Model to support llama-stack API Rafael T. C. Soares 2025-04-09 14:46:00 -0500
  • c605edd8e9 feat: OllamaVlmModel for Granite Vision 3.2 (#1337) Gabe Goodhart 2025-04-10 10:03:04 -0600
  • b6352afb9e add examples to docs Michele Dolfi 2025-04-10 16:20:24 +0200
  • 48d5405db1 bug: auto-recognize .xlsx files Tim Kellogg 2025-04-09 06:56:03 -0400
  • 6b696b504a fix: Properly address page in pipeline _assemble_document when page_range is provided (#1334) Joan Fabrégat 2025-04-10 16:11:28 +0200
  • 224dacff75 disable example from CI Michele Dolfi 2025-04-10 14:24:11 +0200
  • 72ab8e1821 chore: bump version to 2.29.0 [skip ci] v2.29.0 github-actions[bot] 2025-04-10 12:24:09 +0000
  • 8a75615b9b require flag for remote services Michele Dolfi 2025-04-10 14:20:33 +0200
  • 414a6b7246 add example Michele Dolfi 2025-04-10 14:10:58 +0200
  • 8891c66536 updated the cli to output html in split-page mode Peter Staar 2025-04-10 13:58:59 +0200
  • f77c8cf96c rename and refactor Michele Dolfi 2025-04-10 11:33:00 +0200
  • 0f438b3a76 generalize input args for other API providers Michele Dolfi 2025-04-10 10:27:41 +0200
  • 66253e8a4b remove model from download enum Michele Dolfi 2025-04-10 10:00:37 +0200
  • 6ebe1356f0 docs: Add Notes for Intel macOS Juil Park 2025-04-10 16:59:52 +0900
  • aa6d1bae7c fix: Implement PictureDescriptionApiOptions.picture_area_threshold Rowan Skewes 2025-03-27 17:08:55 +1100
  • f14c1b4f05 fix: Linting, formatting, and bug fixes Gabe Goodhart 2025-04-09 11:54:22 -0600
  • d001097376 docling(xlsx): add bounding boxes and page size information in cell units Cesar Berrospi Ramis 2025-04-09 17:40:40 +0200
  • e31bb03bff fix: incorrect code suggestion for setting device type Ihar Hrachyshka 2025-04-09 12:02:56 -0400
  • 81f38e960a docs(xlsx): add docstrings to XLSX backend module. Cesar Berrospi Ramis 2025-04-08 14:45:32 +0200
  • 06a0ae8294 feat(xlsx): create a page for each worksheet in XLSX backend Cesar Berrospi Ramis 2025-04-08 14:04:11 +0200
  • e813f02943 sytle(xlsx): enforce type hints in XLSX backend Cesar Berrospi Ramis 2025-04-08 11:06:38 +0200
  • 7b7a3a2004 refactor: Refactor from Ollama SDK to generic OpenAI API Gabe Goodhart 2025-04-09 09:23:28 -0600
  • ad1541e8cf refactor: Move OpenAI API call logic into utils.utils Gabe Goodhart 2025-04-09 08:02:01 -0600
  • 8ef0b897c8 Revert "build: Add ollama sdk dependency" Gabe Goodhart 2025-04-09 07:14:55 -0600
  • 72dd815195 feat: Connect "granite_vision_ollama" pipeline option to CLI Gabe Goodhart 2025-04-08 13:29:50 -0600
  • 219d8db626 feat: Full implementation of OllamaVlmModel Gabe Goodhart 2025-04-08 13:29:26 -0600
  • 5902d9e1c1 feat: Add option plumbing for OllamaVlmOptions in pipeline_options Gabe Goodhart 2025-04-08 13:29:03 -0600
  • 17f381da4f build: Add ollama sdk dependency Gabe Goodhart 2025-04-08 13:05:00 -0600
  • 355d8dc7a6 chore: Logo parameter in docling CLI, prints cute ascii logo (#1294) Maxim Lysak 2025-04-09 05:29:48 +0200
  • 2632feeb75 fix for the (dumb) MyPy type checker Joan Fabrégat 2025-04-08 18:17:15 +0200
  • 253cfab15e fix/implementing the capture of pptx_image with the same method from docx backend by extracting the drawing blip Benichou 2025-04-08 11:33:52 -0400
  • 02f77bbabd fix/adding a commit with a signature Signed-off-by: Franck Benichou franck.benichou@sciencespo.fr Benichou 2025-04-08 01:00:12 -0400
  • 5b6086a489 Fixes #1333 Joan Fabrégat 2025-04-08 17:04:38 +0200
  • 14e9c0ce9a fix(docx): Adding new latex symbols, simplifying how equations are added to text (#1295) Rafael Teixeira de Lima 2025-04-08 17:11:37 +0200
  • 643e4918c3 Fix test file Rafael Teixeira de Lima 2025-04-08 16:27:18 +0200
  • 9557431b94 Adding new latex symbols, simplifying how equations are added to text Rafael Teixeira de Lima 2025-04-03 17:57:30 +0200
  • 4949471e50 Log warning message instead of print Rafael Teixeira de Lima 2025-04-08 15:41:16 +0200
  • e7fc1a40ed Identify headers through inhenrited style Rafael Teixeira de Lima 2025-04-04 14:46:43 +0200
  • ae2e0832cd Adding new latex symbols, simplifying how equations are added to text Rafael Teixeira de Lima 2025-04-03 17:57:30 +0200
  • a29d4f7429 feat: handle <code> tags as code blocks (#1320) Fernando Santos 2025-04-08 05:32:06 -0300
  • 36006d5829 docs: add plugins docs (#1319) Michele Dolfi 2025-04-08 09:44:37 +0200
  • 951127605d chore: update lock file (#1315) Michele Dolfi 2025-04-07 17:47:51 +0200
  • b85e0196f6 fix(pptx): check if picture shape has an image attached (#1316) Maxim Lysak 2025-04-07 17:36:56 +0200
  • fc306a7817 feat(docx): add text formatting and hyperlink support (#630) Simon Jégou 2025-04-03 15:11:50 +0200
  • 07f0846d42 docs: add visual grounding example (#1270) Panos Vagenas 2025-04-02 14:03:19 +0200
  • 870e33235d fix(docx): Improve text parsing (#1268) Rafael Teixeira de Lima 2025-04-02 12:56:44 +0200
  • fb36311e3a fix: Tesseract OCR CLI can't process images composed with numbers only (#1201) Guilhem VERMOREL 2025-03-31 10:53:49 +0200
  • 2ad8da9be9 Merge branch 'rtdl/new_latex_symbols' of github.com:DS4SD/docling into rtdl/new_latex_symbols Rafael Teixeira de Lima 2025-04-08 16:19:25 +0200
  • ee30d3e7f8 Log warning message instead of print Rafael Teixeira de Lima 2025-04-08 15:41:16 +0200
  • 851baf1090 Identify headers through inhenrited style Rafael Teixeira de Lima 2025-04-04 14:46:43 +0200
  • 207cd78a26 Adding new latex symbols, simplifying how equations are added to text Rafael Teixeira de Lima 2025-04-03 17:57:30 +0200
  • 556b949b18 Log warning message instead of print Rafael Teixeira de Lima 2025-04-08 15:41:16 +0200
  • 0499cd1c1e feat: handle <code> tags as code blocks (#1320) Fernando Santos 2025-04-08 05:32:06 -0300
  • 2e99e5a54f docs: add plugins docs (#1319) Michele Dolfi 2025-04-08 09:44:37 +0200
  • b7c3f2e984 fix/implementing the capture of pptx_image with the same method from docx backend by extracting the drawing blip Benichou 2025-04-08 00:53:24 -0400
  • 09794641e6 update test results Michele Dolfi 2025-04-08 06:28:23 +0200
  • dcbdd74b27 pin dev branch of docling-parse Michele Dolfi 2025-04-08 06:19:44 +0200
  • b2fdb13f43 add plugin docs Michele Dolfi 2025-04-07 22:18:18 +0200
  • 064c3a0e34 handle <code> tags as code blocks FernandoSSI 2025-04-07 16:28:16 -0300
  • 16c90b64f5 Add parse quality rules, use 5% percentile for overall and parse scores Christoph Auer 2025-04-07 18:07:25 +0200
  • 61de30966f chore: update lock file (#1315) Michele Dolfi 2025-04-07 17:47:51 +0200
  • dc3bf9ceac fix(pptx): check if picture shape has an image attached (#1316) Maxim Lysak 2025-04-07 17:36:56 +0200
  • bdcb25b5f5 Check if picture shape has an image attached in pptx backend Maksym Lysak 2025-04-07 17:03:22 +0200
  • 1ae6edc9a2 chore: update lock Michele Dolfi 2025-04-07 17:00:28 +0200
  • 83e0fa2f5e Add OCR confidence and parse confidence (stub) Christoph Auer 2025-04-07 14:49:53 +0200
  • c907af0928 Establish confidence field, propagate layout confidence through Christoph Auer 2025-04-07 14:34:13 +0200
  • 5e9f76dca5 Merge branch 'main' into dev/docling-cli-logo Maxim Lysak 2025-04-04 17:50:25 +0200
  • 4bea04dc75 Identify headers through inhenrited style Rafael Teixeira de Lima 2025-04-04 14:46:43 +0200
  • 3f841e540a Merge 5e12d0795a into bfcab3d677 benichou 2025-04-04 01:54:20 +0530
  • 32b03b65f4 Merge branch 'main' into rtdl/new_latex_symbols Rafael Teixeira de Lima 2025-04-03 18:00:32 +0200
  • 64a7888092 Adding new latex symbols, simplifying how equations are added to text Rafael Teixeira de Lima 2025-04-03 17:57:30 +0200
  • 2dce9fa5b4 logo parameter in docling cli, prints cute ascii logo Maksym Lysak 2025-04-03 16:24:33 +0200
  • bfcab3d677 feat(docx): add text formatting and hyperlink support (#630) Simon Jégou 2025-04-03 15:11:50 +0200
  • a1cb0dd344 fix minor bugs, mark helper methods internal Panos Vagenas 2025-04-03 14:21:34 +0200
  • 88a9756861 Detecting table orientation dev/table-orientation Maksym Lysak 2025-04-03 11:10:57 +0200
  • c4f9916fbb Fix add_list_item SimJeg 2025-04-02 17:48:03 +0200
  • da25453155 Address feedback SimJeg 2025-04-02 17:20:52 +0200
  • f40b21e94c Run precommit SimJeg 2025-04-02 16:14:10 +0200
  • cd4b214f05 Merge branch 'main' into docx-markdown-formatting SimJeg 2025-04-02 14:56:30 +0200
  • 71148eb381 docs: add visual grounding example (#1270) Panos Vagenas 2025-04-02 14:03:19 +0200
  • 1028df66e4 Merge remote-tracking branch 'upstream/main' into docx-markdown-formatting SimJeg 2025-04-02 13:47:47 +0200
  • 21ee884c54 Merge branch 'main' into show-visual-grounding Panos Vagenas 2025-04-02 13:29:30 +0200
  • d2d68747f9 fix(docx): Improve text parsing (#1268) Rafael Teixeira de Lima 2025-04-02 12:56:44 +0200
  • c0f769cdd0 Merge branch 'main' into rtdl/improve_text_parsing Rafael Teixeira de Lima 2025-04-02 12:05:03 +0200
  • c979eaab1a Remove trailing space Rafael Teixeira de Lima 2025-04-02 12:02:05 +0200
  • 331c6ab466 Fix trailing space Rafael Teixeira de Lima 2025-04-02 11:29:14 +0200
  • e535209c75 Flexibilize heading detection Rafael Teixeira de Lima 2025-04-02 10:32:36 +0200
  • d5431577f0 fix: Tesseract OCR CLI can't process images composed with numbers only (#1201) Guilhem VERMOREL 2025-03-31 10:53:49 +0200
  • 76982a5b15 Improve text parsing Rafael Teixeira de Lima 2025-03-31 11:41:06 +0200
  • eb4d17bba5 chore: bump version to 2.28.4 [skip ci] github-actions[bot] 2025-03-29 11:56:42 +0000
  • 895cedb9ab Use inline_fmt everywhere SimJeg 2025-04-01 11:52:20 +0200