Commit Graph

  • 844babb390 docs: update links in data_prep_kit (#1559) Oleg Lavrovsky 2025-05-11 20:38:25 +02:00
  • 776e7ecf9a fix(HTML): handle row spans in header rows (#1536) Cesar Berrospi Ramis 2025-05-09 15:14:32 +02:00
  • 6e956dc551 Merge branch 'main' into nli/layoutmodel_improvements nli/layoutmodel_improvements Nikos Livathinos 2025-05-09 14:47:44 +02:00
  • 3220a592e7 docs: add serialization docs, update chunking docs (#1556) Panos Vagenas 2025-05-08 21:43:01 +02:00
  • f1658edbad fix: mime error in document streams (#1523) DavidLee 2025-05-06 15:30:46 +08:00
  • 7c705739f9 fix: usage of hashlib for FIPS (#1512) Michele Dolfi 2025-05-02 15:03:29 +02:00
  • 99d8572f6d chore: propagate docling-core fixes propagate-core-fixes-20250502 Panos Vagenas 2025-05-02 14:47:21 +02:00
  • de56523974 chore: format JSON test files to enable comparison (#1511) Panos Vagenas 2025-05-02 11:52:18 +03:00
  • b147331f2a chore: restore typing hint for self.script_readers (#1500) Ihar Hrachyshka 2025-04-30 14:33:27 -04:00
  • 4ab7e9ddfb fix: Guard against attribute errors in TesseractOcrModel __del__ (#1494) Ben Browning 2025-04-30 11:51:33 -04:00
  • cc453961a9 fix: enable cuda_use_flash_attention2 for PictureDescriptionVlmModel (#1496) Zach Cox 2025-04-30 02:02:52 -04:00
  • 976e92e289 fix: updated the time-recorder label for reading order (#1490) Peter W. J. Staar 2025-04-29 13:02:53 +02:00
  • d8959c6b19 chore: update dependencies in lock file (#1458) Michele Dolfi 2025-04-28 08:52:46 +02:00
  • a097ccd8d5 chore: typo fix (#1465) nkh0472 2025-04-28 14:52:09 +08:00
  • 3afbe6c969 docs: update supported formats guide (#1463) Emmanuel Ferdman 2025-04-28 09:51:54 +03:00
  • 94d66a0765 fix: Incorrect scaling of TableModel bboxes when do_cell_matching is False (#1459) Maxim Lysak 2025-04-25 12:34:12 +02:00
  • c67133dde4 chore: bump version to 2.31.0 [skip ci] v2.31.0 github-actions[bot] 2025-04-25 08:28:25 +00:00
  • a2fbbba9f7 feat: add tutorial using Milvus and Docling for RAG pipeline (#1449) Ryan Lin 2025-04-25 03:12:35 -04:00
  • a553a1e5bf Merge branch 'main' into nli/layoutmodel_improvements Nikos Livathinos 2025-04-24 10:03:05 +02:00
  • 976431ed7f chore: update locked deps (#1442) Michele Dolfi 2025-04-23 14:59:31 +02:00
  • ed20124544 fix(html): handle address, details, and summary tags (#1436) Cesar Berrospi Ramis 2025-04-23 09:30:59 +02:00
  • c2470ed216 docs: Fix wrong output format in example code (#1427) nkh0472 2025-04-22 18:32:55 +08:00
  • 64918a81ac docs: Add OpenSSF Best Practices badge (#1430) Michele Dolfi 2025-04-22 11:23:28 +02:00
  • 32710d5fac test: Allow pypdfium2 5.x versions cau/test-pypdfium2-beta Christoph Auer 2025-04-22 09:06:25 +02:00
  • 995b3b0ab1 docs: Typo fixes in docling_document.md (#1400) Ben Cox 2025-04-22 07:49:08 +01:00
  • 8012a3e4d6 fix: Treat overflowing -v flags as DEBUG (#1419) Eugene 2025-04-19 13:02:41 +04:00
  • 88948b0bba docs: Updated the [Usage] link in architecture.md (#1416) Leandro Rosas 2025-04-19 09:20:52 +01:00
  • 4ce338f455 fix: Adjust the LayoutModel default paths for the docling-layout-heron Nikos Livathinos 2025-04-15 23:29:01 +02:00
  • fa7fc9e63d fix(codecov): fix codecov argument and yaml file (#1399) Cesar Berrospi Ramis 2025-04-15 18:12:57 +02:00
  • e5f8bb086d Merge branch 'main' into nli/layoutmodel_improvements Nikos Livathinos 2025-04-15 16:08:12 +02:00
  • 51463e3c1f feat: Refactor the LayoutModel to use docling-layout-heron. Pinpoint docling-ibm-models to the branch of new layout model Nikos Livathinos 2025-04-15 16:04:55 +02:00
  • 0782086009 Merge branch 'main' into nli/layoutmodel_improvements Nikos Livathinos 2025-04-15 13:24:09 +02:00
  • 550b1ca2f8 chore: propagate docling-core fix (#1389) Panos Vagenas 2025-04-15 10:51:47 +02:00
  • a7dd59c5cb docs(ocr): Add docs entry for OnnxTR OCR plugin (#1382) Felix Dittrich 2025-04-15 09:46:59 +02:00
  • 06227e9970 ci: sign pypi packages (#1392) Michele Dolfi 2025-04-15 08:59:16 +02:00
  • 5458a88464 ci: add coverage and ruff (#1383) Michele Dolfi 2025-04-14 18:01:26 +02:00
  • 293c28ca7c docs(security): more statements about secure development (#1381) Michele Dolfi 2025-04-14 13:53:26 +02:00
  • 01fbfd5652 docs: Add testing in the docs (#1379) Michele Dolfi 2025-04-14 12:31:48 +02:00
  • d9c3999175 chore: update lock file (#1378) Michele Dolfi 2025-04-14 10:38:10 +02:00
  • a026b4e84b docs: Add Notes for Installing in Intel macOS (#1377) Juil Park 2025-04-14 17:21:13 +09:00
  • c391adb5f0 chore: bump version to 2.30.0 [skip ci] v2.30.0 github-actions[bot] 2025-04-14 08:20:31 +00:00
  • 7e40ad3261 fix(deps): widen typer upper bound (#1375) Michele Dolfi 2025-04-14 09:23:39 +02:00
  • c0ba88edf1 feat(cli): add option for html with split-page mode (#1355) Peter W. J. Staar 2025-04-14 08:41:50 +02:00
  • 0de70e7991 fix: auto-recognize .xlsx, .docx and .pptx files (#1340) Tim Kellogg 2025-04-14 01:45:13 -04:00
  • b295da4bfe chore: Update repository URL in CITATION.cff (#1363) Simon Leiß 2025-04-14 06:57:04 +02:00
  • 415b877984 fix(docx): declare image_data variable when handling pictures (#1359) Cesar Berrospi Ramis 2025-04-11 13:04:00 +02:00
  • 250399948d fix: Implement PictureDescriptionApiOptions.bitmap_area_threshold (#1248) Rowan Skewes 2025-04-11 19:14:05 +10:00
  • eef2bdea77 feat(xlsx): create a page for each worksheet in XLSX backend (#1332) Cesar Berrospi Ramis 2025-04-11 10:29:53 +02:00
  • c605edd8e9 feat: OllamaVlmModel for Granite Vision 3.2 (#1337) Gabe Goodhart 2025-04-10 10:03:04 -06:00
  • 6b696b504a fix: Properly address page in pipeline _assemble_document when page_range is provided (#1334) Joan Fabrégat 2025-04-10 16:11:28 +02:00
  • 72ab8e1821 chore: bump version to 2.29.0 [skip ci] v2.29.0 github-actions[bot] 2025-04-10 12:24:09 +00:00
  • 355d8dc7a6 chore: Logo parameter in docling CLI, prints cute ascii logo (#1294) Maxim Lysak 2025-04-09 05:29:48 +02:00
  • 14e9c0ce9a fix(docx): Adding new latex symbols, simplifying how equations are added to text (#1295) Rafael Teixeira de Lima 2025-04-08 17:11:37 +02:00
  • 0499cd1c1e feat: handle <code> tags as code blocks (#1320) Fernando Santos 2025-04-08 05:32:06 -03:00
  • 2e99e5a54f docs: add plugins docs (#1319) Michele Dolfi 2025-04-08 09:44:37 +02:00
  • 61de30966f chore: update lock file (#1315) Michele Dolfi 2025-04-07 17:47:51 +02:00
  • dc3bf9ceac fix(pptx): check if picture shape has an image attached (#1316) Maxim Lysak 2025-04-07 17:36:56 +02:00
  • bfcab3d677 feat(docx): add text formatting and hyperlink support (#630) Simon Jégou 2025-04-03 15:11:50 +02:00
  • 88a9756861 Detecting table orientation dev/table-orientation Maksym Lysak 2025-04-03 11:10:57 +02:00
  • 71148eb381 docs: add visual grounding example (#1270) Panos Vagenas 2025-04-02 14:03:19 +02:00
  • d2d68747f9 fix(docx): Improve text parsing (#1268) Rafael Teixeira de Lima 2025-04-02 12:56:44 +02:00
  • b3d111a3cd fix: Tesseract OCR CLI can't process images composed with numbers only (#1201) Guilhem VERMOREL 2025-03-31 10:53:49 +02:00
  • 44f2b081ec chore: bump version to 2.28.4 [skip ci] v2.28.4 github-actions[bot] 2025-03-29 11:56:42 +00:00
  • 7afad7e52d fix: Fixes tables when using OCR (#1261) Maxim Lysak 2025-03-29 10:06:00 +01:00
  • 124f921077 chore: bump version to 2.28.3 [skip ci] v2.28.3 github-actions[bot] 2025-03-28 18:30:03 +00:00
  • 8bd71e8e33 fix: Word-level pdf cells for tables (#1238) Maxim Lysak 2025-03-28 16:34:48 +01:00
  • 82694b2136 chore: bump version to 2.28.2 [skip ci] v2.28.2 github-actions[bot] 2025-03-26 16:52:06 +00:00
  • 9210812bfa fix: improve HTML layer detection, various MD fixes (#1241) Panos Vagenas 2025-03-26 16:07:14 +01:00
  • 85c4df887b fix(html): fix HTML parsed heading level (#1244) Panos Vagenas 2025-03-26 10:30:23 +01:00
  • 9eb1686f93 chore: bump version to 2.28.1 [skip ci] v2.28.1 github-actions[bot] 2025-03-25 18:20:23 +00:00
  • 38b7108a22 chore: update locked deps (#1239) Panos Vagenas 2025-03-25 15:48:02 +01:00
  • f1f7df49e3 Update test-cases cau/test-dp-word-lines Christoph Auer 2025-03-25 13:49:08 +01:00
  • 825b226fab fix(converter): Cache same pipeline class with different options (#1152) mislavmartinic 2025-03-26 00:18:44 +13:00
  • 6df8827231 fix(debug): Missing translation of bbox to to_bounding_box (#1220) Hoang-Long Do 2025-03-25 18:18:10 +07:00
  • f739d0e4c5 fix(docx): identifying numbered headers (#1231) Rafael Teixeira de Lima 2025-03-25 11:41:02 +01:00
  • 0974ba4e1c docs(examples): batch conversion doc raises_on_error (#1147) Clément Doumouro 2025-03-25 11:14:39 +01:00
  • 8ebb0bf1a0 chore: properly clean up apt temporary files in Dockerfile (#1223) Peter Dave Hello 2025-03-25 18:10:09 +08:00
  • 7df157204b chore: bump version to 2.28.0 [skip ci] v2.28.0 github-actions[bot] 2025-03-19 15:18:10 +00:00
  • 1c26769785 feat(SmolDocling): Support MLX acceleration in VLM pipeline (#1199) Maxim Lysak 2025-03-19 15:38:54 +01:00
  • b454aa1551 feat: Add PPTX notes slides (#474) Maciej Wieczorek 2025-03-19 14:52:09 +01:00
  • f5adfb9724 fix: Determine correct page size in DoclingParseV4Backend (#1196) Christoph Auer 2025-03-19 11:05:42 +01:00
  • d5f7798763 test(html): fix regression test after docling-core update (#1197) Cesar Berrospi Ramis 2025-03-19 11:03:46 +01:00
  • 0b707d0882 fix(msword): Fixing function return in equations handling (#1194) Rafael Teixeira de Lima 2025-03-19 10:34:25 +01:00
  • 1d680b0a32 docs: Linux Foundation AI & Data (#1183) Michele Dolfi 2025-03-19 09:05:57 +01:00
  • 54a78c307d docs: move apify to docs (#1182) Michele Dolfi 2025-03-18 16:43:55 +01:00
  • 2f72167ff6 feat: updated vlm pipeline (with latest changes from docling-core) (#1158) Maxim Lysak 2025-03-18 15:44:51 +01:00
  • 1a2a9e4eff chore: bump version to 2.27.0 [skip ci] v2.27.0 github-actions[bot] 2025-03-18 13:37:45 +00:00
  • 6eaae3cba0 feat: add factory for ocr engines via plugins (#1010) Michele Dolfi 2025-03-18 13:58:05 +01:00
  • 3960b199d6 feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) Christoph Auer 2025-03-18 10:38:19 +01:00
  • 772487f9c9 feat(actor): Docling Actor on Apify infrastructure (#875) Václav Vančura 2025-03-18 10:17:44 +01:00
  • 75a03c4257 disable GT generation on test_interfaces cau/dpv4-test-updates Christoph Auer 2025-03-17 11:31:18 +01:00
  • 9359f86c6a Merge branch 'cau/docling-parse-api' of github.com:DS4SD/docling into cau/dpv4-test-updates Christoph Auer 2025-03-17 11:17:31 +01:00
  • 50ac62b5fa test_input_doc use default backend Christoph Auer 2025-03-17 11:13:42 +01:00
  • 7bce91893c Unset DPv1 backend on tests (use DPv4 default), re-generate test output Christoph Auer 2025-03-17 11:04:41 +01:00
  • eff907811a Merge branch 'main' of github.com:DS4SD/docling into cau/docling-parse-api Christoph Auer 2025-03-17 10:37:13 +01:00
  • 7e01798417 docs: fix spelling of picture in usage (#1165) serced 2025-03-17 09:33:51 +01:00
  • fe45d30942 Fixes for DPv4 backend init, better test coverage Christoph Auer 2025-03-17 09:26:31 +01:00
  • e34c0750a7 Reset all tests to use docling-parse v1 for now Christoph Auer 2025-03-14 16:39:16 +01:00
  • 412c013d95 Merge from main Christoph Auer 2025-03-14 13:52:36 +01:00
  • d654568ad9 Test all backends, fixes Christoph Auer 2025-03-14 13:32:37 +01:00