Commit Graph

  • 10165dda8a chore: bump version to 2.56.0 [skip ci] (tag: v2.56.0) github-actions[bot] 2025-10-13 09:19:06 +00:00
  • db985bb159 fix(asr): Implement robust status check in AsrPipeline (#2442) Animesh 2025-10-13 13:21:31 +05:30
  • 90200443bc docs: Remove deprecated call in custom_convert.py (#2447) Jeremy Chen 2025-10-13 18:30:02 +11:00
  • 2a0f56390a docs: fixed a few typos (#2441) Imad Saddik 2025-10-13 08:04:50 +01:00
  • f7244a4333 feat: AutoOCR model selecting the best OCR model available and deprecating the usage of EasyOCR (#2391) Michele Dolfi 2025-10-10 16:11:39 +02:00
  • cce18b2ff7 fix: deal with chartsheets in workbooks (#2433) Cesar Berrospi Ramis 2025-10-10 15:06:38 +02:00
  • f11f8c0a81 feat: Add Tesseract PSM options support (#2411) Bruno Pio 2025-10-10 09:44:30 -03:00
  • ee5501320e fix: skip temporary docx files (#2413) Victor Moreli 2025-10-10 04:39:26 -03:00
  • b5f7fef29b fix: AsrPipeline to handle absolute paths and BytesIO streams correctly (#2407) pixiake 2025-10-10 15:37:15 +08:00
  • f2854b2e1d docs: Add MongoDB + VoyageAI (#2382) Utsav Talwar 2025-10-08 00:06:19 +05:30
  • 0610d01afa fix: enrichment of documents without pages metadata (pptx and xlsx) (#2401) Michele Dolfi 2025-10-07 18:28:51 +02:00
  • 9705f4020c fix: Proper heading support in rich tables for HTML backend (#2394) Maxim Lysak 2025-10-07 15:57:32 +02:00
  • 8a4b946a1a docs: add RAG example with MongoDB Atlas Vector Search and VoyageAI embeddings (#2341) Utsav Talwar 2025-10-03 16:59:43 +05:30
  • 22515b546a chore: bump version to 2.55.1 [skip ci] (tag: v2.55.1) github-actions[bot] 2025-10-03 10:26:26 +00:00
  • 68230fe7e5 ci: split workflow to speedup CI runtime (#2313) Rui Dias Gomes 2025-10-03 10:16:38 +01:00
  • ee73ffae15 fix(markdown): Setext heading support (#2359) Matvei Smirnov 2025-10-03 11:32:53 +03:00
  • 246de77d8c fix(docs): fixed the color scheme (#2371) Hakeem Abbas 2025-10-03 13:20:44 +05:00
  • a975a790c9 docs: example using Hashicorp Vault PII transform (#2373) Michele Dolfi 2025-10-03 09:53:29 +02:00
  • 9505202e38 ci: update docling-parse and remove pages.json (#2372) Michele Dolfi 2025-10-03 09:53:13 +02:00
  • ca2be7ff3a fix: Empty table handling (#2365) Christoph Auer 2025-10-02 19:35:16 +02:00
  • e6c3b05e63 docs: Jobkit and connectors (#2357) Lucas Morin 2025-10-02 13:46:56 +02:00
  • 4f295ed051 fix: add table raw content when no table structure model is used (#1815) Michele Dolfi 2025-10-02 13:46:42 +02:00
  • f0b630e24e chore: bump version to 2.55.0 [skip ci] (tag: v2.55.0) github-actions[bot] 2025-09-30 14:50:42 +00:00
  • 1e9dc43b72 feat: Repetition-based StoppingCriteria for GraniteDocling (#2323) Christoph Auer 2025-09-30 15:26:09 +02:00
  • 68ae7ccf3c fix: pin wider range of typer (#2309) Michele Dolfi 2025-09-30 02:42:23 -04:00
  • 654c70f990 fix: Update Transformers & VLLM inference code, CLI and VLM specs (#2322) Christoph Auer 2025-09-29 21:06:54 +02:00
  • c803abed9a feat: Rich tables support for HTML backend (#2324) Maxim Lysak 2025-09-29 18:12:16 +02:00
  • 325877aee9 docs(styling): update color scheme (#2154) Hakeem Abbas 2025-09-29 14:44:40 +05:00
  • a873200c9d docs(vlm): Update SmolDocling to GraniteDocling references (#2315) Luis 2025-09-25 05:07:39 -04:00
  • 9d67bb9ed6 fix: support escaped characters in markdown backend (#2304) Lucas Morin 2025-09-23 18:00:16 +02:00
  • d599177547 chore: bump version to 2.54.0 [skip ci] (tag: v2.54.0) github-actions[bot] 2025-09-22 15:28:30 +00:00
  • e2482a2ada feat: Rich tables for MSWord backend (#2291) Maxim Lysak 2025-09-22 16:41:59 +02:00
  • 46efaaefee feat: add a backend parser for WebVTT files (#2288) Cesar Berrospi Ramis 2025-09-22 15:24:34 +02:00
  • b5628f1227 fix: correct y-axis scaling in draw_table_cells (#2287) manuflexor 2025-09-19 13:42:29 +02:00
  • 8b7e83a8c7 docs: Update API VLM example with granite-docling (#2294) Christoph Auer 2025-09-19 12:23:53 +02:00
  • 6455579a90 Stub for implementing uspto backend meta-data extraction (branch: vku/uspto_meta) Viktor Kuropiatnyk 2025-09-18 10:51:01 +02:00
  • 8322c2ea9b docs: fix examples rendering (#2281) Panos Vagenas 2025-09-18 02:50:50 +02:00
  • f1687fb09b chore: bump version to 2.53.0 [skip ci] (tag: v2.53.0) github-actions[bot] 2025-09-17 13:59:33 +00:00
  • 17afb664d0 feat: Add granite-docling model (#2272) Christoph Auer 2025-09-17 15:15:49 +02:00
  • 223d7f9c62 Merge branch 'dev/add-granite-docling-extension' of github.com:DS4SD/docling into dev/add-granite-docling-extension (branch: dev/add-granite-docling-extension) Christoph Auer 2025-09-16 16:34:09 +02:00
  • 63bf6b0348 Update final repo_ids for GraniteDocling Christoph Auer 2025-09-16 16:29:55 +02:00
  • bf9638244f Update final repo_ids for GraniteDocling Christoph Auer 2025-09-16 16:12:35 +02:00
  • a3709f4776 Merge branch 'main' of github.com:DS4SD/docling into dev/add-granite-docling-extension Christoph Auer 2025-09-16 16:12:22 +02:00
  • ff351fd40c docs: Describe examples (#2262) Mingxuan Zhao 2025-09-16 10:00:38 -04:00
  • 0e95171dd6 feat(RapidOcr): Support generic extra arguments for RapidOcr (#2266) dmorady1 2025-09-16 07:26:10 +02:00
  • 43d3c74bb2 update docs and README Michele Dolfi 2025-09-15 15:44:42 +02:00
  • c5a59eb979 use granite-docling and add to the model downloader Michele Dolfi 2025-09-15 15:39:08 +02:00
  • 0f8728a8d4 typo Michele Dolfi 2025-09-15 15:28:04 +02:00
  • 6a2cfbdbb8 Merge remote-tracking branch 'origin/main' into dev/add-granite-docling-extension Michele Dolfi 2025-09-15 15:26:45 +02:00
  • ad2f738231 chore: update lock (#2265) Michele Dolfi 2025-09-15 11:19:15 +02:00
  • 609d902eef fix: handle empty result from RapidOCR to avoid crash (#2264) Yuie. 2025-09-15 17:04:33 +09:00
  • 10bb0aee2d chore: bump version to 2.52.0 [skip ci] (tag: v2.52.0) github-actions[bot] 2025-09-11 16:11:20 +00:00
  • 0700af212c fix: Add missing features in ThreadedStandardPdfPipeline (#2252) Christoph Auer 2025-09-11 16:26:02 +02:00
  • 2c9123419f feat: enrichment steps on all convert pipelines (incl docx, html, etc) (#2251) Michele Dolfi 2025-09-11 15:09:00 +02:00
  • c6965495a2 fix: address deprecation warnings of dependencies (#2237) Michele Dolfi 2025-09-10 14:38:34 +02:00
  • f8cc545bab docs: add an example of RAG with OpenSearch (#2238) Cesar Berrospi Ramis 2025-09-10 14:37:22 +02:00
  • e5cd7020bd docs: Add instructions for using Docling with MCP to README (#2219) Roy Derks 2025-09-10 01:02:28 -07:00
  • 1324eb75fc add modified test results (branch: dev-granite-docling-table) Michele Dolfi 2025-09-10 08:43:29 +02:00
  • a4efd70410 dev: use granite-docling for table structure Michele Dolfi 2025-09-09 18:16:16 +02:00
  • 55f5f3752f docs: Document VLM support requirement in extraction example (#2231) Tamás Bitai 2025-09-09 13:45:55 +02:00
  • ae9ec37cf1 doing some experiments with granite-docling (branch: dev/analysis-for-granite-docling) Peter Staar 2025-09-08 06:03:18 +02:00
  • 0e2f370f4f updated the model specs Peter Staar 2025-09-05 16:58:43 +02:00
  • df60673992 chore: bump version to 2.51.0 [skip ci] (tag: v2.51.0) github-actions[bot] 2025-09-05 13:01:33 +00:00
  • c1dcb0597d adding granite-docling preview Peter Staar 2025-09-05 15:00:05 +02:00
  • b49d1ad4f1 feat: updating default parameters to get better performance with docling-parse (#2208) Peter W. J. Staar 2025-09-05 14:06:21 +02:00
  • a9f41b088e docs: add information extraction example (#2199) Panos Vagenas 2025-09-05 11:27:09 +02:00
  • b3d7542061 feat: updated the backend for new docling-parse (#2187) Peter W. J. Staar 2025-09-05 10:42:31 +02:00
  • 2c3f6faf3d chore: update deprecation note for OcrEngine (#2200) Alina Ryan 2025-09-05 02:24:14 -04:00
  • effd9de250 updated the ground-truth output (branch: dev/update-to-latest-docling-parse-again) Peter Staar 2025-09-04 05:22:54 +02:00
  • cffa6e05d0 reformatted code Peter Staar 2025-09-03 16:22:19 +02:00
  • 0ec99e0f37 updated docling to start running the tests ... Peter Staar 2025-09-03 16:09:51 +02:00
  • 3419c42f10 chore: bump version to 2.50.0 [skip ci] (tag: v2.50.0) github-actions[bot] 2025-09-03 11:39:08 +00:00
  • e38aa0f7f2 feat: Heron layout model as new default (#1971) Nikos Livathinos 2025-09-03 12:45:22 +02:00
  • 293e81bf9d fix(html): access to variable not yet declared (#2171) Cesar Berrospi Ramis 2025-09-02 07:59:55 +02:00
  • d68d8b678e chore: bump version to 2.49.0 [skip ci] (tag: v2.49.0) github-actions[bot] 2025-09-01 16:39:43 +00:00
  • 4d94e38223 fix(pypdfium2): Fix OCR bounding box misalignment caused by mismatched rotation metadata (#2039) AndrewTsai0406 2025-09-01 23:22:43 +08:00
  • 9f4bc5b2f1 feat: [Beta] Extraction with schema (#2138) Christoph Auer 2025-09-01 16:09:48 +02:00
  • a283ccff25 feat(msexcel): set ContentLayer.INVISIBLE for invisible sheet (#1876) Qiefan Jiang 2025-09-01 19:53:45 +08:00
  • be26044f14 chore: update docling-core lock (#2169) Panos Vagenas 2025-09-01 13:46:10 +02:00
  • 9f0286bcac fix: translation example (#2166) Shikhar Bhardwaj 2025-09-01 14:34:46 +05:30
  • 9904d14e6a fix: extend offline mode for rapidocr fonts (#2155) geoHeil 2025-09-01 09:15:47 +02:00
  • 96cab6b536 docs: enrich landing pages (#2165) Panos Vagenas 2025-08-29 17:19:05 +02:00
  • 946ea1c2cb chore: Replace the layout_predictor.predict_batch() with layout_predictor.predict() in a loop (branch: nli/layout_heron2) Nikos Livathinos 2025-08-28 15:14:51 +02:00
  • 36d44f1225 chore: Add more logs in LayoutModel Nikos Livathinos 2025-08-28 14:24:47 +02:00
  • baaf2698b4 chore: debug_heron.py: prepend the name in the saved files Nikos Livathinos 2025-08-28 13:47:50 +02:00
  • d8ca358ae8 chore: Add debugging logs in LayoutModel Nikos Livathinos 2025-08-28 13:45:33 +02:00
  • 78f81e2c59 chore: Print the PagElements input to the ReadingOrder model Nikos Livathinos 2025-08-28 10:15:27 +02:00
  • 7debe3d5ec chore: debug_heron.py: Save exported json with pretty format Nikos Livathinos 2025-08-27 18:19:52 +02:00
  • 32461ff258 chore: debug_heron.py: Update test file Nikos Livathinos 2025-08-27 18:04:22 +02:00
  • c54d511c20 chore: debug_heron.py: Disable OCR Nikos Livathinos 2025-08-27 17:31:33 +02:00
  • 6ce3cd5763 chore: debug_heron.py update the test file Nikos Livathinos 2025-08-27 16:38:40 +02:00
  • 784283a50a chore: Update test data for Heron in Linux Nikos Livathinos 2025-08-27 14:16:13 +00:00
  • 552a606b4e chore: TMP script to debug heron Nikos Livathinos 2025-08-27 16:07:49 +02:00
  • 13255ad718 Merge from main (branch: cau/multi-stage-vlm-pipeline) Christoph Auer 2025-08-27 15:28:47 +02:00
  • a9dcd43a7c fix: Ensure that the visualisations happen on copies of the page image Nikos Livathinos 2025-08-27 14:16:56 +02:00
  • fb3b7b93ae chore: bump version to 2.48.0 [skip ci] (tag: v2.48.0) github-actions[bot] 2025-08-26 05:29:31 +00:00
  • fa3327e1a6 fix(html): preserve code blocks in list items (#2131) Cesar Berrospi Ramis 2025-08-26 06:43:48 +02:00
  • c0268416cf chore: add analytics (#2133) Michele Dolfi 2025-08-25 18:25:38 +02:00
  • 1435fc3b81 Update test GT Christoph Auer 2025-07-23 14:05:30 +02:00
  • 83c45b5648 Update docling-models tag for TableFormer Christoph Auer 2025-07-23 13:39:50 +02:00