Commit Graph

  • bf82f4dc73 Deployed edbabfc with MkDocs version: 1.6.1 gh-pages 2025-12-08 11:47:28 +00:00
  • edbabfcac2 fix: add missing font download in the rapidocr artifacts (#2735) main Michele Dolfi 2025-12-08 12:44:53 +01:00
  • 609069d12c fix: Ensure proper image_scale for generated page images in VLM pipelines (#2728) Christoph Auer 2025-12-05 13:16:11 +01:00
  • d007ba0e6f fix(html): tackle paragraphs with block-level elements (#2720) Cesar Berrospi Ramis 2025-12-05 12:52:53 +01:00
  • 3df3cf8664 fix: add page as argument to build_prompt elh/update_2stage_inference ElHachem02 2025-12-04 13:36:20 +01:00
  • aebe25cf00 fix(html): prevent hierarchy reset in rich table cells (#2716) Matvei Smirnov 2025-12-03 20:52:23 +03:00
  • 0904dbb95a feat: update inference code to shuffle layout elements and discard initial prompt ElHachem02 2025-12-03 12:59:31 +01:00
  • 92e4f2220a Fix artifacts_path handling in Layout+VLM pipeline cau/fix-layout-vlm-pipeline-artifacts-path Christoph Auer 2025-12-03 12:52:22 +01:00
  • c97715f5fd fix(docx): parse integrals as n-ary objects without chr element (#2712) Cesar Berrospi Ramis 2025-12-03 11:25:52 +01:00
  • f80c903c24 chore: bump version to 2.64.0 [skip ci] v2.64.0 github-actions[bot] 2025-12-02 11:25:22 +00:00
  • 6ef4ffd643 fix: InputFormat.IMAGE must have correct pipeline (#2707) Christoph Auer 2025-12-01 19:44:16 +01:00
  • 5bbc94daf8 Add page image injection cau/layout-vlm-pipeline-page-images Christoph Auer 2025-12-01 15:20:41 +01:00
  • 54cd6d7406 fix: do not consider singleton cells in xlsx as TableItems but rather TextItems (#2589) glypt 2025-11-27 16:25:32 +01:00
  • c0b57ae389 chore: Cleaning the example of post_process_ocr_with_vlm (#2693) Maxim Lysak 2025-11-27 12:38:45 +01:00
  • fa21128138 docs: Example on how to apply external OCR as post processing (#2517) Maxim Lysak 2025-11-27 11:04:40 +01:00
  • 0049857c7d chore: update mlx lock (#2689) Panos Vagenas 2025-11-27 10:25:07 +01:00
  • 134436245a feat(experimental): Add experimental TableCropsLayoutModel (#2669) Christoph Auer 2025-11-25 05:14:51 +01:00
  • b75c6461f4 docs: More GPU results and improvements in the example docs (#2674) Michele Dolfi 2025-11-24 15:26:08 +01:00
  • 146b4f0535 docs: fix typo on jobkit page (#2671) Muhammad Ali Hasan 2025-11-24 02:35:45 -06:00
  • e58055465c fix(docx): Missing list items after numbered header (#2665) Michele Dolfi 2025-11-24 08:49:21 +01:00
  • ad97e52851 feat: Factory and plugin-capability for Layout and Table models (#2637) Christoph Auer 2025-11-21 10:26:06 +01:00
  • dcb57bf528 chore: bump version to 2.63.0 [skip ci] v2.63.0 github-actions[bot] 2025-11-20 14:42:37 +00:00
  • 2087c6bf9f fix: Respect document_timeout in new threaded StandardPdfPipeline (#2653) Christoph Auer 2025-11-20 14:57:14 +01:00
  • 54e65d9511 chore: update Milvus on examples and references to deprecated method (#2664) Cesar Berrospi Ramis 2025-11-20 13:22:45 +01:00
  • ce5a099dfd docs: Add Hector as compatible AI agent platform integration (#2662) kadirpekel 2025-11-20 13:02:47 +01:00
  • b559813b9b feat: add save and load for conversion result (#2648) Peter W. J. Staar 2025-11-20 12:45:26 +01:00
  • 6fb9a5f98a fix: In DocumentConverter.convert_string() make nullable name parameter optional (#2660) Cristi Burcă 2025-11-20 05:24:27 +00:00
  • 463a3fd474 fix: Enable GPU for RapidOCR when available (#2659) Michele Dolfi 2025-11-19 17:12:00 +01:00
  • b216ad848d docs: Added documentation to use SuryaOCR via plugin docling-surya (#2533) Harry Ho 2025-11-19 22:27:24 +08:00
  • 6fe6aae91a Apply ruff formatting to test file copilot/fix-page-range-bug copilot-swe-agent[bot] 2025-11-19 13:28:01 +00:00
  • 0788e714a9 Add comprehensive tests for page_range bug fix copilot-swe-agent[bot] 2025-11-19 13:26:25 +00:00
  • 58fc6ccf86 Fix page_range stopping at page 32 by using dynamic batch_size copilot-swe-agent[bot] 2025-11-19 13:25:00 +00:00
  • 18f705b235 Initial plan copilot-swe-agent[bot] 2025-11-19 13:10:27 +00:00
  • 03e7c7d924 docs: Fix broken homepage links (#2651) Robyn Johnson 2025-11-19 01:19:56 -06:00
  • 8af228f1e2 docs(examples): processing parquet file of images (#2641) Michele Dolfi 2025-11-19 06:39:25 +01:00
  • da4c2e9dbe fix: remove py3.14 requirement for default rapidocr (#2639) Michele Dolfi 2025-11-18 17:23:43 +01:00
  • d549445e78 docs: Move Installation and Quickstart (Usage) under Getting started (#2644) Ryan Soliveres 2025-11-19 00:09:41 +08:00
  • ac9fc585bb docs: add redirection from getting started page (#2640) Panos Vagenas 2025-11-17 14:13:51 +01:00
  • f5528623a7 docs(examples): remove deprecation warnings with export_to_dataframe (#2638) Cesar Berrospi Ramis 2025-11-17 12:48:41 +01:00
  • d6ddf9f4cb chore: bump version to 2.62.0 [skip ci] v2.62.0 github-actions[bot] 2025-11-17 11:34:08 +00:00
  • 3495b73de8 feat: add the Image backend (#2627) Peter W. J. Staar 2025-11-17 11:37:22 +01:00
  • aa75dd13d3 test: mark timeout test as manual due to model requirement copilot/fix-document-timeout-bug copilot-swe-agent[bot] 2025-11-17 09:27:27 +00:00
  • e3aa8cd770 feat: add document_timeout support to StandardPdfPipeline copilot-swe-agent[bot] 2025-11-17 09:23:28 +00:00
  • f3ed123b51 Initial plan copilot-swe-agent[bot] 2025-11-17 09:17:41 +00:00
  • ae30373ee7 docs: combine Home and Getting Started pages (#2600) Robyn Johnson 2025-11-14 06:29:25 -06:00
  • 14b436d590 fix: correct the model-repo name (#2624) Peter W. J. Staar 2025-11-14 13:21:08 +01:00
  • 55908d6bb4 chore: pretest docling-core 2.51.0 pretest-core-2-51-0 Panos Vagenas 2025-11-12 16:35:49 +01:00
  • bbb66d8be0 Add documentation for reading order patch copilot/fix-keyerror-in-docling copilot-swe-agent[bot] 2025-11-12 13:07:43 +00:00
  • 570fe949c9 Add monkey patch to fix KeyError in reading order model copilot-swe-agent[bot] 2025-11-12 13:03:50 +00:00
  • 609988d3e1 Initial plan copilot-swe-agent[bot] 2025-11-12 12:48:22 +00:00
  • 4852d8b4f2 feat(experimental): Layout + VLM model with layout prompt (#2244) Christoph Auer 2025-11-12 13:42:09 +01:00
  • 054c4a634d fix(docx): parse page headers and footers (#2599) Cesar Berrospi Ramis 2025-11-10 16:10:12 +01:00
  • 463051b852 chore: bump version to 2.61.2 [skip ci] v2.61.2 github-actions[bot] 2025-11-10 11:44:59 +00:00
  • 5c27567c41 fix: default to EasyOCR in Python 3.14 (#2605) Panos Vagenas 2025-11-10 12:09:00 +01:00
  • 06ae8ae29a chore: replace ds4sd with docling-project (#2596) Peter W. J. Staar 2025-11-07 11:25:56 +01:00
  • c21327cd74 chore: bump version to 2.61.1 [skip ci] v2.61.1 github-actions[bot] 2025-11-06 05:19:20 +00:00
  • ef623ffcee fix(docx): slow table parsing (#2553) Cesar Berrospi Ramis 2025-11-06 05:25:53 +01:00
  • 0ba8d5d9e3 fix(html): slow table parsing (#2582) Cesar Berrospi Ramis 2025-11-06 05:25:36 +01:00
  • 8da3d287ed docs: make navigation menus collapse and expand (#2573) Robyn Johnson 2025-11-05 22:25:19 -06:00
  • 0ccc0a3245 chore: bump version to 2.61.0 [skip ci] v2.61.0 github-actions[bot] 2025-11-06 04:25:06 +00:00
  • fa925741b6 fix: temporarily pin NuExtract to working revision (#2588) Panos Vagenas 2025-11-05 21:23:12 +01:00
  • 8940045463 replace match with if docs/add-extraction-script Peter Staar 2025-11-05 16:57:16 +01:00
  • 1ec6c58b95 adding extraction script Peter Staar 2025-11-05 15:43:56 +01:00
  • 6a04e27352 feat(vlm): track generated tokens and stop reasons for VLM models (#2543) peets 2025-11-04 19:39:09 +01:00
  • 1a5146abc9 fix(ocr): use PSM integer values directly instead of constructor (#2578) 정물결 2025-11-05 03:32:41 +09:00
  • 32a5aed5ea chore: bump version to 2.60.1 [skip ci] v2.60.1 github-actions[bot] 2025-11-04 11:26:12 +00:00
  • 0e1b0bd816 chore: switch print statements to debug logging (#2569) Panos Vagenas 2025-11-04 11:32:39 +01:00
  • fb737d026e chore: fix malformed f-string (#2563) Johannes Damp 2025-11-04 11:01:26 +01:00
  • 8360aa5449 fix: extract response from api_image_request in picture description (#2571) peets 2025-11-04 08:39:15 +01:00
  • 3467b0a035 chore: bump version to 2.60.0 [skip ci] v2.60.0 github-actions[bot] 2025-10-31 14:43:29 +00:00
  • 268d027c8f feat: Use threading in the standard pipeline and move old behavior to legacy (#2452) Michele Dolfi 2025-10-31 14:42:11 +01:00
  • 01577e92d1 docs: Update link to Open WebUI docs (#2549) Welteam 2025-10-31 12:21:11 +00:00
  • cb100437fa docs: Update installation options with extras and review FAQ (#2548) Michele Dolfi 2025-10-31 13:21:01 +01:00
  • 741c44fa45 docs: fix typos (#2546) Yasir Ali 2025-10-31 18:29:34 +09:00
  • a51275d080 fix(pdf): threadsafe for pypdfium2 backend (#2527) Michele Dolfi 2025-10-30 17:58:39 +01:00
  • d27fe92e01 chore: bump version to 2.59.0 [skip ci] v2.59.0 github-actions[bot] 2025-10-30 13:05:56 +00:00
  • 97aa06bfbc docs: Add details and examples on optimal GPU setup (#2531) Michele Dolfi 2025-10-30 13:22:05 +01:00
  • d9c90eb45e fix: xlsx cell parsing, now returning values instead of formulas (#2520) glypt 2025-10-29 11:35:51 +01:00
  • b6c892b505 feat(vlm): add num_tokens as attribtue for VlmPrediction (#2489) peets 2025-10-28 17:18:44 +01:00
  • cdffb47b9a feat: Support for Python 3.14 (#2530) Michele Dolfi 2025-10-28 14:32:15 +01:00
  • 9a6fdf936b docs: update opensearch notebook and backend documentation (#2519) Cesar Berrospi Ramis 2025-10-27 10:02:50 +01:00
  • 10c1f06b74 chore: bump version to 2.58.0 [skip ci] v2.58.0 github-actions[bot] 2025-10-22 11:31:29 +00:00
  • bbe82a68d0 feat(pdf): Support for password-protected PDF documents (#2499) Michele Dolfi 2025-10-22 12:48:01 +02:00
  • 89820d01b5 perf: use docling-parse-v4 as default (#2503) Michele Dolfi 2025-10-21 17:55:43 +02:00
  • 86556d8367 docs: fix typo in mcp.md (#2502) McGuireMark 2025-10-21 11:31:28 -04:00
  • 4227fcc3e1 fix(markdown): set the correct discriminator in md backend options (#2501) Cesar Berrospi Ramis 2025-10-21 14:30:48 +02:00
  • a30e6a7614 feat(backend): add generic options support and HTML image handling modes (#2011) Legoshi 2025-10-21 12:52:17 +02:00
  • b66624bfff fix(xlsx): speed up by detecting the true last non-empty row/column (#2404) Richard (Huangrui) Chu 2025-10-21 02:08:20 -04:00
  • 657ce8b01c feat(ASR): MLX Whisper Support for Apple Silicon (#2366) Ken Steele 2025-10-20 23:05:59 -07:00
  • a5af082d82 chore: fix parsing of release body message (#2498) Michele Dolfi 2025-10-20 13:41:35 +02:00
  • 5be856fbc0 chore: add action posting to discord (#2486) Michele Dolfi 2025-10-17 16:31:57 +02:00
  • ee5aedc955 add ocr as enrichment for pictures in simple pipeline ocr-enrichment Michele Dolfi 2025-10-17 16:04:57 +02:00
  • dd03b53117 docs: discord badge with join link (#2473) Michele Dolfi 2025-10-16 10:13:50 +02:00
  • 1762bb8762 chore: update lock (#2468) Michele Dolfi 2025-10-15 20:35:49 +02:00
  • ae61d640c1 chore: bump version to 2.57.0 [skip ci] v2.57.0 github-actions[bot] 2025-10-15 09:20:31 +00:00
  • 16829939cf feat(docx): Process drawingml objects in docx (#2453) Rafael Teixeira de Lima 2025-10-15 10:58:08 +02:00
  • 3e6da2c62d docs: Example on PII obfuscation (#2459) Peter W. J. Staar 2025-10-14 15:39:16 +02:00
  • cd7f7ba145 fix: Use proper page concatentation in VLM pipeline MD/HTML conversion (#2458) Christoph Auer 2025-10-14 14:12:26 +02:00
  • 3687d865f8 chore: bump version to 2.56.1 [skip ci] v2.56.1 github-actions[bot] 2025-10-13 16:30:04 +00:00
  • 688a7dfd38 fix: avoid downloading easyocr models by default (#2454) Michele Dolfi 2025-10-13 17:58:06 +02:00