Commit Graph

  • 0240ae2930 Pass nested clusters through GLM as payload Christoph Auer 2024-12-03 13:58:27 +01:00
  • 4dcc738b6d Pass nested cluster processing through full pipeline Christoph Auer 2024-12-03 13:08:45 +01:00
  • 34c7c79858 fix: improve handling of disallowed formats (#429) Christoph Auer 2024-12-03 12:45:32 +01:00
  • 0be736227f fix: improve handling of disallowed formats (#429) Christoph Auer 2024-12-03 12:45:32 +01:00
  • 2254845da3 chore: bump version to 2.8.2 [skip ci] v2.8.2 github-actions[bot] 2024-12-03 10:47:29 +00:00
  • 25a0fa38d1 chore: bump version to 2.8.2 [skip ci] github-actions[bot] 2024-12-03 10:47:29 +00:00
  • 672962a8b2 chore: update numpy lock (#500) Michele Dolfi 2024-12-03 11:21:31 +01:00
  • 9f35e368f6 chore: update numpy lock (#500) Michele Dolfi 2024-12-03 11:21:31 +01:00
  • c90c41c391 fix: ParserError EOF inside string (#470) (#472) guglie 2024-12-03 11:21:18 +01:00
  • a7e3f713bb fix: ParserError EOF inside string (#470) (#472) guglie 2024-12-03 11:21:18 +01:00
  • 5ba3807f31 docs: add styling for faq (#502) Michele Dolfi 2024-12-03 11:20:49 +01:00
  • a01cedbb69 docs: add styling for faq (#502) Michele Dolfi 2024-12-03 11:20:49 +01:00
  • 051789d017 perf: prevent temp file leftovers, reuse core type (#487) Panos Vagenas 2024-12-03 10:40:28 +01:00
  • 418d8159bd perf: prevent temp file leftovers, reuse core type (#487) Panos Vagenas 2024-12-03 10:40:28 +01:00
  • 7245cc6080 Implement hierachical cluster layout processing Christoph Auer 2024-12-03 10:28:36 +01:00
  • d3f84b2457 fix: PermissionError when using tesseract_ocr_cli_model (#496) Gaspard Petit 2024-12-03 04:22:03 -05:00
  • 32e9b4a2cf fix: PermissionError when using tesseract_ocr_cli_model (#496) Gaspard Petit 2024-12-03 04:22:03 -05:00
  • e0cf80a919 Upgraded Layout Postprocessing, sending old code back to ERZ Christoph Auer 2024-12-02 16:21:14 +01:00
  • 33cff98d36 docs: typo in faq (#484) Álvaro Huertas 2024-12-02 10:35:24 +01:00
  • 6ca85993f4 docs: typo in faq (#484) Álvaro Huertas 2024-12-02 10:35:24 +01:00
  • d4872103b8 docs: add automatic api reference (#475) Michele Dolfi 2024-12-02 09:55:52 +01:00
  • 048031d32b docs: add automatic api reference (#475) Michele Dolfi 2024-12-02 09:55:52 +01:00
  • 8ccb3c6db6 docs: introduce faq section (#468) Michele Dolfi 2024-11-29 22:34:56 +01:00
  • 0e0360a37b docs: introduce faq section (#468) Michele Dolfi 2024-11-29 22:34:56 +01:00
  • cc46c938b6 chore: bump version to 2.8.1 [skip ci] v2.8.1 github-actions[bot] 2024-11-29 13:04:48 +00:00
  • 1d81b85443 chore: bump version to 2.8.1 [skip ci] github-actions[bot] 2024-11-29 13:04:48 +00:00
  • dd8de46267 fix(cli): expose debug options (#467) Michele Dolfi 2024-11-29 13:25:58 +01:00
  • 7bd432496a fix(cli): expose debug options (#467) Michele Dolfi 2024-11-29 13:25:58 +01:00
  • af63818df5 fix: remove unused deps (#466) Michele Dolfi 2024-11-29 13:18:06 +01:00
  • 861b6a6499 fix: remove unused deps (#466) Michele Dolfi 2024-11-29 13:18:06 +01:00
  • 84c46fdeb3 docs: extend integration docs & README (#456) Panos Vagenas 2024-11-28 09:41:21 +01:00
  • 9d8d698921 docs: extend integration docs & README (#456) Panos Vagenas 2024-11-28 09:41:21 +01:00
  • 211f4f7570 chore: bump version to 2.8.0 [skip ci] v2.8.0 github-actions[bot] 2024-11-27 13:29:32 +00:00
  • 20a2cd0f53 chore: bump version to 2.8.0 [skip ci] github-actions[bot] 2024-11-27 13:29:32 +00:00
  • 85b29990be feat(ocr): added support for RapidOCR engine (#415) Swaymaw 2024-11-27 18:27:41 +05:30
  • 767563bf8b fix: use correct image index in word backend (#442) Manuel030 2024-11-27 13:45:07 +01:00
  • 29807a2d68 fix: Update tests and examples for docling-core 2.5.1 (#449) Christoph Auer 2024-11-27 13:07:00 +01:00
  • 6666d9ec07 chore: bump version to 2.7.1 [skip ci] v2.7.1 github-actions[bot] 2024-11-26 15:01:33 +00:00
  • d0a1180478 fix: Fixes for wordx (#432) Maxim Lysak 2024-11-26 14:44:43 +01:00
  • d7072b4b56 fix: force pydantic < 2.10.0 (#407) Michele Dolfi 2024-11-22 08:23:11 +01:00
  • 2a1d3fd221 chore: update the README (#409) Peter W. J. Staar 2024-11-21 17:28:53 +01:00
  • 7a45b92078 docs: add DocETL, Kotaemon, spaCy integrations; minor docs improvements (#408) Panos Vagenas 2024-11-21 17:23:04 +01:00
  • 97d571af97 chore: add downloads in README, security policy and update ci actions (#401) Michele Dolfi 2024-11-21 13:59:45 +01:00
  • eb64f6d368 chore: bump version to 2.7.0 [skip ci] v2.7.0 github-actions[bot] 2024-11-20 15:36:51 +00:00
  • 7b013abcf3 fix: python3.9 support (#396) Michele Dolfi 2024-11-20 15:21:40 +01:00
  • 6efa96c983 feat: add support for ocrmac OCR engine on macOS (#276) nuridol 2024-11-20 20:51:19 +09:00
  • 32ebf55e33 fix: propagate document limits to converter (#388) Michele Dolfi 2024-11-20 08:36:51 +01:00
  • 2cfaceb787 chore: bump version to 2.6.0 [skip ci] v2.6.0 github-actions[bot] 2024-11-19 16:07:34 +00:00
  • 3f91e7d3f1 feat: added support for exporting DocItem to an image when page image is available (#379) Shubham Gupta 2024-11-19 16:28:52 +01:00
  • 911c3bda27 docs: fixed typo in v2 example v2 (#378) Gaspard Petit 2024-11-19 10:27:19 -05:00
  • ed785ea122 feat: expose ocr-lang in CLI (#375) Michele Dolfi 2024-11-19 15:58:49 +01:00
  • 926dfd29d5 feat: added excel backend (#334) Peter W. J. Staar 2024-11-19 12:21:17 +01:00
  • e6f89d520f chore: update lock of deps (#371) Michele Dolfi 2024-11-19 10:23:59 +01:00
  • 7368013669 reformatted the code Peter Staar 2024-11-19 06:31:57 +01:00
  • 8c42f760a2 merged with main and resolved all conflicts Peter Staar 2024-11-19 06:26:42 +01:00
  • 7a97d7119f feat: Extracting picture data for raster images found in PPTX (#349) Maxim Lysak 2024-11-18 15:22:28 +01:00
  • 7dbdbdeaf3 ci: fix mergify (#350) Michele Dolfi 2024-11-15 17:13:01 +01:00
  • 364d37ca96 ci(Mergify): configuration update (#339) Michele Dolfi 2024-11-15 13:18:33 +01:00
  • ca8524ecae docs: add automatic generation of CLI reference (#325) Michele Dolfi 2024-11-15 13:18:17 +01:00
  • 25fd149c38 docs: add architecture outline (#341) Panos Vagenas 2024-11-15 12:52:41 +01:00
  • 835e077b02 docs: fix parameter in usage.md (#332) Carl 2024-11-15 09:24:15 +01:00
  • 8533039b0c fix: Fixing images in the input Word files (#330) Maxim Lysak 2024-11-14 13:33:34 +01:00
  • bf2a85f1d4 chore: fix Qdrant notebook Colab link (#319) Panos Vagenas 2024-11-14 10:42:02 +01:00
  • f4fc6cfd4a added TableFormerMode.ACCURATE as default in cli Peter Staar 2024-11-14 07:45:36 +01:00
  • 8b437adcde fix: reduce logging by keeping option for more verbose (#323) Michele Dolfi 2024-11-13 10:08:24 +01:00
  • 5a44236ac2 chore: bump version to 2.5.2 [skip ci] v2.5.2 github-actions[bot] 2024-11-13 08:19:09 +00:00
  • c9341bf22e fix: skip glm model downloads (#322) Michele Dolfi 2024-11-13 08:45:28 +01:00
  • 2c0c439a44 chore: bump version to 2.5.1 [skip ci] v2.5.1 github-actions[bot] 2024-11-12 14:56:34 +00:00
  • fb8ba861e2 fix: Handling of single-cell tables in DOCX backend (#314) Maxim Lysak 2024-11-12 15:20:55 +01:00
  • 7f5d35ea3c docs: Hybrid RAG with Qdrant (#312) Anush 2024-11-12 19:48:14 +05:30
  • 93fc1be61a docs: add Data Prep Kit integration (#316) Panos Vagenas 2024-11-12 12:21:48 +01:00
  • 777237ebc9 chore: bump version to 2.5.0 [skip ci] v2.5.0 github-actions[bot] 2024-11-12 10:19:55 +00:00
  • 5d4a10b121 fix: Configure env prefix for docling settings (#315) Christoph Auer 2024-11-12 10:57:16 +01:00
  • c6b3763ecb feat(OCR): Introduce the OcrOptions.force_full_page_ocr parameter that forces a full page OCR scanning (#290) Nikos Livathinos 2024-11-12 09:46:14 +01:00
  • 81c8243a8b fix: Added handling of grouped elements in pptx backend (#307) Maxim Lysak 2024-11-11 16:38:21 +01:00
  • 53bf2d1790 Added handling of code blocks in html with <pre> tag (#302) Maxim Lysak 2024-11-11 15:00:11 +01:00
  • 1239ade275 docs: add navigation indices (#305) Panos Vagenas 2024-11-11 14:49:06 +01:00
  • 97f214efdd fix: allow mps usage for easyocr (#286) Michele Dolfi 2024-11-10 14:26:17 +01:00
  • be8aa17291 chore: bump version to 2.4.2 [skip ci] v2.4.2 github-actions[bot] 2024-11-08 16:31:47 +00:00
  • 0eb065e9b6 fix(EasyOcrModel): Support the use_gpu pipeline parameter in EasyOcrModel. Initialize easyocr (#282) Nikos Livathinos 2024-11-08 16:48:41 +01:00
  • 118f162e64 chore: bump version to 2.4.1 [skip ci] v2.4.1 github-actions[bot] 2024-11-08 12:37:36 +00:00
  • 704d792a79 fix(tesserocr): Raise Exception if tesserocr has not loaded any languages (#279) Nikos Livathinos 2024-11-08 13:03:09 +01:00
  • 9e54a74410 another fix to the tests Peter Staar 2024-11-08 12:48:53 +01:00
  • 311640fb9d reformatted the code Peter Staar 2024-11-08 05:41:09 +01:00
  • 5c82ff9890 fixed the tests Peter Staar 2024-11-07 05:15:13 +01:00
  • b154d4f2d7 updated ground-truth Peter Staar 2024-11-06 10:55:18 +01:00
  • 0a5817a36e updated the html tests (2) Peter Staar 2024-11-06 05:46:09 +01:00
  • c7b9792d6b updated the html tests Peter Staar 2024-11-06 05:44:50 +01:00
  • 6c22cba0a7 chore: add issue templates (#251) Panos Vagenas 2024-11-05 23:18:20 +01:00
  • c3098e3c12 chore: fix typo (#241) Ikko Eltociear Ashimine 2024-11-06 00:20:04 +09:00
  • a84ec276b0 docs: update badges & credits (#248) Panos Vagenas 2024-11-05 13:57:06 +01:00
  • 90836db90a fix: Dockerfile example copy command (#234) Anthony R 2024-11-05 12:48:27 +01:00
  • 5ce02c5c59 docs: add coming-soon section (#235) Panos Vagenas 2024-11-05 08:53:02 +01:00
  • d5e65aedac docs: add artifacts-path param to CLI (#233) Panos Vagenas 2024-11-05 08:51:21 +01:00
  • ddd1474c8d reformatted the code Peter Staar 2024-11-05 07:25:21 +01:00
  • 3257034631 replace new lines and double spaces in list-items with single spaces Peter Staar 2024-11-05 07:24:31 +01:00
  • f276c0cc90 updated the html backend to add svg, remove empty list-items and use data-content fields Peter Staar 2024-11-05 06:37:43 +01:00
  • e30a9c25a2 chore: bump version to 2.4.0 [skip ci] v2.4.0 github-actions[bot] 2024-11-04 15:11:09 +00:00
  • 862d78d271 chore: update pyproject.toml metadata (#229) Panos Vagenas 2024-11-04 15:48:00 +01:00
  • eeee3b4371 docs: add explicit artifacts path example (#224) Panos Vagenas 2024-11-04 14:27:56 +01:00