Commit Graph

  • 9eceea1a8c chore: update the docs Peter Staar 2024-11-21 16:53:29 +0100
  • bcfa06cb62 Update README.md Peter W. J. Staar 2024-11-21 16:44:29 +0100
  • 6b58e7452a chore: update the README Peter Staar 2024-11-21 16:32:44 +0100
  • 6fd739fb7b docs: add DocETL, Kotaemon, spaCy integrations; minor docs improvements Panos Vagenas 2024-11-21 14:19:08 +0100
  • 40c6a803d7 fix: force pydantic < 2.10.0 Michele Dolfi 2024-11-21 14:01:36 +0100
  • 97d571af97 chore: add downloads in README, security policy and update ci actions (#401) Michele Dolfi 2024-11-21 13:59:45 +0100
  • 02f14e2d23 add citation file Michele Dolfi 2024-11-21 11:16:32 +0100
  • 8120347b71 add pypi downloads badge Michele Dolfi 2024-11-21 10:58:34 +0100
  • fc9a5fc33b add comment about licenses for new dependencies Michele Dolfi 2024-11-21 10:55:26 +0100
  • 2b43c4253d update deprecated actions Michele Dolfi 2024-11-21 10:51:00 +0100
  • a01351640e add security policy Michele Dolfi 2024-11-21 10:48:44 +0100
  • 86d9a2ca00 syncing with latest commit on original branch swayam-singhal 2024-11-20 22:50:45 +0530
  • eb64f6d368 chore: bump version to 2.7.0 [skip ci] v2.7.0 github-actions[bot] 2024-11-20 15:36:51 +0000
  • 7b013abcf3 fix: python3.9 support (#396) Michele Dolfi 2024-11-20 15:21:40 +0100
  • d4f136360a update deps Michele Dolfi 2024-11-20 14:52:15 +0100
  • afcd845674 pin docling-parse with python3.9 wheels Michele Dolfi 2024-11-20 14:47:11 +0100
  • 619907ad64 fixes for python3.9 Michele Dolfi 2024-11-20 13:32:30 +0100
  • 6efa96c983 feat: add support for ocrmac OCR engine on macOS (#276) nuridol 2024-11-20 20:51:19 +0900
  • a277ea67e3 fix: update ocrmac dependency with macOS-specific marker Suhwan Seo 2024-11-20 19:35:58 +0900
  • a3088218a8 fix: propagate document limits to converter (#388) Michele Dolfi 2024-11-20 08:36:51 +0100
  • db14192f68 added documentation, tests and updated dependencies to support paddleocr Swaymaw 2024-11-20 15:05:13 +0530
  • 93f50a197e original readme for pull request Swaymaw 2024-11-20 14:12:46 +0530
  • 476affeb64 original readme for pull request Swaymaw 2024-11-20 14:10:36 +0530
  • c13b128694 feat: add optional ocrmac support Suhwan Seo 2024-11-20 19:06:41 +0900
  • fc0523b12d integrated paddleocr model for performing accurate ocr when using docling document converter swayam-singhal 2024-11-20 13:19:14 +0530
  • 9ffd3d9a2b Update README.md Swaymaw 2024-11-20 13:08:00 +0530
  • 2576e65753 docs: update examples and installation for ocrmac support Suhwan Seo 2024-11-20 17:31:57 +0900
  • 0c23b69433 feat: add support for ocrmac OCR engine on macOS NuRi 2024-11-08 09:08:55 +0900
  • 2ff8d5d297 original readme for pull request Swaymaw 2024-11-20 14:12:46 +0530
  • 7111d1d7c4 original readme for pull request Swaymaw 2024-11-20 14:10:36 +0530
  • 383ad1801f integrated paddleocr model for performing accurate ocr when using docling document converter swayam-singhal 2024-11-20 13:19:14 +0530
  • 318d42c369 Update README.md Swaymaw 2024-11-20 13:08:00 +0530
  • 32ebf55e33 fix: propagate document limits to converter (#388) Michele Dolfi 2024-11-20 08:36:51 +0100
  • 87d1e0fa32 fix: propagate document limits to converter Michele Dolfi 2024-11-20 08:07:43 +0100
  • ce38baf7f7 add multiple improvements and fixes Panos Vagenas 2024-11-19 23:36:50 +0100
  • 5a8186b8fb Sample chunking notebook that includes merging, etc. (#193) Bill Murdock 2024-11-19 17:12:04 -0500
  • 2cfaceb787 chore: bump version to 2.6.0 [skip ci] v2.6.0 github-actions[bot] 2024-11-19 16:07:34 +0000
  • 3f91e7d3f1 feat: added support for exporting DocItem to an image when page image is available (#379) Shubham Gupta 2024-11-19 16:28:52 +0100
  • 911c3bda27 docs: fixed typo in v2 example v2 (#378) Gaspard Petit 2024-11-19 10:27:19 -0500
  • ed785ea122 feat: expose ocr-lang in CLI (#375) Michele Dolfi 2024-11-19 15:58:49 +0100
  • 6c518bcdf4 Merge remote-tracking branch 'origin/main' into shubham/docitem-images Shubham Gupta 2024-11-19 15:44:21 +0100
  • 2a02ee83de use regex for supporting multiple sep Michele Dolfi 2024-11-19 15:40:14 +0100
  • fea03cfd78 Update v2.md - fixed typo in example: iterate_items -> iterate_items() Gaspard Petit 2024-11-19 09:26:05 -0500
  • 926dfd29d5 feat: added excel backend (#334) Peter W. J. Staar 2024-11-19 12:21:17 +0100
  • b38c001224 fixed the poetry lock conflicts Peter Staar 2024-11-19 11:28:01 +0100
  • d23aea981d reformatted the code Peter Staar 2024-11-19 11:22:29 +0100
  • f837105a09 added tests for merged cells in excel Peter Staar 2024-11-19 11:21:41 +0100
  • e625f5d87b feat: expose ocr-lang in CLI Michele Dolfi 2024-11-19 10:57:35 +0100
  • e6f89d520f chore: update lock of deps (#371) Michele Dolfi 2024-11-19 10:23:59 +0100
  • 6a7ed4618b chore: update dependencies Panos Vagenas 2024-11-19 09:40:35 +0100
  • 7368013669 reformatted the code Peter Staar 2024-11-19 06:31:57 +0100
  • 8c42f760a2 merged with main and resolved all conflicts Peter Staar 2024-11-19 06:26:42 +0100
  • 70abf9d080 fixed the mypy Peter Staar 2024-11-19 05:59:07 +0100
  • b312657f6b updated the msexcel (2) Peter Staar 2024-11-19 05:45:33 +0100
  • 5d5600e194 updated the msexcel Peter Staar 2024-11-19 05:42:44 +0100
  • 8c0be69408 Updated examples to use get_image instead of element.image Shubham Gupta 2024-11-18 19:00:00 +0100
  • 06e57eec9b chore: update lock of deps Michele Dolfi 2024-11-18 18:42:59 +0100
  • 7a79047af5 Deprecated the generate_table_images option Shubham Gupta 2024-11-18 17:55:46 +0100
  • b9cbcf7782 Updated minimum docling-core version to 2.4.0 Shubham Gupta 2024-11-18 17:47:46 +0100
  • 7a97d7119f feat: Extracting picture data for raster images found in PPTX (#349) Maxim Lysak 2024-11-18 15:22:28 +0100
  • b132c68a32 Inferring image DPI from pptx file Maksym Lysak 2024-11-18 14:06:43 +0100
  • 22400089c0 Added tests for pptx Maksym Lysak 2024-11-18 10:44:43 +0100
  • 3fb20b9eec fixed the mypy Peter Staar 2024-11-17 07:02:14 +0100
  • b0e5154d87 merged with main Peter Staar 2024-11-17 06:03:12 +0100
  • 7c494270ac reformatted the code Peter Staar 2024-11-17 05:59:35 +0100
  • 5b6090bee3 adding images to output [WIP] Peter Staar 2024-11-16 09:21:28 +0100
  • 4698459c59 ran poetry lock Peter Staar 2024-11-16 09:07:26 +0100
  • 34b7353cd3 added the unit tests Peter Staar 2024-11-16 08:34:09 +0100
  • c9c4810c25 refactor EXCEL to XLSX Peter Staar 2024-11-16 08:05:30 +0100
  • bc31f2a973 added proper typing for mypy Peter Staar 2024-11-16 08:04:17 +0100
  • b8f1439880 added proper typing for mypy Peter Staar 2024-11-16 07:58:20 +0100
  • b1c654c5ef first working version for excel parsing of tables Peter Staar 2024-11-16 07:43:34 +0100
  • 7dbdbdeaf3 ci: fix mergify (#350) Michele Dolfi 2024-11-15 17:13:01 +0100
  • 388e022406 fix mergify rules Michele Dolfi 2024-11-14 16:09:46 +0100
  • b20092e1dc no conv commit message Michele Dolfi 2024-11-14 13:47:53 +0100
  • 0493efe796 added tooling for the cli Peter Staar 2024-11-15 16:21:31 +0100
  • a54f583415 Added picture data for pptx pictures Maksym Lysak 2024-11-15 14:35:25 +0100
  • 364d37ca96 ci(Mergify): configuration update (#339) Michele Dolfi 2024-11-15 13:18:33 +0100
  • ca8524ecae docs: add automatic generation of CLI reference (#325) Michele Dolfi 2024-11-15 13:18:17 +0100
  • 25fd149c38 docs: add architecture outline (#341) Panos Vagenas 2024-11-15 12:52:41 +0100
  • 835e077b02 docs: fix parameter in usage.md (#332) Carl 2024-11-15 09:24:15 +0100
  • bf9c299eb9 first msexcel backend Peter Staar 2024-11-15 06:04:32 +0100
  • 4a584e8abd docs: fix parameter in usage.md Carl Senze 2024-11-13 19:05:07 +0100
  • 71eb06258a install deps for building CLI ref Michele Dolfi 2024-11-14 17:02:39 +0100
  • 9178962c5c remove conventionalcommits from the checklist Michele Dolfi 2024-11-14 16:14:06 +0100
  • 1d1d18dfc6 docs: add architecture outline Panos Vagenas 2024-11-14 15:41:03 +0100
  • de0b7c85d9 no conv commit message Michele Dolfi 2024-11-14 13:47:53 +0100
  • df0c773eae ci(Mergify): configuration update Michele Dolfi 2024-11-14 13:42:36 +0100
  • 8533039b0c fix: Fixing images in the input Word files (#330) Maxim Lysak 2024-11-14 13:33:34 +0100
  • 0e56549082 removed base64 dependency in msword_backend Maksym Lysak 2024-11-14 11:38:22 +0100
  • f197f908cb Updated tests Maksym Lysak 2024-11-14 11:20:17 +0100
  • c8888fe4c4 Populating extracted image data into docling picture for wordx backend Maksym Lysak 2024-11-14 10:42:09 +0100
  • c8aed776e2 Fixing images in the input Word files Maksym Lysak 2024-11-13 17:16:49 +0100
  • bf2a85f1d4 chore: fix Qdrant notebook Colab link (#319) Panos Vagenas 2024-11-14 10:42:02 +0100
  • 0435bfe4e4 feat: added excel backend Peter Staar 2024-11-14 07:49:56 +0100
  • f4fc6cfd4a added TableFormerMode.ACCURATE as default in cli Peter Staar 2024-11-14 07:45:36 +0100
  • 22db6aabfb docs: add automatic generation of CLI reference Michele Dolfi 2024-11-13 10:09:48 +0100
  • 8b437adcde fix: reduce logging by keeping option for more verbose (#323) Michele Dolfi 2024-11-13 10:08:24 +0100
  • 96d2ebc97c fix: reduce logging by keeping option for more verbose Michele Dolfi 2024-11-13 09:45:30 +0100
  • e4383cee56 experimental, wip - adding optional word-level cells to page_processing and table model Maksym Lysak 2024-11-13 09:45:23 +0100