Commit Graph

  • 6cb344c047 Update easyocr_model.py itsainii 2024-12-16 16:04:55 +0800
  • a5c2fb3052 Merge branch 'DS4SD:main' into main Fabio 2024-12-13 22:54:00 -0300
  • 162e89e013 Merge pull request #2 from 0xCarbon/feat/disable_image_labeling Fabio 2024-12-13 22:52:52 -0300
  • 31184ad516 chore: bump version to 2.12.0 [skip ci] github-actions[bot] 2024-12-13 18:27:00 +0000
  • a2db5fbd0f chore: bump version to 2.12.0 [skip ci] v2.12.0 github-actions[bot] 2024-12-13 18:27:00 +0000
  • e0aaa783c5 Merge branch 'main' of github.com:0xCarbon/docling into feat/disable_image_labeling João 2024-12-13 13:57:35 -0300
  • 1e016fd776 Merge pull request #1 from DS4SD/main jpcanesin 2024-12-13 13:55:48 -0300
  • a4dc21395d blacklisted the picture layout tag so that it is forced to interpret the contents of the image and retrieve text that otherwise would be lost with an image tag João 2024-12-13 13:46:25 -0300
  • 16bd38cbf4 feat: Introduce support for GPU Accelerators (#593) Nikos Livathinos 2024-12-13 17:45:22 +0100
  • 19fad9261c feat: Introduce support for GPU Accelerators (#593) Nikos Livathinos 2024-12-13 17:45:22 +0100
  • 24f0346d84 Nail the accelerator defaults for MPS Christoph Auer 2024-12-13 17:19:10 +0100
  • 6832c5aeba Review fixes Christoph Auer 2024-12-13 15:44:06 +0100
  • d2f114e5d6 Remove unused debug settings Christoph Auer 2024-12-13 15:24:08 +0100
  • e4624d862c Update test gt Christoph Auer 2024-12-13 15:11:57 +0100
  • 161f92fe8c Rollback changes from main Christoph Auer 2024-12-13 14:38:34 +0100
  • 8cb7d8327a Fixes for cluster pre-ordering Christoph Auer 2024-12-13 14:17:21 +0100
  • a9ff29bbaa Fixes for cluster pre-ordering Christoph Auer 2024-12-13 14:17:21 +0100
  • 6209cf3bc5 Merge branch 'main' into nli/performance_main Nikos Livathinos 2024-12-13 13:58:37 +0100
  • 30dbab505b fix: Do proper check to set the device in EasyOCR, RapidOCR. Nikos Livathinos 2024-12-10 14:46:21 +0000
  • d972a29f2a Fix table box snapping Christoph Auer 2024-12-13 08:44:22 +0100
  • dd4f72ef29 Fix table box snapping Christoph Auer 2024-12-13 08:44:22 +0100
  • 12ccf20ddc Update test GT Christoph Auer 2024-12-12 20:37:48 +0100
  • 3f854bdb28 Update test GT Christoph Auer 2024-12-12 20:37:48 +0100
  • 1aaf34056f Merge from main Christoph Auer 2024-12-12 20:17:24 +0100
  • f57884a30f Merge from main Christoph Auer 2024-12-12 20:17:24 +0100
  • ccab2db1d4 Update pinnings to docling-core Christoph Auer 2024-12-12 20:15:15 +0100
  • c02af42759 Update pinnings to docling-core Christoph Auer 2024-12-12 20:15:15 +0100
  • d1d0ddd924 chore: bump version to 2.11.0 [skip ci] github-actions[bot] 2024-12-12 08:16:05 +0000
  • 365a1e7b98 chore: bump version to 2.11.0 [skip ci] v2.11.0 github-actions[bot] 2024-12-12 08:16:05 +0000
  • 57d51ede04 Many layout processing improvements, add document index type Christoph Auer 2024-12-11 17:08:35 +0100
  • 55b195ca1d Many layout processing improvements, add document index type Christoph Auer 2024-12-11 17:08:35 +0100
  • f407f68716 feat: Add timeout limit to document parsing job. DS4SD#270 (#552) Abhishek Kumar 2024-12-11 19:36:10 +0530
  • 3da166eafa feat: Add timeout limit to document parsing job. DS4SD#270 (#552) Abhishek Kumar 2024-12-11 19:36:10 +0530
  • d094c4990a Repin to release package versions Christoph Auer 2024-12-11 13:16:35 +0100
  • 48db8a5c15 Repin to release package versions Christoph Auer 2024-12-11 13:16:35 +0100
  • 038791a25f Rebase from main Christoph Auer 2024-12-11 12:30:45 +0100
  • 5a82f2b51e Rebase from main Christoph Auer 2024-12-11 12:30:45 +0100
  • 443c28557c fix: Do not import python modules from deepsearch-glm (#569) Christoph Auer 2024-12-11 12:29:06 +0100
  • aee9c0b324 fix: Do not import python modules from deepsearch-glm (#569) Christoph Auer 2024-12-11 12:29:06 +0100
  • 2e146c64a6 fix: Do not import python modules from deepsearch-glm Christoph Auer 2024-12-11 10:55:38 +0100
  • 05c8cb0fba Update HF model ref, reset test generate Christoph Auer 2024-12-10 20:02:19 +0100
  • f4512d0e97 Update HF model ref, reset test generate Christoph Auer 2024-12-10 20:02:19 +0100
  • 1de42bef6a Update tests Christoph Auer 2024-12-10 16:47:58 +0100
  • c8b59151d7 Update tests Christoph Auer 2024-12-10 16:47:58 +0100
  • 5e013294f9 Update lockfile Christoph Auer 2024-12-10 16:42:57 +0100
  • bd30b46356 Update lockfile Christoph Auer 2024-12-10 16:42:57 +0100
  • 76a6b13a92 Rebase from main Christoph Auer 2024-12-10 16:32:48 +0100
  • 586abd58ec Rebase from main Christoph Auer 2024-12-10 16:32:48 +0100
  • b66fb830c9 Merge pull request #556 from DS4SD/cau/layout-processing-improvement Christoph Auer 2024-12-10 16:29:07 +0100
  • cd579fd28e Merge pull request #556 from DS4SD/cau/layout-processing-improvement Christoph Auer 2024-12-10 16:29:07 +0100
  • 184eed4095 Merge pull request #514 from DS4SD/nli/performance Christoph Auer 2024-12-10 16:26:27 +0100
  • e282bfd8c8 Merge pull request #514 from DS4SD/nli/performance Christoph Auer 2024-12-10 16:26:27 +0100
  • 861e6fa90c fix: Handle no result from RapidOcr reader (#558) Christoph Auer 2024-12-10 16:25:05 +0100
  • f45499ce93 fix: Handle no result from RapidOcr reader (#558) Christoph Auer 2024-12-10 16:25:05 +0100
  • 5c69081453 fix: Ocr AccleratorDevice Nikos Livathinos 2024-12-10 15:23:56 +0000
  • f46fd9c0a6 fix: Ocr AccleratorDevice Nikos Livathinos 2024-12-10 15:23:56 +0000
  • 4aecf689aa Rebase from main Christoph Auer 2024-12-10 16:21:21 +0100
  • 6bc1bd2ec4 fix: Correct the way to set GPU for EasyOCR, RapidOCR Nikos Livathinos 2024-12-10 15:05:00 +0000
  • 94caee3fb5 fix: Correct the way to set GPU for EasyOCR, RapidOCR Nikos Livathinos 2024-12-10 15:05:00 +0000
  • e8884fa2d8 fix: Handle no result from RapidOcr reader Christoph Auer 2024-12-10 16:05:14 +0100
  • 6f986d26e1 docs: update chunking usage docs, minor reorg (#550) Panos Vagenas 2024-12-10 16:03:02 +0100
  • d0c9e8e508 docs: update chunking usage docs, minor reorg (#550) Panos Vagenas 2024-12-10 16:03:02 +0100
  • 5497ec8a66 Merge branch 'nli/performance' of github.com:DS4SD/docling into cau/layout-processing-improvement Christoph Auer 2024-12-10 15:57:39 +0100
  • 4447d22c2f Fixes for layout processing and tableformer workaround Christoph Auer 2024-12-10 15:50:18 +0100
  • 99ccb69a47 fix: Do proper check to set the device in EasyOCR, RapidOCR. Nikos Livathinos 2024-12-10 14:46:21 +0000
  • accb7b4481 fix: Do proper check to set the device in EasyOCR, RapidOCR. Nikos Livathinos 2024-12-10 14:46:21 +0000
  • 191be3bf9c update CLI page pointer Panos Vagenas 2024-12-10 15:20:54 +0100
  • c024275f24 (feat): Create a XML backend for PubMed documents based on the pubmed_parser library (merge conflicts) lucas-morin 2024-12-10 13:35:29 +0100
  • 4db16aa82b chore: bump version to 2.10.0 [skip ci] github-actions[bot] 2024-12-09 16:28:46 +0000
  • c5b7c2f510 fix: Call into docling-core for legacy document transform (#551) Christoph Auer 2024-12-09 17:06:47 +0100
  • 5c4f84a4bf fix: Introduce Image format options in CLI. Silence the tqdm downloading messages. (#544) Nikos Livathinos 2024-12-09 15:57:37 +0100
  • eff7970002 feat: docling-parse v2 as default PDF backend (#549) Christoph Auer 2024-12-09 13:26:17 +0100
  • df4c9fd27b chore: bump version to 2.9.0 [skip ci] github-actions[bot] 2024-12-09 09:33:55 +0000
  • 8e6f7c2305 feat: expose new hybrid chunker, update docs (#384) Panos Vagenas 2024-12-09 08:28:29 +0100
  • b15d71ba6f fix: Correcting DefaultText ID for MS Word backend (#537) Maxim Lysak 2024-12-06 15:48:35 +0100
  • 27c9476e52 feat(MS Word backend): Make detection of headers and other styles localization agnostic (#534) Maxim Lysak 2024-12-06 15:17:56 +0100
  • b3515f89ce ci: allow ! in conventionalcommits (#533) Michele Dolfi 2024-12-06 14:50:10 +0100
  • db580b4959 fix: Add py.typed marker file (#531) Sander Maijers 2024-12-06 13:42:14 +0100
  • b2a430a833 docs: document new integrations (#532) Panos Vagenas 2024-12-06 13:18:14 +0100
  • 3e91514d2e fix: Enable HTML export in CLI and add options for image mode (#513) Peter W. J. Staar 2024-12-06 12:37:57 +0100
  • 38a3e8decf fix: Missing text in docx (t tag) when embedded in a table (#528) Maxim Lysak 2024-12-06 12:37:25 +0100
  • 592179630d fix: restore pydantic version pin after fixes (#512) Michele Dolfi 2024-12-06 09:33:39 +0100
  • 228c3d107e fix: folder input in cli (#511) Michele Dolfi 2024-12-04 14:22:00 +0100
  • 6fc1710cb8 chore: bump version to 2.8.3 [skip ci] github-actions[bot] 2024-12-03 15:16:47 +0000
  • 319a7efe16 fix: improve handling of disallowed formats (#429) Christoph Auer 2024-12-03 12:45:32 +0100
  • 90c708c89e chore: bump version to 2.8.2 [skip ci] github-actions[bot] 2024-12-03 10:47:29 +0000
  • b6b9817429 chore: update numpy lock (#500) Michele Dolfi 2024-12-03 11:21:31 +0100
  • d1244a5c31 fix: ParserError EOF inside string (#470) (#472) guglie 2024-12-03 11:21:18 +0100
  • 756005e271 docs: add styling for faq (#502) Michele Dolfi 2024-12-03 11:20:49 +0100
  • b80b35c7c9 perf: prevent temp file leftovers, reuse core type (#487) Panos Vagenas 2024-12-03 10:40:28 +0100
  • 2f4d38f4da fix: PermissionError when using tesseract_ocr_cli_model (#496) Gaspard Petit 2024-12-03 04:22:03 -0500
  • 7c195829f3 docs: typo in faq (#484) Álvaro Huertas 2024-12-02 10:35:24 +0100
  • 6a8ad8a3eb docs: add automatic api reference (#475) Michele Dolfi 2024-12-02 09:55:52 +0100
  • 5aec43397d docs: introduce faq section (#468) Michele Dolfi 2024-11-29 22:34:56 +0100
  • b52fce3f27 chore: bump version to 2.8.1 [skip ci] github-actions[bot] 2024-11-29 13:04:48 +0000
  • d4c5d9a893 fix(cli): expose debug options (#467) Michele Dolfi 2024-11-29 13:25:58 +0100
  • 76e6d93ce2 fix: remove unused deps (#466) Michele Dolfi 2024-11-29 13:18:06 +0100
  • 71231790dc feat: Create XML backend for PubMed documents and resolve conflicts lucas-morin 2024-12-05 13:18:22 +0100
  • 1a3daf2ffb fix: make enum serializable with human-readable value (#555) Michele Dolfi 2024-12-10 13:12:44 +0100
  • a7df337654 fix: make enum serializable with human-readable value (#555) Michele Dolfi 2024-12-10 13:12:44 +0100