Commit Graph

  • 2d24faecd9 docs: add integrations, revamp docs (#693) Panos Vagenas 2025-01-07 14:15:54 +01:00
  • d49650c54f fix(mspowerpoint): handle invalid images in PowerPoint slides (#650) Jinfeng Sun 2025-01-07 20:58:10 +08:00
  • 0ee849e8bc feat: added http header support for document converter and cli (#642) Luke Harrison 2025-01-07 04:15:14 -05:00
  • 569038df42 docs: Add OpenContracts as an integration (#679) JSIV 2025-01-07 04:14:42 -05:00
  • 2b591f9872 docs: add Weaviate RAG recipe notebook (#451) m-newhauser 2024-12-19 14:57:40 -06:00
  • fc645ea531 docs: document Haystack & Vectara support (#628) Panos Vagenas 2024-12-19 13:33:02 +01:00
  • 1418fa1488 chore: bump version to 2.14.0 [skip ci] v2.14.0 github-actions[bot] 2024-12-18 07:04:47 +00:00
  • fd034802b6 feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) Lucas Morin 2024-12-17 19:27:09 +01:00
  • e31f09f71f chore: bump version to 2.13.0 [skip ci] v2.13.0 github-actions[bot] 2024-12-17 17:01:04 +00:00
  • 60dc852f16 feat: Updated Layout processing with forms and key-value areas (#530) Christoph Auer 2024-12-17 17:32:24 +01:00
  • 00dec7a2f3 test: generate file from CLI in a temporary directory (#618) Cesar Berrospi Ramis 2024-12-17 16:35:42 +01:00
  • 4e087504cc feat: create a backend to parse USPTO patents into DoclingDocument (#606) Cesar Berrospi Ramis 2024-12-17 16:35:23 +01:00
  • 3e599c7bbe docs: add Haystack RAG example (#615) Panos Vagenas 2024-12-17 14:24:40 +01:00
  • b7f94183f1 Merge branch 'main' of github.com:DS4SD/docling into release_v3 cau/new-layout-processing Christoph Auer 2024-12-17 14:07:58 +01:00
  • ec554cb4f2 Adjust confidence in EasyOcr Christoph Auer 2024-12-17 13:45:59 +01:00
  • 3b53bd38c8 feat: Add Easyocr parameter recog_network (#613) itsainii 2024-12-17 16:47:18 +08:00
  • 1f5b1d46ab feat: Add Easyocr parameter recog_network (#613) itsainii 2024-12-17 16:47:18 +08:00
  • 3bb3bf5715 docs: Fix the path to the run_with_accelerator.py example (#608) Nikos Livathinos 2024-12-16 15:03:06 +01:00
  • cf2606825a docs: Fix the path to the run_with_accelerator.py example (#608) Nikos Livathinos 2024-12-16 15:03:06 +01:00
  • 0fd50e53be Fix form and key value area groups Christoph Auer 2024-12-16 15:01:27 +01:00
  • efc25225ac Introduce OCR confidence, propagate to orphan in post-processing Christoph Auer 2024-12-16 14:42:01 +01:00
  • c020f2cba3 Rebase from main Christoph Auer 2024-12-16 11:26:24 +01:00
  • a2db5fbd0f chore: bump version to 2.12.0 [skip ci] v2.12.0 github-actions[bot] 2024-12-13 18:27:00 +00:00
  • 31184ad516 chore: bump version to 2.12.0 [skip ci] github-actions[bot] 2024-12-13 18:27:00 +00:00
  • 19fad9261c feat: Introduce support for GPU Accelerators (#593) Nikos Livathinos 2024-12-13 17:45:22 +01:00
  • 16bd38cbf4 feat: Introduce support for GPU Accelerators (#593) Nikos Livathinos 2024-12-13 17:45:22 +01:00
  • 8cb7d8327a Fixes for cluster pre-ordering Christoph Auer 2024-12-13 14:17:21 +01:00
  • d972a29f2a Fix table box snapping Christoph Auer 2024-12-13 08:44:22 +01:00
  • 12ccf20ddc Update test GT Christoph Auer 2024-12-12 20:37:48 +01:00
  • 1aaf34056f Merge from main Christoph Auer 2024-12-12 20:17:24 +01:00
  • ccab2db1d4 Update pinnings to docling-core Christoph Auer 2024-12-12 20:15:15 +01:00
  • 365a1e7b98 chore: bump version to 2.11.0 [skip ci] v2.11.0 github-actions[bot] 2024-12-12 08:16:05 +00:00
  • d1d0ddd924 chore: bump version to 2.11.0 [skip ci] github-actions[bot] 2024-12-12 08:16:05 +00:00
  • 57d51ede04 Many layout processing improvements, add document index type Christoph Auer 2024-12-11 17:08:35 +01:00
  • 3da166eafa feat: Add timeout limit to document parsing job. DS4SD#270 (#552) Abhishek Kumar 2024-12-11 19:36:10 +05:30
  • f407f68716 feat: Add timeout limit to document parsing job. DS4SD#270 (#552) Abhishek Kumar 2024-12-11 19:36:10 +05:30
  • d094c4990a Repin to release package versions Christoph Auer 2024-12-11 13:16:35 +01:00
  • 038791a25f Rebase from main Christoph Auer 2024-12-11 12:30:45 +01:00
  • aee9c0b324 fix: Do not import python modules from deepsearch-glm (#569) Christoph Auer 2024-12-11 12:29:06 +01:00
  • 443c28557c fix: Do not import python modules from deepsearch-glm (#569) Christoph Auer 2024-12-11 12:29:06 +01:00
  • 05c8cb0fba Update HF model ref, reset test generate Christoph Auer 2024-12-10 20:02:19 +01:00
  • 1de42bef6a Update tests Christoph Auer 2024-12-10 16:47:58 +01:00
  • 5e013294f9 Update lockfile Christoph Auer 2024-12-10 16:42:57 +01:00
  • 76a6b13a92 Rebase from main Christoph Auer 2024-12-10 16:32:48 +01:00
  • b66fb830c9 Merge pull request #556 from DS4SD/cau/layout-processing-improvement Christoph Auer 2024-12-10 16:29:07 +01:00
  • 184eed4095 Merge pull request #514 from DS4SD/nli/performance Christoph Auer 2024-12-10 16:26:27 +01:00
  • f45499ce93 fix: Handle no result from RapidOcr reader (#558) Christoph Auer 2024-12-10 16:25:05 +01:00
  • 861e6fa90c fix: Handle no result from RapidOcr reader (#558) Christoph Auer 2024-12-10 16:25:05 +01:00
  • 5c69081453 fix: Ocr AccleratorDevice Nikos Livathinos 2024-12-10 15:23:56 +00:00
  • 6bc1bd2ec4 fix: Correct the way to set GPU for EasyOCR, RapidOCR Nikos Livathinos 2024-12-10 15:05:00 +00:00
  • d0c9e8e508 docs: update chunking usage docs, minor reorg (#550) Panos Vagenas 2024-12-10 16:03:02 +01:00
  • 6f986d26e1 docs: update chunking usage docs, minor reorg (#550) Panos Vagenas 2024-12-10 16:03:02 +01:00
  • 99ccb69a47 fix: Do proper check to set the device in EasyOCR, RapidOCR. Nikos Livathinos 2024-12-10 14:46:21 +00:00
  • a7df337654 fix: make enum serializable with human-readable value (#555) Michele Dolfi 2024-12-10 13:12:44 +01:00
  • 1a3daf2ffb fix: make enum serializable with human-readable value (#555) Michele Dolfi 2024-12-10 13:12:44 +01:00
  • eb30c4f763 chore: bump version to 2.10.0 [skip ci] v2.10.0 github-actions[bot] 2024-12-09 16:28:46 +00:00
  • ca83a1f0c9 chore: bump version to 2.10.0 [skip ci] github-actions[bot] 2024-12-09 16:28:46 +00:00
  • 7972d47f88 fix: Call into docling-core for legacy document transform (#551) Christoph Auer 2024-12-09 17:06:47 +01:00
  • 440c16ff20 fix: Call into docling-core for legacy document transform (#551) Christoph Auer 2024-12-09 17:06:47 +01:00
  • ce82e23b66 Merge branch 'release_v3' into nli/performance Christoph Auer 2024-12-09 16:52:54 +01:00
  • d006b937ad Rebase from main Christoph Auer 2024-12-09 16:52:26 +01:00
  • 78f61a8522 fix: Introduce Image format options in CLI. Silence the tqdm downloading messages. (#544) Nikos Livathinos 2024-12-09 15:57:37 +01:00
  • c21ada4b22 fix: Introduce Image format options in CLI. Silence the tqdm downloading messages. (#544) Nikos Livathinos 2024-12-09 15:57:37 +01:00
  • fbb28b851d Updated test ground-truth (again), bugfix for empty layout Christoph Auer 2024-12-09 13:50:04 +01:00
  • aca57f0527 feat: docling-parse v2 as default PDF backend (#549) Christoph Auer 2024-12-09 13:26:17 +01:00
  • 840f5e15ed feat: docling-parse v2 as default PDF backend (#549) Christoph Auer 2024-12-09 13:26:17 +01:00
  • 731e48ea43 Updated test ground-truth Christoph Auer 2024-12-09 13:19:38 +01:00
  • 1149d3ae08 fix: TableStructureModel: Refactor the artifacts path to use the new structure for fast/accurate model Nikos Livathinos 2024-12-09 11:12:28 +01:00
  • 9fd2cf847a chore: bump version to 2.9.0 [skip ci] v2.9.0 github-actions[bot] 2024-12-09 09:33:55 +00:00
  • d15d656c39 chore: bump version to 2.9.0 [skip ci] github-actions[bot] 2024-12-09 09:33:55 +00:00
  • c8ecdd987e feat: expose new hybrid chunker, update docs (#384) Panos Vagenas 2024-12-09 08:28:29 +01:00
  • 48d2cb3505 feat: expose new hybrid chunker, update docs (#384) Panos Vagenas 2024-12-09 08:28:29 +01:00
  • eb7ffcdd1c fix: Correcting DefaultText ID for MS Word backend (#537) Maxim Lysak 2024-12-06 15:48:35 +01:00
  • dc71b8c004 fix: Correcting DefaultText ID for MS Word backend (#537) Maxim Lysak 2024-12-06 15:48:35 +01:00
  • 3e073dfbeb feat(MS Word backend): Make detection of headers and other styles localization agnostic (#534) Maxim Lysak 2024-12-06 15:17:56 +01:00
  • c31d9f032e feat(MS Word backend): Make detection of headers and other styles localization agnostic (#534) Maxim Lysak 2024-12-06 15:17:56 +01:00
  • f63e5ef3b5 fix: Improve the pydantic objects in the pipeline_options and imports. Nikos Livathinos 2024-12-06 14:56:35 +01:00
  • 53039a8367 ci: allow ! in conventionalcommits (#533) Michele Dolfi 2024-12-06 14:50:10 +01:00
  • a38f57efce ci: allow ! in conventionalcommits (#533) Michele Dolfi 2024-12-06 14:50:10 +01:00
  • 9102fe1adc fix: Add py.typed marker file (#531) Sander Maijers 2024-12-06 13:42:14 +01:00
  • ba32fb8637 fix: Add py.typed marker file (#531) Sander Maijers 2024-12-06 13:42:14 +01:00
  • eb02a3235f merged with main dev/update-html-parser-with-h1 Peter Staar 2024-12-06 13:23:53 +01:00
  • 6f7b128867 docs: document new integrations (#532) Panos Vagenas 2024-12-06 13:18:14 +01:00
  • e780333440 docs: document new integrations (#532) Panos Vagenas 2024-12-06 13:18:14 +01:00
  • 54b4daa2dd fix: Enable HTML export in CLI and add options for image mode (#513) Peter W. J. Staar 2024-12-06 12:37:57 +01:00
  • 0d11e30dd8 fix: Enable HTML export in CLI and add options for image mode (#513) Peter W. J. Staar 2024-12-06 12:37:57 +01:00
  • 63f1125d5c fix: Missing text in docx (t tag) when embedded in a table (#528) Maxim Lysak 2024-12-06 12:37:25 +01:00
  • b730b2d7a0 fix: Missing text in docx (t tag) when embedded in a table (#528) Maxim Lysak 2024-12-06 12:37:25 +01:00
  • 71f3a7ac3c Rebase from release_v3 Christoph Auer 2024-12-06 12:33:38 +01:00
  • b0da1a2127 Merge pull request #504 from DS4SD/cau/layout-postprocessing Christoph Auer 2024-12-06 12:26:34 +01:00
  • bed92b766f fix: restore pydantic version pin after fixes (#512) Michele Dolfi 2024-12-06 09:33:39 +01:00
  • c830b92b2e fix: restore pydantic version pin after fixes (#512) Michele Dolfi 2024-12-06 09:33:39 +01:00
  • 3bb7df66ca feat(Accelerator): Introduce options to control the num_threads and device from API, envvars, CLI. Nikos Livathinos 2024-12-02 18:27:44 +01:00
      - Introduce the AcceleratorOptions, AcceleratorDevice and use them to set the device where the models run.
      - Introduce the accelerator_utils with function to decide the device and resolve the AUTO setting.
      - Refactor the way how the docling-ibm-models are called to match the new init signature of models.
      - Translate the accelerator options to the specific inputs for third-party models.
      - Extend the docling CLI with parameters to set the num_threads and device.
      - Add new unit tests.
      - Write new example how to use the accelerator options.
  • 84f3548d30 Clean up imports again Christoph Auer 2024-12-04 15:22:43 +01:00
  • e36f7d82f6 fix: folder input in cli (#511) Michele Dolfi 2024-12-04 14:22:00 +01:00
  • 8ada0bccc7 fix: folder input in cli (#511) Michele Dolfi 2024-12-04 14:22:00 +01:00
  • e97688cd3d Merge branch 'release_v3' of github.com:DS4SD/docling into cau/layout-postprocessing Christoph Auer 2024-12-04 14:21:09 +01:00
  • 11c7c43bad Move to_docling_document from ds-glm to this repo Christoph Auer 2024-12-04 13:11:41 +01:00
  • 9c788ae778 chore: bump version to 2.8.3 [skip ci] v2.8.3 github-actions[bot] 2024-12-03 15:16:47 +00:00
  • 78fad801fe chore: bump version to 2.8.3 [skip ci] github-actions[bot] 2024-12-03 15:16:47 +00:00