Commit Graph

  • 959f91180f Merge branch 'release_v3' into cau/layout-processing-improvement Christoph Auer 2024-12-09 16:53:27 +0100
  • ce82e23b66 Merge branch 'release_v3' into nli/performance Christoph Auer 2024-12-09 16:52:54 +0100
  • bb1774dd6b Merge branch 'release_v3' into nli/performance Christoph Auer 2024-12-09 16:52:54 +0100
  • d006b937ad Rebase from main Christoph Auer 2024-12-09 16:52:26 +0100
  • 9e99e242dc Rebase from main Christoph Auer 2024-12-09 16:52:26 +0100
  • 4862153a7c Call into docling-core for legacy document transform Christoph Auer 2024-12-09 16:43:40 +0100
  • 30fa21d863 Futher layout tuning Christoph Auer 2024-12-09 16:25:19 +0100
  • a06ee134dc Merge remote-tracking branch 'origin/main' into dev/xml-backend lucas-morin 2024-12-09 16:24:49 +0100
  • dd214b2b6e fix conflicts lucas-morin 2024-12-09 16:24:20 +0100
  • c21ada4b22 fix: Introduce Image format options in CLI. Silence the tqdm downloading messages. (#544) Nikos Livathinos 2024-12-09 15:57:37 +0100
  • 78f61a8522 fix: Introduce Image format options in CLI. Silence the tqdm downloading messages. (#544) Nikos Livathinos 2024-12-09 15:57:37 +0100
  • 3240955db9 Create a XML backend for PubMed documents based on the pubmed_parser library lucas-morin 2024-12-09 15:50:10 +0100
  • cbf56ace09 Merge branch 'main' into nli/fix_ocr_options Nikos Livathinos 2024-12-09 14:27:25 +0100
  • bb83bc3f8d fix: Use the HF API to disable the tqdm progress bars Nikos Livathinos 2024-12-09 14:25:18 +0100
  • fbb28b851d Updated test ground-truth (again), bugfix for empty layout Christoph Auer 2024-12-09 13:50:04 +0100
  • 46ae215b68 Updated test ground-truth (again), bugfix for empty layout Christoph Auer 2024-12-09 13:50:04 +0100
  • 840f5e15ed feat: docling-parse v2 as default PDF backend (#549) Christoph Auer 2024-12-09 13:26:17 +0100
  • aca57f0527 feat: docling-parse v2 as default PDF backend (#549) Christoph Auer 2024-12-09 13:26:17 +0100
  • 731e48ea43 Updated test ground-truth Christoph Auer 2024-12-09 13:19:38 +0100
  • 03f8690c62 Updated test ground-truth Christoph Auer 2024-12-09 13:19:38 +0100
  • 8323997737 Fix DP2 backend code, change CLI default backend Christoph Auer 2024-12-09 12:48:30 +0100
  • 3b1e1707bb Update lock Christoph Auer 2024-12-09 12:20:37 +0100
  • de54bef966 Upgrade to ds-glm 1.0 and docling-parse 3.0 Christoph Auer 2024-12-09 11:45:16 +0100
  • c7a02e9b2b Move to_docling_document from ds-glm to this repo Christoph Auer 2024-12-04 13:11:41 +0100
  • 1149d3ae08 fix: TableStructureModel: Refactor the artifacts path to use the new structure for fast/accurate model Nikos Livathinos 2024-12-09 11:12:28 +0100
  • 5d5d14d00c fix: TableStructureModel: Refactor the artifacts path to use the new structure for fast/accurate model Nikos Livathinos 2024-12-09 11:12:28 +0100
  • 3f7c452678 docs: update chunking usage docs, minor reorg Panos Vagenas 2024-12-09 10:50:15 +0100
  • d15d656c39 chore: bump version to 2.9.0 [skip ci] github-actions[bot] 2024-12-09 09:33:55 +0000
  • 9fd2cf847a chore: bump version to 2.9.0 [skip ci] v2.9.0 github-actions[bot] 2024-12-09 09:33:55 +0000
  • 48d2cb3505 feat: expose new hybrid chunker, update docs (#384) Panos Vagenas 2024-12-09 08:28:29 +0100
  • c8ecdd987e feat: expose new hybrid chunker, update docs (#384) Panos Vagenas 2024-12-09 08:28:29 +0100
  • 04977aac9f fix: Code styling Nikos Livathinos 2024-12-08 22:14:48 +0100
  • 64c7382880 fix: Silence the tqdm messages during the downloading of model files Nikos Livathinos 2024-12-08 18:35:06 +0100
  • e125b9b24d fix: main: Introduce format options for Image with the same pdf pipeline_options. Add RapidOcrOptions to the Union of ocr_options for PdfPipelineOptions Nikos Livathinos 2024-12-08 18:32:08 +0100
  • 16e6a3884f feat: expose new hybrid chunker, update docs Panos Vagenas 2024-12-06 18:48:09 +0100
  • dc71b8c004 fix: Correcting DefaultText ID for MS Word backend (#537) Maxim Lysak 2024-12-06 15:48:35 +0100
  • eb7ffcdd1c fix: Correcting DefaultText ID for MS Word backend (#537) Maxim Lysak 2024-12-06 15:48:35 +0100
  • 3bdebf32f7 Correcting DefaultText ID for MS Word backend Maksym Lysak 2024-12-06 15:28:19 +0100
  • c31d9f032e feat(MS Word backend): Make detection of headers and other styles localization agnostic (#534) Maxim Lysak 2024-12-06 15:17:56 +0100
  • 3e073dfbeb feat(MS Word backend): Make detection of headers and other styles localization agnostic (#534) Maxim Lysak 2024-12-06 15:17:56 +0100
  • f63e5ef3b5 fix: Improve the pydantic objects in the pipeline_options and imports. Nikos Livathinos 2024-12-06 14:56:35 +0100
  • 975fe076f4 fix: Improve the pydantic objects in the pipeline_options and imports. Nikos Livathinos 2024-12-06 14:56:35 +0100
  • a38f57efce ci: allow ! in conventionalcommits (#533) Michele Dolfi 2024-12-06 14:50:10 +0100
  • 53039a8367 ci: allow ! in conventionalcommits (#533) Michele Dolfi 2024-12-06 14:50:10 +0100
  • 09a28a9f1e Using style id instead of style names, which should be localization agnostic Maksym Lysak 2024-12-06 11:19:51 +0100
  • 279eefbf19 ci: allow ! in conventionalcommits Michele Dolfi 2024-12-06 13:46:27 +0100
  • ba32fb8637 fix: Add py.typed marker file (#531) Sander Maijers 2024-12-06 13:42:14 +0100
  • 9102fe1adc fix: Add py.typed marker file (#531) Sander Maijers 2024-12-06 13:42:14 +0100
  • eb02a3235f merged with main dev/update-html-parser-with-h1 Peter Staar 2024-12-06 13:23:53 +0100
  • 6f7b128867 docs: document new integrations (#532) Panos Vagenas 2024-12-06 13:18:14 +0100
  • e780333440 docs: document new integrations (#532) Panos Vagenas 2024-12-06 13:18:14 +0100
  • 74d2617883 feat: add py.typed marker file Sander Maijers 2024-12-06 12:51:34 +0100
  • f2661cdffb docs: document new integrations Panos Vagenas 2024-12-06 12:42:33 +0100
  • 54b4daa2dd fix: Enable HTML export in CLI and add options for image mode (#513) Peter W. J. Staar 2024-12-06 12:37:57 +0100
  • 0d11e30dd8 fix: Enable HTML export in CLI and add options for image mode (#513) Peter W. J. Staar 2024-12-06 12:37:57 +0100
  • 63f1125d5c fix: Missing text in docx (t tag) when embedded in a table (#528) Maxim Lysak 2024-12-06 12:37:25 +0100
  • b730b2d7a0 fix: Missing text in docx (t tag) when embedded in a table (#528) Maxim Lysak 2024-12-06 12:37:25 +0100
  • 71f3a7ac3c Rebase from release_v3 Christoph Auer 2024-12-06 12:33:38 +0100
  • 6f0b91287c Rebase from release_v3 Christoph Auer 2024-12-06 12:33:38 +0100
  • b0da1a2127 Merge pull request #504 from DS4SD/cau/layout-postprocessing Christoph Auer 2024-12-06 12:26:34 +0100
  • 40d7a8e293 Merge pull request #504 from DS4SD/cau/layout-postprocessing Christoph Auer 2024-12-06 12:26:34 +0100
  • 5c6df3e4fe Resolve lock conflicts with main Christoph Auer 2024-12-06 11:27:25 +0100
  • eb72188262 Pin docling-core>=2.7.1 Christoph Auer 2024-12-06 11:20:55 +0100
  • b81961a04e Clean up styling and docs Christoph Auer 2024-12-06 11:11:09 +0100
  • 3915d3f35c reference is now working Peter Staar 2024-12-06 10:49:44 +0100
  • 0769cb03bf cleaning up the comments Peter Staar 2024-12-06 10:45:06 +0100
  • d6c314d7f1 removed the duck emoji, added the in the cli. Currently, the referenced seems broken Peter Staar 2024-12-06 10:13:53 +0100
  • 6eccbae5fc Fix for missing text in docx (t tag) when embedded in a table Maksym Lysak 2024-12-06 09:49:05 +0100
  • bed92b766f fix: restore pydantic version pin after fixes (#512) Michele Dolfi 2024-12-06 09:33:39 +0100
  • c830b92b2e fix: restore pydantic version pin after fixes (#512) Michele Dolfi 2024-12-06 09:33:39 +0100
  • 624b392e79 fix: Skip NavigableString in HTML parsing higuhigu-lb 2024-12-03 11:44:16 +0900
  • 7867014d0b Create a XML backend for PubMed documents based on the pubmed_parser library lucas-morin 2024-12-05 13:20:00 +0100
  • 6c818d0926 Create a XML backend for PubMed documents based on the pubmed_parser library lucas-morin 2024-12-05 13:18:22 +0100
  • 3bb7df66ca feat(Accelerator): Introduce options to control the num_threads and device from API, envvars, CLI. - Introduce the AcceleratorOptions, AcceleratorDevice and use them to set the device where the models run. - Introduce the accelerator_utils with function to decide the device and resolve the AUTO setting. - Refactor the way how the docling-ibm-models are called to match the new init signature of models. - Translate the accelerator options to the specific inputs for third-party models. - Extend the docling CLI with parameters to set the num_threads and device. - Add new unit tests. - Write new example how to use the accelerator options. Nikos Livathinos 2024-12-02 18:27:44 +0100
  • ddb8ad9227 feat(Accelerator): Introduce options to control the num_threads and device from API, envvars, CLI. - Introduce the AcceleratorOptions, AcceleratorDevice and use them to set the device where the models run. - Introduce the accelerator_utils with function to decide the device and resolve the AUTO setting. - Refactor the way how the docling-ibm-models are called to match the new init signature of models. - Translate the accelerator options to the specific inputs for third-party models. - Extend the docling CLI with parameters to set the num_threads and device. - Add new unit tests. - Write new example how to use the accelerator options. Nikos Livathinos 2024-12-02 18:27:44 +0100
  • ef87fc40f0 pin docling-core release Michele Dolfi 2024-12-04 16:29:33 +0100
  • 89487dd76e reformatted the code Peter Staar 2024-12-04 16:24:27 +0100
  • b0fc1e7189 added html to cli Peter Staar 2024-12-04 16:20:17 +0100
  • a062ab1937 updated the cli to export html Peter Staar 2024-12-04 16:10:43 +0100
  • 9f5e512080 updated the index.md Peter Staar 2024-12-04 16:04:37 +0100
  • 478bb8f7a5 removed duck in title Peter Staar 2024-12-04 15:50:09 +0100
  • 299c685d27 updated README Peter Staar 2024-12-04 15:49:33 +0100
  • 84f3548d30 Clean up imports again Christoph Auer 2024-12-04 15:22:43 +0100
  • 8b04edd177 Clean up imports again Christoph Auer 2024-12-04 15:22:43 +0100
  • e36f7d82f6 fix: folder input in cli (#511) Michele Dolfi 2024-12-04 14:22:00 +0100
  • 8ada0bccc7 fix: folder input in cli (#511) Michele Dolfi 2024-12-04 14:22:00 +0100
  • e97688cd3d Merge branch 'release_v3' of github.com:DS4SD/docling into cau/layout-postprocessing Christoph Auer 2024-12-04 14:21:09 +0100
  • e8266425ac Merge branch 'release_v3' of github.com:DS4SD/docling into cau/layout-postprocessing Christoph Auer 2024-12-04 14:21:09 +0100
  • 4887fb776d test: pin new docling-core changes and release pydantic pinning Michele Dolfi 2024-12-04 13:40:42 +0100
  • c3acc69048 fix: folder input in cli Michele Dolfi 2024-12-04 13:24:37 +0100
  • 11c7c43bad Move to_docling_document from ds-glm to this repo Christoph Auer 2024-12-04 13:11:41 +0100
  • a1ac0c66ef Move to_docling_document from ds-glm to this repo Christoph Auer 2024-12-04 13:11:41 +0100
  • 78fad801fe chore: bump version to 2.8.3 [skip ci] github-actions[bot] 2024-12-03 15:16:47 +0000
  • 9c788ae778 chore: bump version to 2.8.3 [skip ci] v2.8.3 github-actions[bot] 2024-12-03 15:16:47 +0000
  • 1c14a2ac56 Merge branch 'DS4SD:main' into simonas/base-options Simonas Jakubonis 2024-12-03 16:25:35 +0200
  • 0240ae2930 Pass nested clusters through GLM as payload Christoph Auer 2024-12-03 13:58:27 +0100
  • 65fa584a1a Pass nested clusters through GLM as payload Christoph Auer 2024-12-03 13:58:27 +0100
  • 4dcc738b6d Pass nested cluster processing through full pipeline Christoph Auer 2024-12-03 13:08:45 +0100
  • db70916f57 Pass nested cluster processing through full pipeline Christoph Auer 2024-12-03 13:08:45 +0100
  • 0be736227f fix: improve handling of disallowed formats (#429) Christoph Auer 2024-12-03 12:45:32 +0100