Commit Graph

  • 249715a463 use different cache key in each job Michele Dolfi 2025-03-02 16:57:16 -0500
  • 9e30fca4c0 more timeout Michele Dolfi 2025-03-02 16:14:39 -0500
  • b3dbfad00a increase hf timeout Michele Dolfi 2025-03-02 15:55:59 -0500
  • 74916565de warning for develop examples Michele Dolfi 2025-03-02 15:44:55 -0500
  • 87d849792d use gh cache for huggingface models Michele Dolfi 2025-03-02 15:32:19 -0500
  • 8dc0562542 fix: enable locks for threadsafe pdfium (#1052) Michele Dolfi 2025-03-02 20:06:44 +0100
  • e25d557c06 refactor: add the contentlayer to html-backend (#1040) Peter W. J. Staar 2025-03-02 10:37:53 -0500
  • b3cf5d4471 Merge remote-tracking branch 'origin/main' into fix-threadsafe-pypdfium Michele Dolfi 2025-03-02 09:58:27 -0500
  • 346a49c283 fix deadlock in pypdfium2 backend Michele Dolfi 2025-03-02 09:58:06 -0500
  • 3ea31b6111 chore: set TextItem label to 'text' instead of 'paragraph' Cesar Berrospi Ramis 2025-03-01 11:27:03 -0300
  • 38d622f22c refactor(html): put parsed item in body if doc has no header Cesar Berrospi Ramis 2025-02-28 18:03:58 +0100
  • 70e6b942e1 test(html): add more info if a test case fails Cesar Berrospi Ramis 2025-02-28 16:05:43 +0100
  • 0cba30e254 reformatted code of html backend Peter Staar 2025-02-24 09:29:06 +0100
  • e5e00674e1 updated the handle_image function Peter Staar 2025-02-24 09:26:30 +0100
  • 252bd83066 added the contentlayer to html-backend Peter Staar 2025-02-23 07:01:18 +0100
  • db3ceefd4a docs: improve docs on token limit warning triggered by HybridChunker (#1077) Panos Vagenas 2025-02-28 14:54:46 +0100
  • 0426e4d682 docs: improve docs on token limit warning triggered by HybridChunker Panos Vagenas 2025-02-28 13:28:30 +0100
  • de7b963b09 fix(html): use 'start' attribute when parsing ordered lists from HTML docs (#1062) Cesar Berrospi Ramis 2025-02-27 09:46:57 +0100
  • 252c185ae0 chore(html): reduce verbosity in HTML backend Cesar Berrospi Ramis 2025-02-26 15:50:12 +0100
  • 413f6e8a88 fix(html): use 'start' attribute in ordered lists Cesar Berrospi Ramis 2025-02-26 15:47:12 +0100
  • 37dd8c1cc7 chore: bump version to 2.25.0 [skip ci] v2.25.0 github-actions[bot] 2025-02-26 14:16:15 +0000
  • 3c9fe76b70 feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054) Christoph Auer 2025-02-26 14:43:26 +0100
  • 483d1bfab0 fix: add proper table provenance Christoph Auer 2025-02-26 14:02:22 +0100
  • ab683e4fb6 feat(cli): add option for downloading all models, refine help messages (#1061) Panos Vagenas 2025-02-26 13:27:29 +0100
  • abd714b64b add --all flag to model download CLI Panos Vagenas 2025-02-26 13:00:49 +0100
  • b88440a7c9 chore: fix leftover .to(device) Christoph Auer 2025-02-26 12:53:43 +0100
  • 5cd0fdd258 chore: more cleanup Christoph Auer 2025-02-26 12:48:14 +0100
  • c5873f2496 chore: clean up code and comments Christoph Auer 2025-02-26 12:46:41 +0100
  • 34393e51a2 make options an explicit kwarg Panos Vagenas 2025-02-26 11:44:06 +0100
  • 560164f613 chore(cli): update download help messages Panos Vagenas 2025-02-26 11:32:39 +0100
  • f994654918 Make drawing code resilient against bad bboxes Christoph Auer 2025-02-26 11:01:11 +0100
  • e197225739 fix: vlm using artifacts path (#1057) Michele Dolfi 2025-02-26 08:33:50 +0100
  • d269c8596c Add back device_map and accelerate Christoph Auer 2025-02-25 19:19:23 +0100
  • a6d406764a Fix VLM example exclusion in CI Christoph Auer 2025-02-25 16:00:39 +0100
  • 10f64a948c Expose control over using flash_attention_2 Christoph Auer 2025-02-25 15:31:32 +0100
  • 1553a125dc switch to create methods Panos Vagenas 2025-02-25 14:43:15 +0100
  • b85caac0ec add granite vision to the download utils Michele Dolfi 2025-02-25 14:25:28 +0100
  • 96d9fe19cd fix usage of artifacts path Michele Dolfi 2025-02-25 14:09:14 +0100
  • eec896631e revert blacklist João 2025-02-25 10:05:56 -0300
  • 84c77c9fcb Move imports Christoph Auer 2025-02-25 14:03:28 +0100
  • 341806e54b Rename example Christoph Auer 2025-02-25 13:43:05 +0100
  • 1cba96ecfd Generalize and refactor VLM pipeline and models Christoph Auer 2025-02-25 13:38:44 +0100
  • c84b973959 docs: extend chunking docs, add FAQ on token limit (#1053) Panos Vagenas 2025-02-25 13:07:38 +0100
  • ecb3895f9a docs: extend chunking docs, add FAQ on token limit Panos Vagenas 2025-02-25 12:30:26 +0100
  • 762a511d0a enable locks for threadsafe pdfium Michele Dolfi 2025-02-25 11:15:43 +0100
  • 45b8cb7060 small changes Navanit Dubey 2025-02-25 15:00:26 +0530
  • 27fec3de6c Feature to use local vlm model too Navanit Dubey 2025-02-25 14:47:30 +0530
  • 1c75b52f85 re-built poetry.lock mly/smol-docling-integration Maksym Lysak 2025-02-24 17:37:35 +0100
  • 9ecec1d330 Updated poetry.lock Maksym Lysak 2025-02-24 17:27:50 +0100
  • 923f766ada Replaced remaining strings to appropriate enums Maksym Lysak 2025-02-24 16:54:59 +0100
  • a095a7c5b7 removing changes from base_pipeline Maksym Lysak 2025-02-24 15:13:59 +0100
  • a7a1f32b10 Added example on how to get original predicted doctags in minimal_smol_docling Maksym Lysak 2025-02-24 14:39:18 +0100
  • 1dbedcbb4e removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines) Maksym Lysak 2025-02-24 14:17:06 +0100
  • 0c60ef199a Moved keep_backend = True to vlm pipeline Maksym Lysak 2025-02-13 17:53:03 +0100
  • 853544ba11 Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things Maksym Lysak 2025-02-13 17:19:53 +0100
  • b0935daec4 Removed special html code wrapping when exporting to docling document, cleaned up comments Maksym Lysak 2025-02-13 10:29:37 +0100
  • b12f5ba80f removed minimal_smol_docling example from CI checks Maksym Lysak 2025-02-13 09:42:45 +0100
  • 66532eadb6 More elegant solution in removing the input prompt Maksym Lysak 2025-02-12 18:48:48 +0100
  • e486eb1720 Cleaned up unnecessary logging Maksym Lysak 2025-02-12 17:56:37 +0100
  • 55fa4eb4e3 Fix repo id Christoph Auer 2025-02-12 17:09:56 +0100
  • 6f9f4f4aee Update minimal smoldocling example Christoph Auer 2025-02-12 17:07:00 +0100
  • b1df461ca8 Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements Maksym Lysak 2025-02-11 16:42:23 +0100
  • d7abe1b1cd Updated example of Smol Docling usage Maksym Lysak 2025-02-11 13:53:19 +0100
  • 479ee239aa New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging Maksym Lysak 2025-02-11 13:34:14 +0100
  • 7c4ab5c716 Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option Maksym Lysak 2025-01-21 18:00:05 +0100
  • f2751e11f9 Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models. Maksym Lysak 2025-01-21 17:37:11 +0100
  • 88b9ac6706 Fixing doctags starting tag, that broke elements on first line during assembly Maksym Lysak 2025-01-21 11:14:55 +0100
  • 0fe12d819a Updated vlm pipeline assembly and smol docling model code to support updated doctags Maksym Lysak 2025-01-17 17:54:55 +0100
  • f6d123a01c Flipped keep_backend to True for vlm_pipeline assembly to work Maksym Lysak 2025-01-16 16:51:27 +0100
  • 9901729d8c Exposed "force_backend_text" as pipeline parameter Maksym Lysak 2025-01-16 14:23:59 +0100
  • c687757c7c Merge pull request #7 from 0xCarbon/sync/upstream_24022025 jpcanesin 2025-02-24 09:00:50 -0300
  • 9223023d13 Merge branch 'main' of github.com:DS4SD/docling into sync/upstream_24022025 João 2025-02-24 08:58:40 -0300
  • 0dc3ac43b1 Added capability for vlm_pipeline to grab text from preconfigured backend Maksym Lysak 2025-01-16 10:44:49 +0100
  • e0929781f4 Added tokens/sec measurement, improved example Maksym Lysak 2025-01-15 10:22:48 +0100
  • 437053572d Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum Maksym Lysak 2025-01-14 16:07:37 +0100
  • 2a43c199d5 Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models Maksym Lysak 2025-01-14 14:04:47 +0100
  • 61bb9dbba2 Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances Maksym Lysak 2025-01-13 15:21:19 +0100
  • 01c46e24b1 Fix for table span compute in vlm_pipeline Maksym Lysak 2025-01-10 16:30:12 +0100
  • ef079e4e78 Enabled figure support in vlm_pipeline Maksym Lysak 2025-01-10 13:56:46 +0100
  • 1b968e4984 Fixes to preserve page image and demo export to html Maksym Lysak 2025-01-10 10:50:35 +0100
  • 3c4c647615 WIP, first working code for inference of SmolDocling, and vlm pipeline assembly code, example included. Maksym Lysak 2025-01-09 18:41:00 +0100
  • 03c8d45790 wip smolDocling inference and vlm pipeline Maksym Lysak 2025-01-09 14:43:04 +0100
  • 3844f2a5cb fix enable option Michele Dolfi 2025-02-24 12:53:46 +0100
  • 1b0ead6907 fix(html): Parse text in div elements as TextItem (#1041) Cesar Berrospi Ramis 2025-02-24 12:38:29 +0100
  • dc3a388aa2 Skeleton for SmolDocling model and VLM Pipeline Christoph Auer 2025-01-08 10:16:54 +0100
  • 3b9b675a4f add picture description factory Michele Dolfi 2025-02-24 11:13:14 +0100
  • 8235a246c8 Merge remote-tracking branch 'origin/main' into feat-factory-plugins Michele Dolfi 2025-02-24 08:26:05 +0100
  • 1d17e7397a test: avoid testing exact JSON in CSV backend (#1038) Suehtam 2025-02-24 07:10:40 +0000
  • ca5d44ffe4 feat(html): Parse text in div elements as TextItem Cesar Berrospi Ramis 2025-02-23 19:53:31 +0100
  • c075d8d765 fix: Improve markdown list parser Tobias Strebitzer 2025-02-23 13:35:37 +0800
  • 1ea502ebd8 feat: replace verify_export with verify_document in CSV conversion tests Matheus Abdias 2025-02-23 00:50:17 +0000
  • 53451d18d6 feat: updated verify_export Moved verify_export to verify_utils Reuse verify_export in tests Matheus Abdias 2025-02-23 00:39:42 +0000
  • d8a81c3168 chore: bump version to 2.24.0 [skip ci] v2.24.0 github-actions[bot] 2025-02-20 18:31:20 +0000
  • c93e36988f feat: Implement new reading-order model (#916) Christoph Auer 2025-02-20 17:51:17 +0100
  • c031a7ae47 chore: bump version to 2.23.1 [skip ci] v2.23.1 github-actions[bot] 2025-02-20 16:26:41 +0000
  • 3cdf1e5f8c chore: Delete empty file docling/models/ds_glm_model.py Nikos Livathinos 2025-02-20 17:17:30 +0100
  • 1ac010354f test: avoid testing exact JSON (#1027) Cesar Berrospi Ramis 2025-02-20 16:20:07 +0100
  • 6796f0a132 fix: Runtime error when Pandas Series is not always of string type (#1024) fanszoro 2025-02-20 22:41:41 +0800
  • 50979e4304 Fix content_layer assignment Christoph Auer 2025-02-20 14:33:56 +0100
  • c686bc9908 Update tests/test_backend_patent_uspto.py Cesar Berrospi Ramis 2025-02-20 14:23:56 +0100