Commit Graph

  • 5e351e9d86 added the reading-order model dev/add-reading-order-model Peter Staar 2025-01-27 05:52:11 +0100
  • c2ae1cc4ca docs: description of supported formats and backends (#788) Cesar Berrospi Ramis 2025-01-26 08:10:33 +0100
  • 3be2fb581f feat: Introduce automatic language detection in TesseractOcrCliModel (#800) Nikos Livathinos 2025-01-26 08:07:56 +0100
  • dafb6af849 part 3 heading Farzad Sunavala 2025-01-25 10:58:52 -0600
  • b9ae624f78 minor renames Farzad Sunavala 2025-01-25 10:55:48 -0600
  • 21cc7c4451 docs: added markdown headings to enable TOC in github pages Farzad Sunavala 2025-01-25 10:52:55 -0600
  • 9e4ca90db1 chore: bump version to 2.16.0 [skip ci] v2.16.0 github-actions[bot] 2025-01-24 18:21:14 +0000
  • a458e298ca fix: added extraction of byte-images in excel (#804) Peter W. J. Staar 2025-01-24 18:48:02 +0100
  • 476c025207 Merge remote-tracking branch 'origin/main' into bugfix/ignore_mypy_checking_in_find_images_in_sheet Michele Dolfi 2025-01-24 18:09:11 +0100
  • 306e83e0fe fix: Refactor the TesseractOcrModel and TesseractOcrCliModel to validate if the auto-detected language is installed in the system and if not fall back to a default option without language. Nikos Livathinos 2025-01-24 17:08:01 +0000
  • 16a218d871 feat: New document picture classifier (#805) Matteo 2025-01-24 18:05:51 +0100
  • 88a0e66adc feat: add Docling JSON ingestion (#783) Panos Vagenas 2025-01-24 18:05:23 +0100
  • 4df08bec32 docs: add notebook example with XML backends Cesar Berrospi Ramis 2025-01-22 18:06:07 +0100
  • 550dbe1854 docs: add documentation on supported formats and backends Cesar Berrospi Ramis 2025-01-08 16:13:47 +0100
  • a3b8414622 test: remove unnecessary imports Cesar Berrospi Ramis 2025-01-08 16:22:36 +0100
  • 82168f946c chore: remove type-ignore marks for attaching text to non GroupItems Cesar Berrospi Ramis 2024-12-17 16:05:17 +0100
  • 745615ca12 merged with main Peter Staar 2025-01-24 17:41:44 +0100
  • 8a4d59f744 pinned pillow in pyproject Peter Staar 2025-01-24 17:37:51 +0100
  • ae23504712 tests Matteo Omenetti 2025-01-24 11:37:37 -0500
  • 72e232b112 gt for e2e tests Matteo Omenetti 2025-01-24 11:37:00 -0500
  • 8ecb810bb5 figure classifier Matteo Omenetti 2025-01-24 11:35:44 -0500
  • e9768ae6a5 chore: expose draw_clusters function (#803) Yusik Kim 2025-01-24 17:35:29 +0100
  • 00b8e1fa9b Update docling/backend/json/docling_json_backend.py Panos Vagenas 2025-01-24 17:26:51 +0100
  • 9cdb176a8e update conversion as per review comments, add tests, revert Docling JSON disambiguation, document intricacies Panos Vagenas 2025-01-24 16:58:24 +0100
  • cffbc457af feat: expose draw_clusters function Yusik Kim 2025-01-23 16:52:59 +0100
  • 3213b247ad feat: Code and equation model for PDF and code blocks in markdown (#752) Matteo 2025-01-24 16:54:22 +0100
  • d1b24c27e5 reformatted the code Peter Staar 2025-01-24 16:32:51 +0100
  • c58f75d0f7 docs: fix minor typos (#801) Farzad Sunavala 2025-01-24 09:27:05 -0600
  • 60c2a860c4 fixed some issues Peter Staar 2025-01-24 16:25:10 +0100
  • 9afd8369eb move expansion_factor to base class Michele Dolfi 2025-01-24 16:23:44 +0100
  • a476d15c2c move imports Michele Dolfi 2025-01-24 16:01:36 +0100
  • 6c3d31d68b fix artifacts_path type Michele Dolfi 2025-01-24 16:00:22 +0100
  • 2c35b57e34 Merge branch 'mao1/code_equation_model' of https://github.com/DS4SD/docling into mao1/code_equation_model Matteo Omenetti 2025-01-24 08:56:32 -0500
  • 2af4bc160e fixed doc comment of __call__ function of code_formula_model Matteo Omenetti 2025-01-24 08:54:48 -0500
  • adcd053934 Update docling/pipeline/standard_pdf_pipeline.py Matteo 2025-01-24 14:24:03 +0100
  • d737bb9416 gt for new pdf Matteo Omenetti 2025-01-24 08:19:33 -0500
  • c5ddcc3535 Update rag_azuresearch.ipynb Farzad Sunavala 2025-01-24 07:09:53 -0600
  • 9020a934be docs: add Azure RAG example (#675) Farzad Sunavala 2025-01-24 06:56:26 -0600
  • 04137f267a fix env var sourcing, remove unused imports Panos Vagenas 2025-01-24 13:26:55 +0100
  • 044c9aa1bc docs: add Azure RAG example Panos Vagenas 2025-01-24 13:00:33 +0100
  • cdb57e0ba3 docs: Add example how to use "auto" language with tesseract OCR engines Nikos Livathinos 2025-01-24 12:38:03 +0100
  • 4c2552efc5 feat: Introduce automatic language detection in tesseract_ocr_cli model. Extend unit tests. Nikos Livathinos 2025-01-24 11:20:21 +0100
  • 784eafbed5 remove print Rafael Teixeira de Lima 2025-01-23 17:34:12 +0100
  • 0349aebb52 Remove py2 flag Rafael Teixeira de Lima 2025-01-23 17:30:47 +0100
  • 9904bd25a1 v0 of latex export of equations from docx files Rafael Teixeira de Lima 2025-01-23 17:25:43 +0100
  • d2f9f050ce Expose rec_keys_path in RapidOcrOptions to support custom dictionaries Yorick Terweijden 2025-01-22 15:38:28 +0200
  • 849aa759c7 removed print statements Matteo Omenetti 2025-01-23 07:38:39 -0500
  • 8543c22687 feat: add "auto" language for TesseractOcr (#759) Pavel Denisov 2025-01-23 12:40:50 +0100
  • 1f3f4be3f0 Fix script models prefix for Linux Pavel Denisov 2025-01-23 11:11:34 +0100
  • ce40eb7b84 Finalize script readers Pavel Denisov 2025-01-23 10:59:11 +0100
  • a59c03b27f removed unused import Matteo Omenetti 2025-01-23 04:34:44 -0500
  • 6206687e8b added if statement for backend Matteo Omenetti 2025-01-23 04:34:00 -0500
  • eed50d46e3 Modify "auto" language in TesseractOcr to initialize the script readers lazily Pavel Denisov 2025-01-23 10:03:53 +0100
  • 570a1a560a Add tesseract-ocr-script-latn installation for the "auto" language Pavel Denisov 2025-01-22 21:46:56 +0100
  • d5b2c07295 use new add_code in backends and update typing in MD backend Michele Dolfi 2025-01-21 18:19:49 +0100
  • e707747863 pin docling-core Michele Dolfi 2025-01-21 17:43:33 +0100
  • 9fc2e5371b update docling-core pinning Michele Dolfi 2025-01-21 16:38:43 +0100
  • ecc715d506 Merge remote-tracking branch 'origin/main' into mao1/code_equation_model Michele Dolfi 2025-01-21 16:07:36 +0100
  • 958dfbed12 pin latest docling-core Michele Dolfi 2025-01-21 16:04:00 +0100
  • bfccc6ee66 chore: update lockfile Christoph Auer 2025-01-15 12:30:25 +0100
  • 7dbeba035f removed unused files Matteo Omenetti 2025-01-15 04:10:54 -0500
  • a4914a914d Rebased branch on latest main. changes for CodeItem Matteo Omenetti 2025-01-14 09:19:10 -0500
  • 6048f8ac14 propagated changes for new CodeItem class Matteo Omenetti 2025-01-14 08:20:43 -0500
  • e972c9c60c feat: add Docling JSON ingestion Panos Vagenas 2025-01-21 14:59:13 +0100
  • 55c7e8f137 fix: Add missing GT files Christoph Auer 2025-01-20 12:40:17 +0100
  • a577da1ffc fix: Test case re-generation only on CPU Christoph Auer 2025-01-20 12:26:29 +0100
  • 1235829c38 fix: Test case re-generation Christoph Auer 2025-01-20 12:12:08 +0100
  • 224d633b7e feat: Introduce plugin support for document conversion Ayoub El Bouchtili 2025-01-18 15:49:51 +0100
  • 2cbc5ce521 refactor: allow the usage of backends in the enrich models and generalize the interface (#742) Michele Dolfi 2025-01-15 09:52:38 +0100
  • 15989718b7 Merge branch 'main' of github.com:DS4SD/docling into cau/picture-content-example Christoph Auer 2025-01-20 11:47:31 +0100
  • 12e3419149 test: image input as stream Michele Dolfi 2025-01-20 11:16:35 +0100
  • c49b3526fb docs: fix links between docs pages (#697) Michele Dolfi 2025-01-20 09:52:59 +0100
  • e4c7210133 ci: added action to generate llms.txt (#701) Selvam Palanimalai 2025-01-20 03:52:27 -0500
  • 670a08bded fix: Update docling-parse-v2 backend version with new parsing fixes (#769) Christoph Auer 2025-01-20 09:00:57 +0100
  • ead118c98b Final docling-parse pinning Christoph Auer 2025-01-19 16:59:10 +0100
  • 768608351d docs: fix correct Accelerator pipeline options in docs/examples/custom_convert.py (#733) Iacopo Ghinassi 2025-01-19 15:55:26 +0000
  • ebf68a5792 Update lockfile again Christoph Auer 2025-01-17 17:11:06 +0100
  • 75377cdb43 apply formatting Michele Dolfi 2025-01-17 16:21:29 +0100
  • 3b5b91f119 Merge branch 'main' of github.com:DS4SD/docling into cau/docling-parse-fonts-update Christoph Auer 2025-01-17 14:40:46 +0100
  • 858f93a6d5 Add "auto" language for TesseractOcr Pavel Denisov 2025-01-16 10:31:36 +0100
  • 57fc28d3d8 refactor: allow the usage of backends in the enrich models and generalize the interface (#742) Michele Dolfi 2025-01-15 09:52:38 +0100
  • f7e1cbf629 docs: Example to translate documents (#739) Peter W. J. Staar 2025-01-15 06:51:15 +0100
  • 771630be29 fix PR hooks Michele Dolfi 2025-01-14 17:56:32 +0100
  • e4f8ff3980 renaming Michele Dolfi 2025-01-14 17:54:55 +0100
  • 1c970b7613 updated the mkdocs Peter Staar 2025-01-14 13:00:50 +0100
  • 12b6417f51 move logic in BaseTextImageEnrichmentModel Michele Dolfi 2025-01-14 12:53:52 +0100
  • 3611335d22 allow the usage of backends in the enrich models and generalize the interface Michele Dolfi 2025-01-14 09:46:28 +0100
  • d79e4cf40d fix get image with cropbox Michele Dolfi 2025-01-14 08:57:49 +0100
  • 5127b31083 added example to translate documents Peter Staar 2025-01-14 06:05:49 +0100
  • 5c681ba352 feat: Pass predicted page-headers and page-footers through to DoclingDocument furniture Christoph Auer 2025-01-13 19:56:01 +0100
  • 50f495534a chore: Update lockfile with docling-parse git branch Christoph Auer 2025-01-13 19:12:47 +0100
  • 188789b221 Update custom_convert.py Iacopo Ghinassi 2025-01-13 17:05:27 +0000
  • 1976584be1 chore: bump version to 2.15.1 [skip ci] v2.15.1 github-actions[bot] 2025-01-10 10:29:32 +0000
  • 5a060f237d fix: Improve OCR results, stricten criteria before dropping bitmap areas (#719) Christoph Auer 2025-01-10 10:38:49 +0100
  • 47c21a5edc disabled auto file mime type detection, rely on extension João 2025-01-09 18:24:26 -0300
  • 9d9ed0716f add more file types when infering the mime type from extesion João 2025-01-09 17:49:20 -0300
  • 7a197e71ed fix: Properly care for all bitmap elements in OCR Christoph Auer 2025-01-09 19:08:59 +0100
  • 9a6b5c8c8d docs: add pointers to LangChain-side docs (#718) Panos Vagenas 2025-01-09 17:36:46 +0100
  • 914cc981c7 docs: add pointer to LangChain-side docs Panos Vagenas 2025-01-09 17:06:08 +0100
  • 82441ed6d2 sync with docling main João 2025-01-09 12:25:16 -0300