Commit Graph

  • 5a44236ac2 chore: bump version to 2.5.2 [skip ci] v2.5.2 github-actions[bot] 2024-11-13 08:19:09 +0000
  • c9341bf22e fix: skip glm model downloads (#322) Michele Dolfi 2024-11-13 08:45:28 +0100
  • 9558134a1a fix: skip glm model downloads Michele Dolfi 2024-11-13 07:46:52 +0100
  • 2c0c439a44 chore: bump version to 2.5.1 [skip ci] v2.5.1 github-actions[bot] 2024-11-12 14:56:34 +0000
  • 5de8d02ca1 chore: fix Qdrant notebook Colab link Panos Vagenas 2024-11-12 15:27:01 +0100
  • fb8ba861e2 fix: Handling of single-cell tables in DOCX backend (#314) Maxim Lysak 2024-11-12 15:20:55 +0100
  • 7f5d35ea3c docs: Hybrid RAG with Qdrant (#312) Anush 2024-11-12 19:48:14 +0530
  • 31457ba623 Added example of trickly 1 cell table docx Maksym Lysak 2024-11-12 14:49:31 +0100
  • 6f30048f3a fix token counting bug, minor revamping Panos Vagenas 2024-11-12 14:39:22 +0100
  • b46ae1af56 proceed processing the content of single cell table as if its just part of the body Maksym Lysak 2024-11-12 14:26:10 +0100
  • 6b47080083 Merge branch 'main' into qdrant Anush 2024-11-12 18:24:31 +0530
  • 64f5b02628 chore: review updates Anush008 2024-11-12 18:24:02 +0530
  • 93fc1be61a docs: add Data Prep Kit integration (#316) Panos Vagenas 2024-11-12 12:21:48 +0100
  • 777237ebc9 chore: bump version to 2.5.0 [skip ci] v2.5.0 github-actions[bot] 2024-11-12 10:19:55 +0000
  • 7575392102 chore: pre-commit again Anush008 2024-11-12 15:43:28 +0530
  • 32055fe9d6 docs: add Data Prep Kit integration Panos Vagenas 2024-11-12 11:09:40 +0100
  • 5d4a10b121 fix: Configure env prefix for docling settings (#315) Christoph Auer 2024-11-12 10:57:16 +0100
  • e5e4a3cdea Configure env prefix for docling settings Christoph Auer 2024-11-12 10:32:16 +0100
  • f7b58dfa51 cleaned Maksym Lysak 2024-11-12 09:51:48 +0100
  • d8b4c07173 returned try-catch on tables handling Maksym Lysak 2024-11-12 09:50:15 +0100
  • c6b3763ecb feat(OCR): Introduce the OcrOptions.force_full_page_ocr parameter that forces a full page OCR scanning (#290) Nikos Livathinos 2024-11-12 09:46:14 +0100
  • 9569214afb Handling of single-cell tables in DOCX backend Maksym Lysak 2024-11-12 09:43:37 +0100
  • 0d35e98785 docs: Hybrid RAG with Qdrant Anush008 2024-11-12 11:09:56 +0530
  • e1cb823cab minor typing fix Panos Vagenas 2024-11-11 19:00:23 +0100
  • 088ce5f696 Merge branch 'main' into force_ocr Nikos Livathinos 2024-11-11 17:47:02 +0100
  • 7a0f16079d fix: Move common OCR code in the BaseOcrModel class Nikos Livathinos 2024-11-11 17:19:35 +0100
  • 23d1a080ce consolidate advanced chunker notebook Panos Vagenas 2024-11-11 17:06:53 +0100
  • 81c8243a8b fix: Added handling of grouped elements in pptx backend (#307) Maxim Lysak 2024-11-11 16:38:21 +0100
  • 4567b09d9f updated log.warn to warning Maksym Lysak 2024-11-11 15:22:48 +0100
  • 53bf2d1790 Added handling of code blocks in html with <pre> tag (#302) Maxim Lysak 2024-11-11 15:00:11 +0100
  • 4051c0bcfc Added handling of grouped elements in pptx backend Maksym Lysak 2024-11-11 14:49:20 +0100
  • 1239ade275 docs: add navigation indices (#305) Panos Vagenas 2024-11-11 14:49:06 +0100
  • 7234dc3a42 feat: Introduce the force-ocr cmd parameter in docling cli. Add the full_page_ocr.py example in mkdocs Nikos Livathinos 2024-11-11 14:10:56 +0100
  • dfe52e6c34 docs: add navigation indices Panos Vagenas 2024-11-11 14:04:51 +0100
  • f4885c6324 Added handling of code blocks in html with <pre> tag Maksym Lysak 2024-11-11 13:24:20 +0100
  • 1963e7145b chore(examples): Add example how to force OCR Nikos Livathinos 2024-11-10 16:09:53 +0100
  • dea1d91ebe feat(OCR): Introduce the OcrOptions.force_full_page_ocr parameter that forces a full page OCR scanning and uses the recognized OCR cells instead of any existing PDF cells. Update unit tests. Nikos Livathinos 2024-11-10 14:26:44 +0100
  • 97f214efdd fix: allow mps usage for easyocr (#286) Michele Dolfi 2024-11-10 14:26:17 +0100
  • 1a569275b5 comment out example Michele Dolfi 2024-11-09 09:15:35 +0100
  • c6ae14861f add example for cpu-only Michele Dolfi 2024-11-09 09:14:13 +0100
  • 5f07114fd9 fix: allow mps usage for easyocr Michele Dolfi 2024-11-09 09:06:29 +0100
  • be8aa17291 chore: bump version to 2.4.2 [skip ci] v2.4.2 github-actions[bot] 2024-11-08 16:31:47 +0000
  • 0eb065e9b6 fix(EasyOcrModel): Support the use_gpu pipeline parameter in EasyOcrModel. Initialize easyocr (#282) Nikos Livathinos 2024-11-08 16:48:41 +0100
  • 02394455b9 fix(EasyOcrModel): Support the use_gpu pipeline parameter in EasyOcrModel. Initialize easyocr without GPU if MPS is available. Nikos Livathinos 2024-11-08 14:31:48 +0100
  • 118f162e64 chore: bump version to 2.4.1 [skip ci] v2.4.1 github-actions[bot] 2024-11-08 12:37:36 +0000
  • 704d792a79 fix(tesserocr): Raise Exception if tesserocr has not loaded any languages (#279) Nikos Livathinos 2024-11-08 13:03:09 +0100
  • 9e54a74410 another fix to the tests Peter Staar 2024-11-08 12:48:53 +0100
  • 89c9ca3823 fix(TesseractOcrModel): Use different error messages when tesserocr is not properly installed and when the TESSDATA_PREFIX envvar is not properly configured. Nikos Livathinos 2024-11-08 11:28:20 +0100
  • 944988cb30 Fix linting issues, update CLI docs, and add error for ocrmac use on non-Mac systems Suhwan Seo 2024-11-08 19:05:33 +0900
  • e5593641e9 fix(tesserocr): Raise Exception if tesserocr has not loaded any languages Nikos Livathinos 2024-11-08 10:09:08 +0100
  • 719cfe93c3 updated the poetry lock Suhwan Seo 2024-11-08 15:07:19 +0900
  • 4aaf128384 feat: add support for ocrmac OCR engine on macOS NuRi 2024-11-08 09:08:55 +0900
  • 311640fb9d reformatted the code Peter Staar 2024-11-08 05:41:09 +0100
  • f112064eaa feat: add support for ocrmac OCR engine on macOS NuRi 2024-11-08 09:08:55 +0900
  • 5c82ff9890 fixed the tests Peter Staar 2024-11-07 05:15:13 +0100
  • e1cba8a825 draft for picture description models Michele Dolfi 2024-11-06 11:38:12 +0100
  • b154d4f2d7 updated ground-truth Peter Staar 2024-11-06 10:55:18 +0100
  • 0a5817a36e updated the html tests (2) Peter Staar 2024-11-06 05:46:09 +0100
  • c7b9792d6b updated the html tests Peter Staar 2024-11-06 05:44:50 +0100
  • 6c22cba0a7 chore: add issue templates (#251) Panos Vagenas 2024-11-05 23:18:20 +0100
  • 5b0dd76727 chore: add issue templates Panos Vagenas 2024-11-05 22:15:48 +0100
  • c3098e3c12 chore: fix typo (#241) Ikko Eltociear Ashimine 2024-11-06 00:20:04 +0900
  • a84ec276b0 docs: update badges & credits (#248) Panos Vagenas 2024-11-05 13:57:06 +0100
  • 45685a715e docs: update badges & credits Panos Vagenas 2024-11-05 13:20:06 +0100
  • 90836db90a fix: Dockerfile example copy command (#234) Anthony R 2024-11-05 12:48:27 +0100
  • 5ce02c5c59 docs: add coming-soon section (#235) Panos Vagenas 2024-11-05 08:53:02 +0100
  • d5e65aedac docs: add artifacts-path param to CLI (#233) Panos Vagenas 2024-11-05 08:51:21 +0100
  • 98efb8957e reformatted the code to pass the tests Peter Staar 2024-11-05 07:29:45 +0100
  • ddd1474c8d reformatted the code Peter Staar 2024-11-05 07:25:21 +0100
  • 3257034631 replace new lines and double spaces in list-items with single spaces Peter Staar 2024-11-05 07:24:31 +0100
  • 91eca31e38 chore: update docling_parse_v2_backend.py Ikko Eltociear Ashimine 2024-11-05 15:04:05 +0900
  • ad56f8ad8c chore: update docling_parse_backend.py Ikko Eltociear Ashimine 2024-11-05 15:02:44 +0900
  • dfa0013078 chore: update pypdfium2_backend.py Ikko Eltociear Ashimine 2024-11-05 15:01:14 +0900
  • f276c0cc90 updated the html backend to add svg, remove empty list-items and use data-content fields Peter Staar 2024-11-05 06:37:43 +0100
  • 4cab7d63db Removed the use of doc.name Bill Murdock 2024-11-04 17:00:02 -0500
  • 805d50d643 docs: add coming-soon section Panos Vagenas 2024-11-04 19:56:50 +0100
  • f59c60310d fix: Dockerfile example copy Anthony R 2024-11-04 19:49:24 +0100
  • 2b16e0cd90 docs: add artifacts-path param to CLI Panos Vagenas 2024-11-04 19:32:14 +0100
  • e30a9c25a2 chore: bump version to 2.4.0 [skip ci] v2.4.0 github-actions[bot] 2024-11-04 15:11:09 +0000
  • 862d78d271 chore: update pyproject.toml metadata (#229) Panos Vagenas 2024-11-04 15:48:00 +0100
  • a058a2a43d chore: update pyproject.toml metadata Panos Vagenas 2024-11-04 15:31:28 +0100
  • eeee3b4371 docs: add explicit artifacts path example (#224) Panos Vagenas 2024-11-04 14:27:56 +0100
  • 5f5fea90a9 docs: update custom convert and dockerfile (#226) Michele Dolfi 2024-11-04 14:27:40 +0100
  • 41acaa9e2e docs: correct spelling of 'individual' (#219) Vicky Sekhon 2024-11-04 08:27:02 -0500
  • 40ad987303 feat: pdf backend, table mode as options and artifacts path (#203) Michele Dolfi 2024-11-04 14:26:05 +0100
  • af323c04ef fit: Specify encoding when writing output file (#214) Johnny Salazar 2024-11-04 20:24:13 +0700
  • 8fb445f46c chore: make tests lighter (#228) Panos Vagenas 2024-11-04 14:02:28 +0100
  • 0f6c98feea exclude more examples from CI Panos Vagenas 2024-11-04 12:38:19 +0100
  • e9222b656e chore: make tests lighter Panos Vagenas 2024-11-04 10:46:52 +0100
  • ada26aa9b8 docs: update example Dockerfile Michele Dolfi 2024-11-04 09:41:00 +0100
  • e1865fb519 docs: remove old code from custom_convert.py Michele Dolfi 2024-11-04 09:39:28 +0100
  • d662a787d1 touch to trigger needed checks Panos Vagenas 2024-11-04 09:34:50 +0100
  • e60bdb570c expose artifacts-path as argument Michele Dolfi 2024-11-04 09:12:54 +0100
  • 2d9c1a721a minor docs fix Panos Vagenas 2024-11-04 09:09:49 +0100
  • aa187552e5 docs: add explicit artifacts path example Panos Vagenas 2024-11-04 09:08:16 +0100
  • 36cb20bf9a fix(docs): correct spelling of 'individual' Vicky Sekhon 2024-11-03 12:50:12 -0500
  • fba2d14645 Merge a500b391bf into 244ca69cfd Vicky Sekhon 2024-11-03 12:45:08 -0500
  • a500b391bf fix(docs): correct spelling of 'individual' Vicky Sekhon 2024-11-03 12:44:55 -0500
  • bf2eb4a14d Merge a81ef2a374 into 244ca69cfd Vicky Sekhon 2024-11-03 12:39:35 -0500
  • a81ef2a374 Merge branch 'fix/incorrect-spelling-of-individual-in-docs' of https://github.com/VickySekhon/docling into fix/incorrect-spelling-of-individual-in-docs Vicky Sekhon 2024-11-03 12:35:04 -0500