Commit Graph

  • 5f5fea90a9 docs: update custom convert and dockerfile (#226) Michele Dolfi 2024-11-04 14:27:40 +01:00
  • 41acaa9e2e docs: correct spelling of 'individual' (#219) Vicky Sekhon 2024-11-04 08:27:02 -05:00
  • 40ad987303 feat: pdf backend, table mode as options and artifacts path (#203) Michele Dolfi 2024-11-04 14:26:05 +01:00
  • af323c04ef fix: Specify encoding when writing output file (#214) Johnny Salazar 2024-11-04 20:24:13 +07:00
  • 8fb445f46c chore: make tests lighter (#228) Panos Vagenas 2024-11-04 14:02:28 +01:00
  • 5fc4d5bd3d work-in-progress: dealing with in attributes of html elements Peter Staar 2024-11-02 09:27:07 +01:00
  • 244ca69cfd docs: update LlamaIndex docs (#196) Panos Vagenas 2024-11-01 20:55:28 +01:00
  • 473ad9a032 add the skip_furniture parameter Peter Staar 2024-11-01 11:32:56 +01:00
  • ebe0b203c8 added the detection of h1 and the skip_furniture parameter Peter Staar 2024-10-31 16:06:41 +01:00
  • c52e68c52b feat: add ability to detect h1 and filter from there-on Peter Staar 2024-10-31 15:50:26 +01:00
  • 9d8865856d chore: bump version to 2.3.1 [skip ci] v2.3.1 github-actions[bot] 2024-10-30 18:23:53 +00:00
  • eb679ccbb4 fix: simplify torch dependencies and update pinned docling deps (#190) Michele Dolfi 2024-10-30 18:44:08 +01:00
  • 904d24d600 fix: allow to explicitly initialize the pipeline (#189) Michele Dolfi 2024-10-30 17:54:53 +01:00
  • 43349865d0 chore: bump version to 2.3.0 [skip ci] v2.3.0 github-actions[bot] 2024-10-30 14:47:37 +00:00
  • 2a2c65bf4f feat: Add pipeline timings and toggle visualization, establish debug settings (#183) Christoph Auer 2024-10-30 15:04:19 +01:00
  • 94a5290789 chore: update the with input formats and DoclingDocument (#188) Peter W. J. Staar 2024-10-30 15:02:28 +01:00
  • f542460af3 fix: fix duplicate title and heading + add e2e tests for html and docx (#186) Peter W. J. Staar 2024-10-30 13:14:56 +01:00
  • 70865b4c7d fix: make CLI JSON export more human-readable add-json-export-indentation Panos Vagenas 2024-10-29 08:54:41 +01:00
  • dda2645d4c chore: bump version to 2.2.1 [skip ci] v2.2.1 github-actions[bot] 2024-10-28 17:18:41 +00:00
  • b9f5c74a7d fix: fix header levels for DOCX & HTML (#184) Panos Vagenas 2024-10-28 17:02:52 +01:00
  • 94d0729c50 fix: handling of long sequence of unescaped underscore chars in markdown (#173) Maxim Lysak 2024-10-28 16:34:48 +01:00
  • 2cece27208 docs: update LlamaIndex docs for Docling v2 (#182) Panos Vagenas 2024-10-28 14:28:26 +01:00
  • 189d3c2d44 docs: fix batch convert (#177) Michele Dolfi 2024-10-26 05:50:34 +02:00
  • 7d19418b77 fix: HTML backend, fixes for Lists and nested texts (#180) Maxim Lysak 2024-10-25 20:14:04 +02:00
  • 88c1673057 fix: MD Backend, fixes to properly handle trailing inline text and emphasis in headers (#178) Maxim Lysak 2024-10-25 18:02:20 +02:00
  • 77a89c3334 chore: make auto-release on request (#179) Michele Dolfi 2024-10-25 10:47:25 +02:00
  • 8d356aa247 docs: add export with embedded images (#175) Michele Dolfi 2024-10-24 20:19:41 +02:00
  • 8208c93e3a chore: bump version to 2.2.0 [skip ci] v2.2.0 github-actions[bot] 2024-10-23 16:04:55 +00:00
  • 4116819b51 feat: Update to docling-parse v2 without history (#170) Peter W. J. Staar 2024-10-23 17:20:11 +02:00
  • 3023f18ba0 feat: Support AsciiDoc and Markdown input format (#168) Christoph Auer 2024-10-23 16:14:26 +02:00
  • 3496b4838f fix: set valid=false for invalid backends (#171) Michele Dolfi 2024-10-23 15:52:30 +02:00
  • b8d2286dd1 chore: various minor docs fixes (#169) Panos Vagenas 2024-10-22 15:29:36 +02:00
  • fa5f94ec10 Fix Typo errors in CONTRIBUTING.md file (#164) Mohamed Ali 2024-10-22 10:31:48 +05:30
  • d5460e2d1f chore: bump version to 2.1.0 [skip ci] v2.1.0 github-actions[bot] 2024-10-18 13:21:15 +00:00
  • b346faf622 feat: add coverage_threshold to skip OCR for small images (#161) Michele Dolfi 2024-10-18 13:58:23 +02:00
  • f799e777c1 docs: typo fix (#155) ABHISHEK FADAKE 2024-10-18 17:26:48 +05:30
  • 63bef59d9e fix: fix legacy doc ref (#162) Panos Vagenas 2024-10-18 13:11:20 +02:00
  • bb7a58d45d ci: run ci also on forks (#160) Michele Dolfi 2024-10-18 12:32:27 +02:00
  • a00c937e19 Ensure all models work only on valid pages (#158) Christoph Auer 2024-10-18 08:54:06 +02:00
  • 034a411057 docs: add graphical band in readme (#154) Maxim Lysak 2024-10-17 18:15:40 +02:00
  • 61c092f445 docs: add use docling (#150) Michele Dolfi 2024-10-17 18:14:48 +02:00
  • 24f949ada2 chore: run apt-get update before install (#156) Michele Dolfi 2024-10-17 17:27:16 +02:00
  • a29c256041 chore: bump version to 2.0.0 [skip ci] v2.0.0 github-actions[bot] 2024-10-16 19:48:06 +00:00
  • 7d3be0edeb feat!: Docling v2 (#117) Christoph Auer 2024-10-16 21:02:03 +02:00
  • d504432c1e docs: introduce docs site (#141) Panos Vagenas 2024-10-14 14:13:13 +02:00
  • 2b1e72d327 refactor: fix type of tesseractocr options (#140) Michele Dolfi 2024-10-14 08:40:22 +02:00
  • 4672b24c1a chore: bump version to 1.20.0 [skip ci] v1.20.0 github-actions[bot] 2024-10-11 13:48:02 +00:00
  • 5e4944f15f feat: new experimental docling-parse v2 backend (#131) Christoph Auer 2024-10-11 15:12:49 +02:00
  • 2ec39636f0 chore: bump version to 1.19.1 [skip ci] v1.19.1 github-actions[bot] 2024-10-11 08:52:09 +00:00
  • dae2a3b667 fix: remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138) Nikos Livathinos 2024-10-11 10:21:19 +02:00
  • 5f1bd9e9c8 docs: simplify LlamaIndex example using Docling extension (#135) Panos Vagenas 2024-10-09 22:17:56 +02:00
  • 6924999f1f chore: explicitly manage pandas dependency (#134) Panos Vagenas 2024-10-09 14:50:39 +02:00
  • 0ffc1708d2 chore: bump version to 1.19.0 [skip ci] v1.19.0 github-actions[bot] 2024-10-08 17:42:29 +00:00
  • f96ea86a00 feat: add options for choosing OCR engines (#118) Michele Dolfi 2024-10-08 19:07:08 +02:00
  • d412c363d7 fixed unload pdf backend resources (#129) Fasal Shah 2024-10-08 14:16:43 +05:30
  • 86ead45aa1 align with isort extend-metadata-in-examples Panos Vagenas 2024-10-04 15:25:52 +02:00
  • 86fd560cfd minor notebook updates Panos Vagenas 2024-10-04 14:50:38 +02:00
  • 6e16a2464e add docling splitter to LC example, simplify & align QA output Panos Vagenas 2024-10-04 14:43:27 +02:00
  • f4ee76eaec chore: showcase extended metadata in LlamaIndex example Panos Vagenas 2024-09-27 19:31:43 +02:00
  • 9b82ae3324 chore: bump version to 1.18.0 [skip ci] v1.18.0 github-actions[bot] 2024-10-03 17:16:00 +00:00
  • 2422f706a1 feat: new torch-based docling models (#120) Maxim Lysak 2024-10-03 18:42:33 +02:00
  • 9ebbbc1245 chore: bump version to 1.17.0 [skip ci] v1.17.0 github-actions[bot] 2024-10-03 13:44:52 +00:00
  • dde0aff8bd update examples (#123) Rui Dias Gomes 2024-10-03 13:28:25 +01:00
  • d44c62d7ce feat: windows support (#122) Michele Dolfi 2024-10-03 14:23:47 +02:00
  • bfdc4e32cc chore: Add test data with scanned documents and their conversions using EasyOCR nli/tesseract_ocr_models Nikos Livathinos 2024-10-02 13:35:38 +02:00
  • c211808742 feat: tesseract and tesserocr models. WIP. Nikos Livathinos 2024-10-02 13:30:27 +02:00
  • 455d6ff70f chore: Add tesserocr in poetry Nikos Livathinos 2024-10-02 13:27:34 +02:00
  • bbfc0617f2 feat: add options for choosing OCR engine Michele Dolfi 2024-10-02 10:47:20 +02:00
  • cde671cf34 chore: bump version to 1.16.1 [skip ci] v1.16.1 github-actions[bot] 2024-09-27 14:36:40 +00:00
  • 34bd887a7f fix: allow usage of opencv 4.6.x (#110) Michele Dolfi 2024-09-27 15:51:43 +02:00
  • c05b692d69 docs: document chunking (#111) Panos Vagenas 2024-09-27 11:16:04 +02:00
  • 6760571fe1 chore: bump version to 1.16.0 [skip ci] v1.16.0 github-actions[bot] 2024-09-27 06:21:15 +00:00
  • d6df76f90b feat: Support tableformer model choice (#90) Christoph Auer 2024-09-26 21:37:08 +02:00
  • 39977b5631 chore: move examples extras to respective group (#103) Panos Vagenas 2024-09-25 15:47:48 +02:00
  • 3dfd02a7e9 chore: bump version to 1.15.0 [skip ci] v1.15.0 github-actions[bot] 2024-09-24 15:58:16 +00:00
  • 6a03c208ec feat: add figure in markdown (#98) Michele Dolfi 2024-09-24 17:28:23 +02:00
  • 001d214a13 chore: bump version to 1.14.0 [skip ci] v1.14.0 github-actions[bot] 2024-09-24 13:38:23 +00:00
  • d96b96c848 fix: fix OCR setting for pypdfium, minor refactor (#102) Panos Vagenas 2024-09-24 14:36:00 +02:00
  • f8f2303348 docs: document CLI, minor README revamp (#100) Panos Vagenas 2024-09-24 09:21:28 +02:00
  • f555815343 chore: add RAG notebook titles (#101) Panos Vagenas 2024-09-24 09:17:46 +02:00
  • 3c46e4266c feat: add URL support to CLI (#99) Panos Vagenas 2024-09-24 08:47:53 +02:00
  • c65a01c9b7 chore: bump version to 1.13.1 [skip ci] v1.13.1 github-actions[bot] 2024-09-23 19:04:01 +00:00
  • 4794ce460a fix: updated the render_as_doctags with the new arguments from docling-core (#93) Peter W. J. Staar 2024-09-23 20:12:18 +02:00
  • dce9934a0f Updated to new, clean vector logo, svg and rendered png are provided (#96) Maxim Lysak 2024-09-23 15:31:21 +02:00
  • 1f4b224ab6 chore: switch to gh apps user (#92) Michele Dolfi 2024-09-20 17:02:27 +02:00
  • 6dd1e91c4a chore: bump version to 1.13.0 [skip ci] v1.13.0 github-actions[bot] 2024-09-18 09:26:03 +00:00
  • 0da7519896 docs: updated Docling logo.png with transparent background (#88) Maxim Lysak 2024-09-18 10:39:11 +02:00
  • f19bd43798 feat: add table exports (#86) Michele Dolfi 2024-09-18 08:44:13 +02:00
  • 442443a102 fix: bumped the glm version and adjusted the tests (#83) Peter W. J. Staar 2024-09-18 07:43:49 +02:00
  • 8242bce4fa chore: bump version to 1.12.2 [skip ci] v1.12.2 github-actions[bot] 2024-09-17 16:01:34 +00:00
  • fa9699fa3c fix(tests): Adjust the test data to match the new version of LayoutPredictor (#82) Nikos Livathinos 2024-09-17 15:50:35 +02:00
  • 30a0ef69b4 chore: Add PR template (#81) Michele Dolfi 2024-09-16 18:36:26 +02:00
  • f1932fd8c5 chore: bump version to 1.12.1 [skip ci] v1.12.1 github-actions[bot] 2024-09-16 10:58:09 +00:00
  • 2870fdc857 fix: CLI compatibility with python 3.10 and 3.11 (#79) Michele Dolfi 2024-09-16 12:32:45 +02:00
  • 34b2772a2e chore: bump version to 1.12.0 [skip ci] v1.12.0 github-actions[bot] 2024-09-13 12:34:15 +00:00
  • 98990784df feat: add docling cli (#75) Peter W. J. Staar 2024-09-13 14:03:09 +02:00
  • 8aa476ccd3 test: improve typing definitions (part 1) (#72) Michele Dolfi 2024-09-12 15:56:29 +02:00
  • 4090f9700b add node parser, JSONPath resolution to LI example, refactor demo Panos Vagenas 2024-09-12 09:20:48 +02:00
  • d8ddb559fa docs: add conversion example Panos Vagenas 2024-09-12 06:39:10 +02:00
  • 53569a1023 docs: showcase RAG with LlamaIndex and LangChain (#71) Panos Vagenas 2024-09-11 15:07:08 +02:00