Commit Graph

  • 97aef61667 Specify encoding when writing output file to avoid errors when default target encoding doesn't have all characters Johnny Salazar 2024-11-03 18:21:19 +0700
  • 8902d1e208 update CLI docs Michele Dolfi 2024-11-02 09:44:34 +0100
  • af32a049d4 feat: add more options in the CLI Michele Dolfi 2024-11-02 09:41:33 +0100
  • 5fc4d5bd3d work-in-progress: dealing with in attributes of html elements Peter Staar 2024-11-02 09:27:07 +0100
  • 244ca69cfd docs: update LlamaIndex docs (#196) Panos Vagenas 2024-11-01 20:55:28 +0100
  • 53d8732224 docs: update LlamaIndex docs Panos Vagenas 2024-11-01 20:21:46 +0100
  • 7ed4d371c5 Update advanced_chunking_with_merging Bill Murdock 2024-11-01 13:22:54 -0400
  • bab286fde2 reuse existing chunk/meta types, fix minor issues, lint Panos Vagenas 2024-11-01 15:50:21 +0100
  • e41b41306a Add files via upload Bill Murdock 2024-11-01 08:41:37 -0400
  • 473ad9a032 add the skip_furniture parameter Peter Staar 2024-11-01 11:32:56 +0100
  • 626311bf73 docs: add advanced chunking example Panos Vagenas 2024-11-01 09:38:37 +0100
  • ebe0b203c8 added the detection of h1 and the skip_furniture parameter Peter Staar 2024-10-31 16:06:41 +0100
  • c52e68c52b feat: add ability to detect h1 and filter from there-on Peter Staar 2024-10-31 15:50:26 +0100
  • 9d8865856d chore: bump version to 2.3.1 [skip ci] v2.3.1 github-actions[bot] 2024-10-30 18:23:53 +0000
  • eb679ccbb4 fix: simplify torch dependencies and update pinned docling deps (#190) Michele Dolfi 2024-10-30 18:44:08 +0100
  • 65803aee1b update docling-ibm-models Michele Dolfi 2024-10-30 18:04:45 +0100
  • 904d24d600 fix: allow to explicitly initialize the pipeline (#189) Michele Dolfi 2024-10-30 17:54:53 +0100
  • 461fe76417 fix: simplify torch dependencies and update pinned docling deps Michele Dolfi 2024-10-30 17:49:00 +0100
  • e5b177e9e6 clean examples Michele Dolfi 2024-10-30 17:14:48 +0100
  • 4dfa7e636f feat: allow to explicitly initialize the pipeline Michele Dolfi 2024-10-30 16:54:55 +0100
  • 43349865d0 chore: bump version to 2.3.0 [skip ci] v2.3.0 github-actions[bot] 2024-10-30 14:47:37 +0000
  • 2a2c65bf4f feat: Add pipeline timings and toggle visualization, establish debug settings (#183) Christoph Auer 2024-10-30 15:04:19 +0100
  • 94a5290789 chore: update the with input formats and DoclingDocument (#188) Peter W. J. Staar 2024-10-30 15:02:28 +0100
  • 704393613f Rewrite feature items on README Christoph Auer 2024-10-30 14:14:00 +0100
  • 682c2b44c4 Add start_timestamps to ProfilingItem Christoph Auer 2024-10-30 13:44:24 +0100
  • f542460af3 fix: fix duplicate title and heading + add e2e tests for html and docx (#186) Peter W. J. Staar 2024-10-30 13:14:56 +0100
  • 14b63a3e7d restructure title fix (#187) Panos Vagenas 2024-10-30 10:19:58 +0100
  • f33fab4abd add bitmap images as format Michele Dolfi 2024-10-30 09:11:00 +0100
  • 2731508455 propagate changes to the docs index Michele Dolfi 2024-10-30 07:34:14 +0100
  • e3c3af3a79 move docs CI/CD to independent unit Michele Dolfi 2024-10-30 07:27:33 +0100
  • 8cd90f4708 change link to docs Michele Dolfi 2024-10-30 07:23:51 +0100
  • e282cae5f2 fix typo Michele Dolfi 2024-10-30 07:06:08 +0100
  • c03d29d632 chore: update the README to reflect all input/output format and highlight the DoclingDocument data-structure. Peter Staar 2024-10-30 05:52:43 +0100
  • fc12cda82b fixed the html tests Peter Staar 2024-10-29 17:39:47 +0100
  • e11bbc8b0b restructure title fix Panos Vagenas 2024-10-29 17:18:26 +0100
  • ddf7fd12c4 Update lockfile Christoph Auer 2024-10-29 15:54:22 +0100
  • f822844a87 Optimize imports Christoph Auer 2024-10-29 15:47:56 +0100
  • 3de3f1371c Fixes for time logging Christoph Auer 2024-10-29 15:46:20 +0100
  • 5c8e06fbde moved the ground-truth data Peter Staar 2024-10-29 15:33:03 +0100
  • 7f2e98661e updated the tests, moved the ground-truth Peter Staar 2024-10-29 15:31:42 +0100
  • e1b83ec485 Visualization codes output PNG to debug dir Christoph Auer 2024-10-29 13:53:29 +0100
  • 811d1929e1 fixed the output of the test Peter Staar 2024-10-29 13:44:10 +0100
  • 0cdccb3da1 Merge branch 'main' of github.com:DS4SD/docling into cau/pipeline-profiling Christoph Auer 2024-10-29 10:54:27 +0100
  • 79f90b46dc Refactor and fix profiling codes Christoph Auer 2024-10-29 10:54:17 +0100
  • 99cbb2fc96 fixed the examples (1) Peter Staar 2024-10-29 09:25:14 +0100
  • 70865b4c7d fix: make CLI JSON export more human-readable add-json-export-indentation Panos Vagenas 2024-10-29 08:54:41 +0100
  • ca492ef6ca fixed the tests (2) Peter Staar 2024-10-29 08:46:44 +0100
  • ef2529e50d fixed the tests Peter Staar 2024-10-29 07:38:40 +0100
  • 6163409305 reformatted the text Peter Staar 2024-10-29 06:02:49 +0100
  • 7cb7da7ce9 updated the output of itxt Peter Staar 2024-10-29 05:48:29 +0100
  • baaeb60b4a add real e2e tests for html and docx Peter Staar 2024-10-29 05:36:23 +0100
  • dda2645d4c chore: bump version to 2.2.1 [skip ci] v2.2.1 github-actions[bot] 2024-10-28 17:18:41 +0000
  • b9f5c74a7d fix: fix header levels for DOCX & HTML (#184) Panos Vagenas 2024-10-28 17:02:52 +0100
  • 94d0729c50 fix: handling of long sequence of unescaped underscore chars in markdown (#173) Maxim Lysak 2024-10-28 16:34:48 +0100
  • 07c2bd18e1 fix: fix header levels for DOCX & HTML Panos Vagenas 2024-10-28 16:10:03 +0100
  • 0814f32ae4 Add profiling code to all models Christoph Auer 2024-10-28 15:04:09 +0100
  • 2cece27208 docs: update LlamaIndex docs for Docling v2 (#182) Panos Vagenas 2024-10-28 14:28:26 +0100
  • a00f01cf07 Merge branch 'main' of github.com:DS4SD/docling into cau/vis-and-profiling-options Christoph Auer 2024-10-28 13:21:45 +0100
  • 747a190b3a Add settings to turn visualization on or off Christoph Auer 2024-10-28 13:21:32 +0100
  • e7599705a8 docs: update LlamaIndex docs for Docling v2 Panos Vagenas 2024-10-28 10:33:41 +0100
  • 189d3c2d44 docs: fix batch convert (#177) Michele Dolfi 2024-10-26 05:50:34 +0200
  • 7d19418b77 fix: HTML backend, fixes for Lists and nested texts (#180) Maxim Lysak 2024-10-25 20:14:04 +0200
  • 88c1673057 fix: MD Backend, fixes to properly handle trailing inline text and emphasis in headers (#178) Maxim Lysak 2024-10-25 18:02:20 +0200
  • 28a09a5029 cleaning up Maksym Lysak 2024-10-25 17:56:42 +0200
  • 3e490c4184 removed prints Maksym Lysak 2024-10-25 17:46:44 +0200
  • 5831089fb0 Updated tests because of the change in Markdown export in docling-core Maksym Lysak 2024-10-25 17:01:29 +0200
  • 7332360e27 Fixes for HTML backend Maksym Lysak 2024-10-25 16:27:54 +0200
  • d4971e1494 Updated docling-core to 2.2.1 Maksym Lysak 2024-10-25 14:37:52 +0200
  • a5735f4fd4 Made smarter processing of headers, with arbitrary styling Maksym Lysak 2024-10-25 14:08:47 +0200
  • 162643c7f7 removed print Maksym Lysak 2024-10-25 10:48:05 +0200
  • 77a89c3334 chore: make auto-release on request (#179) Michele Dolfi 2024-10-25 10:47:25 +0200
  • 97999ebb43 Added proper handling of headers with bold, italic or emphasis Maksym Lysak 2024-10-25 10:39:07 +0200
  • 9b5b14f1a8 making fix more rare Maksym Lysak 2024-10-25 10:10:24 +0200
  • 42c98ba2cd chore: make auto-release on request Michele Dolfi 2024-10-25 10:01:54 +0200
  • 1c933e20f8 Small fix to properly handle trailing inline text in the md backend Maksym Lysak 2024-10-25 09:44:34 +0200
  • d55603f5ba docs: fix batch convert Michele Dolfi 2024-10-25 08:39:27 +0200
  • 8d356aa247 docs: add export with embedded images (#175) Michele Dolfi 2024-10-24 20:19:41 +0200
  • 7c2eaa4203 docs: add export with embedded images Michele Dolfi 2024-10-24 19:14:40 +0200
  • d654a292e8 Fixed trailing inline text handling (at the end of a file), and corrected underscore sequence shortening Maksym Lysak 2024-10-24 17:17:58 +0200
  • 5d090c59c4 Added comment explaining reason for fix Maksym Lysak 2024-10-24 13:28:03 +0200
  • 1783f137da Fix for md hanging when encountering long sequence of unescaped underscore chars Maksym Lysak 2024-10-24 13:09:34 +0200
  • 8208c93e3a chore: bump version to 2.2.0 [skip ci] v2.2.0 github-actions[bot] 2024-10-23 16:04:55 +0000
  • 4116819b51 feat: Update to docling-parse v2 without history (#170) Peter W. J. Staar 2024-10-23 17:20:11 +0200
  • e6b9ad2993 repin poetry.lock Michele Dolfi 2024-10-23 16:40:23 +0200
  • 5ae3cb172d Merge from main, update lock Christoph Auer 2024-10-23 16:18:08 +0200
  • 3023f18ba0 feat: Support AsciiDoc and Markdown input format (#168) Christoph Auer 2024-10-23 16:14:26 +0200
  • 0f33ad6418 Lock docling-parse 2.0.0 Christoph Auer 2024-10-23 16:04:34 +0200
  • 3496b4838f fix: set valid=false for invalid backends (#171) Michele Dolfi 2024-10-23 15:52:30 +0200
  • 76d904164e Fixed issue with group ordering in pptx backend, added debug log into run with formats Maksym Lysak 2024-10-23 15:33:00 +0200
  • dd25698132 Lock docling-parse 2.0.0 Christoph Auer 2024-10-23 15:01:59 +0200
  • 82126e3871 Fixed issues with duplicated paragraphs and incorrect lists in pptx Maksym Lysak 2024-10-23 15:00:47 +0200
  • 51c477cfc3 Add test case for InputDocument Christoph Auer 2024-10-23 14:46:05 +0200
  • dc473efed0 fix: set valid=false for invalid backends Michele Dolfi 2024-10-23 14:34:54 +0200
  • 9204d5e79f Update imports for docling_parse.pdf_parser_v1 Christoph Auer 2024-10-23 14:28:15 +0200
  • 0f81ffda74 Added proper processing of in-line textual elements for MD backend Maksym Lysak 2024-10-23 11:10:54 +0200
  • b8796e6705 updated the pyproject (still need to run poetry lock after docling-parse is accepted) Peter Staar 2024-10-23 06:20:07 +0200
  • e8229fdd4c cleaned prints Maksym Lysak 2024-10-22 15:48:33 +0200
  • 186d71a057 Added support for code blocks and fenced code in MD Maksym Lysak 2024-10-22 15:45:21 +0200
  • 4fb803f46c Fix styling Christoph Auer 2024-10-22 15:30:47 +0200
  • b8d2286dd1 chore: various minor docs fixes (#169) Panos Vagenas 2024-10-22 15:29:36 +0200