Commit Graph

  • f48ec7aae3 use techreport bibtex type Michele Dolfi 2024-08-20 10:59:30 +0200
  • 4f40a84809 docs: add technical paper ref Michele Dolfi 2024-08-20 09:57:28 +0200
  • 35e90b66d7 feat: allow computing page images on-demand and cache them Michele Dolfi 2024-08-20 09:33:57 +0200
  • e7eafbb6a5 Add redbooks to test data, small additions Christoph Auer 2024-08-15 13:16:17 +0200
  • 778e51ef18 chore: bump version to 1.4.0 [skip ci] v1.4.0 github-actions[bot] 2024-08-14 11:46:55 +0000
  • 349b0e914f fix: allow newer torch versions (#34) Michele Dolfi 2024-08-14 13:37:36 +0200
  • 7291aede7d fix: allow newer torch versions Michele Dolfi 2024-08-14 13:29:35 +0200
  • 90dd676422 feat: update parser with bytesio interface and set as new default backend (#32) Michele Dolfi 2024-08-14 12:30:00 +0200
  • 98e34e8880 update DEFAULT_BACKEND Michele Dolfi 2024-08-14 11:40:13 +0200
  • 61be78a875 Fix class re-mapping for table of contents (#33) Christoph Auer 2024-08-14 11:32:30 +0200
  • a0b2d793cc Fix class re-mapping for table of contents Christoph Auer 2024-08-14 10:53:11 +0200
  • 426e8f1047 change default backend Michele Dolfi 2024-08-14 09:35:19 +0200
  • 8a0411be67 update parser with bytesio interface Michele Dolfi 2024-08-13 16:51:44 +0200
  • dd0df9f094 chore: bump version to 1.3.0 [skip ci] v1.3.0 github-actions[bot] 2024-08-12 16:29:05 +0000
  • 63d80edca2 feat: output page images and extracted bbox (#31) Michele Dolfi 2024-08-12 18:25:45 +0200
  • 4b9aff5fc6 add options for different page elements, improve example and flip name of assemble_options Michele Dolfi 2024-08-12 16:46:16 +0200
  • 4338dea17b Add assemble options and example saving pages and figures Michele Dolfi 2024-08-12 11:29:27 +0200
  • 0bf4a43ed5 chore: bump version to 1.2.1 [skip ci] v1.2.1 github-actions[bot] 2024-08-07 15:38:00 +0000
  • 79ef8d2f2f fix: update (vuln) deps (#29) Michele Dolfi 2024-08-07 17:29:36 +0200
  • c7ea74e27c fix: update (vuln) deps Michele Dolfi 2024-08-07 17:16:47 +0200
  • 794b20a50a fix: type of path_or_stream in PdfDocumentBackend (#28) Michele Dolfi 2024-08-07 17:20:44 +0200
  • d125b25936 fix: type of path_or_stream in PdfDocumentBackend Michele Dolfi 2024-08-07 17:09:47 +0200
  • 9550db8e64 docs: improve examples (#27) Michele Dolfi 2024-08-07 17:16:35 +0200
  • 02f3bc7f70 docs: improve examples Michele Dolfi 2024-08-07 17:03:20 +0200
  • 20cbe7c24a chore: bump version to 1.2.0 [skip ci] v1.2.0 github-actions[bot] 2024-08-07 14:35:03 +0000
  • b8f5e38a8c feat: introducing docling_backend (#26) Maxim Lysak 2024-08-07 16:22:36 +0200
  • e22226d6e7 Introducing docling_backend: uses our own docling_parse to reliably get PDF cells; to get page images, this backend uses pypdfium2 Maxim Lysak 2024-08-07 16:02:01 +0200
  • 62ba4aaf31 chore: bump version to 1.1.2 [skip ci] v1.1.2 github-actions[bot] 2024-07-31 12:35:59 +0000
  • d2d9543415 fix: set page number using 1-based indexing (#22) Panos Vagenas 2024-07-31 14:28:44 +0200
  • f637fd28a0 fix: set page number using 1-based indexing Panos Vagenas 2024-07-31 14:13:25 +0200
  • e102827753 chore: bump version to 1.1.1 [skip ci] v1.1.1 github-actions[bot] 2024-07-30 12:53:54 +0000
  • f4bf3d25b9 fix: Correct text extraction for table cells (#21) Maxim Lysak 2024-07-30 14:51:47 +0200
  • bff53d403a Completed checks Maxim Lysak 2024-07-30 14:31:40 +0200
  • cfdfda3629 Fixes for scaling transformation for table cell bounding boxes when using do_cell_matching = False; corrected examples/convert.py with appropriate parameter, for good quality example conversion Maxim Lysak 2024-07-30 13:43:50 +0200
  • b07c4a7a4a chore: bump version to 1.1.0 [skip ci] v1.1.0 github-actions[bot] 2024-07-26 15:01:56 +0000
  • d603137383 feat: add simplified single-doc conversion (#20) Panos Vagenas 2024-07-26 16:55:33 +0200
  • 02cf8e576d remove cgi, download in chunks Panos Vagenas 2024-07-26 16:48:42 +0200
  • 33d5d7d787 update README Panos Vagenas 2024-07-25 22:58:05 +0200
  • b9fd50e7de remove unnecessary import Panos Vagenas 2024-07-25 18:17:13 +0200
  • e5a3bec356 feat: add simplified single-doc conversion Panos Vagenas 2024-07-25 18:09:16 +0200
  • 3eca8b8485 refactor(pypdfium2): just forward input to PdfDocument directly (#17) mara004 2024-07-25 08:54:57 +0200
  • 6db2b350dd chore: bump version to 1.0.2 [skip ci] v1.0.2 github-actions[bot] 2024-07-24 12:18:21 +0000
  • 54b3dda141 fix: add easyocr to main deps for valid extra (#19) Michele Dolfi 2024-07-24 14:11:26 +0200
  • 3fd075199b remove group Michele Dolfi 2024-07-24 14:03:04 +0200
  • f4d5bd4336 fix: add easyocr to main deps for valid extra Michele Dolfi 2024-07-24 12:38:09 +0200
  • 3e92f0bfba chore: bump version to 1.0.1 [skip ci] v1.0.1 github-actions[bot] 2024-07-24 09:28:47 +0000
  • b0725e0aa6 fix: expose ocr as extra (#18) Michele Dolfi 2024-07-24 11:14:17 +0200
  • 6f0dabe26d fix: expose ocr as extra Michele Dolfi 2024-07-24 10:45:11 +0200
  • 2b86acec5d pypdfium2: just forward input to PdfDocument directly mara004 2024-07-23 16:42:02 +0200
  • 9f2add112f chore: bump version to 1.0.0 [skip ci] v1.0.0 github-actions[bot] 2024-07-18 15:52:38 +0000
  • 71c3a9c8cd feat!: v1.0.0 release (#16) Michele Dolfi 2024-07-18 17:50:14 +0200
  • 24c55de3a0 feat!: v1.0.0 release Michele Dolfi 2024-07-18 17:48:06 +0200
  • 7bc20adc16 pin docling-ibm-models 1.1.0 with python 3.10 support (#15) Michele Dolfi 2024-07-18 17:27:48 +0200
  • 8967ec6530 pin docling-ibm-models 1.1.0 with python 3.10 support Michele Dolfi 2024-07-18 17:06:51 +0200
  • eb0b208272 chore: switch to docling-core Markdown export (#14) Panos Vagenas 2024-07-18 16:10:05 +0200
  • 28d1c746a6 chore: update README (#13) Panos Vagenas 2024-07-18 11:23:23 +0200
  • cbf92a6c93 chore: switch to docling-core-provided MD export Panos Vagenas 2024-07-18 11:07:34 +0200
  • 47c1311551 chore: update README.md Panos Vagenas 2024-07-18 10:58:19 +0200
  • f09ffcc8f4 chore: bump version to 0.4.0 [skip ci] v0.4.0 github-actions[bot] 2024-07-17 14:26:50 +0000
  • e9526bb11e feat: Optimize table extraction quality, add configuration options (#11) Christoph Auer 2024-07-17 16:13:21 +0200
  • 3e2ede8107 chore: bump version to 0.3.1 [skip ci] v0.3.1 github-actions[bot] 2024-07-17 13:58:51 +0000
  • d1d1724537 fix: missing type for default values (#12) Michele Dolfi 2024-07-17 15:54:43 +0200
  • 60302c312c Documentation improvements Christoph Auer 2024-07-17 15:51:28 +0200
  • f221083a14 Merge branch 'main' of github.com:DS4SD/docling into cau/optimize-table-quality Christoph Auer 2024-07-17 15:49:39 +0200
  • 2baa35c548 docs: reflect supported Python versions, add badges (#10) Panos Vagenas 2024-07-17 15:49:26 +0200
  • 80dbe697dd Merge branch 'main' of github.com:DS4SD/docling into cau/optimize-table-quality Christoph Auer 2024-07-17 15:44:59 +0200
  • 5d2939ac1d minor HTML fix Panos Vagenas 2024-07-17 15:44:30 +0200
  • 2cc0a3b58f Merge branch 'main' into cau/optimize-table-quality Christoph Auer 2024-07-17 15:42:57 +0200
  • 32905ab959 Add documentation Christoph Auer 2024-07-17 15:38:16 +0200
  • 57436b1a4c fix: missing type for default values Michele Dolfi 2024-07-17 15:25:01 +0200
  • 86c2a7fc1e chore: bump version to 0.3.0 [skip ci] github-actions[bot] 2024-07-17 12:11:15 +0000
  • 5e26e8d6e3 feat: enable python 3.12 support by updating glm (#8) Michele Dolfi 2024-07-17 14:03:26 +0200
  • f3d65577a4 docs: Add setup with pypi to Readme (#7) Christoph Auer 2024-07-16 14:15:09 +0200
  • e4674852c2 chore: bump version to 0.2.0 [skip ci] github-actions[bot] 2024-07-16 11:37:14 +0000
  • 4e97a9ddfa feat: build with ci (#6) Michele Dolfi 2024-07-16 13:34:42 +0200
  • 1a7a07e931 disable docs build (#5) Michele Dolfi 2024-07-16 13:14:44 +0200
  • cf37ace24c ci: Add Github Actions (#4) Michele Dolfi 2024-07-16 13:05:04 +0200
  • a5113eb78e doc: More documentation updates (#2) Christoph Auer 2024-07-15 14:59:53 +0200
  • f652bad2d1 docs: Update links, add GH repository to metadata (#1) Christoph Auer 2024-07-15 12:43:05 +0200
  • 6c01600194 Optimizations for table extraction quality, configurable options for cell matching Christoph Auer 2024-07-17 15:21:13 +0200
  • 78b154fde7 Add repo, absolute URLs Christoph Auer 2024-07-15 12:18:20 +0200
  • 5acb7b51cf Optimizations for table extraction quality, configurable options for cell matching Christoph Auer 2024-07-17 15:21:13 +0200
  • 4df967af83 docs: reflect supported Python versions, add badges Panos Vagenas 2024-07-17 14:34:15 +0200
  • 0dfa4548d3 chore: bump version to 0.3.0 [skip ci] v0.3.0 github-actions[bot] 2024-07-17 12:11:15 +0000
  • fb72688ff7 feat: enable python 3.12 support by updating glm (#8) Michele Dolfi 2024-07-17 14:03:26 +0200
  • 11ef79e531 enable python 3.12 in ci tests Michele Dolfi 2024-07-17 12:53:36 +0200
  • 627d3b9ba0 update deepsearch-glm for python 3.12 support Michele Dolfi 2024-07-17 12:52:42 +0200
  • 2803222ee1 docs: Add setup with pypi to Readme (#7) Christoph Auer 2024-07-16 14:15:09 +0200
  • 0d5b53de6d Add setup with pypi to Readme Christoph Auer 2024-07-16 13:43:53 +0200
  • 5c88574d03 chore: bump version to 0.2.0 [skip ci] v0.2.0 github-actions[bot] 2024-07-16 11:37:14 +0000
  • b1479cf4ec feat: build with ci (#6) Michele Dolfi 2024-07-16 13:34:42 +0200
  • e63dfae02c feat: build with ci Michele Dolfi 2024-07-16 13:32:12 +0200
  • b4f45ce96b disable docs build (#5) Michele Dolfi 2024-07-16 13:14:44 +0200
  • 101e6cc7d6 disable docs build Michele Dolfi 2024-07-16 13:13:30 +0200
  • e45dc5d1a5 ci: Add Github Actions (#4) Michele Dolfi 2024-07-16 13:05:04 +0200
  • faa4d98dc0 add semantic-release config Panos Vagenas 2024-07-16 08:52:20 +0200
  • 407c6ef843 Update .github/actions/setup-poetry/action.yml Michele Dolfi 2024-07-16 08:18:16 +0200
  • 67a5832cae apply styling Michele Dolfi 2024-07-15 18:09:36 +0200
  • b9dc892385 Update convert.py (#3) Christoph Auer 2024-07-15 18:02:42 +0200
  • 75332e9a66 add Github Actions Michele Dolfi 2024-07-15 18:00:23 +0200