Maksym Lysak
|
82126e3871
|
Fixed issues with duplicated paragraphs and incorrect lists in pptx
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-23 15:00:47 +02:00 |
|
Maksym Lysak
|
0f81ffda74
|
Added proper processing of in-line textual elements for MD backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-23 11:10:54 +02:00 |
|
Maksym Lysak
|
e8229fdd4c
|
cleaned prints
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-22 15:48:33 +02:00 |
|
Maksym Lysak
|
186d71a057
|
Added support for code blocks and fenced code in MD
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-22 15:45:47 +02:00 |
|
Christoph Auer
|
4fb803f46c
|
Fix styling
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-22 15:30:47 +02:00 |
|
Maksym Lysak
|
47a4d314ea
|
Fixes for MD Backend, to avoid duplicated text inserts into docling doc
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-22 14:39:44 +02:00 |
|
Christoph Auer
|
578e30e23b
|
Update to docling-core v2.1.0
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-22 14:34:38 +02:00 |
|
Christoph Auer
|
b1a2af6d39
|
Update all backends with proper filename in DocumentOrigin
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-22 14:11:40 +02:00 |
|
Christoph Auer
|
789b29bb24
|
Merge ASCIIDoc and Markdown backends in, fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-22 11:34:35 +02:00 |
|
Christoph Auer
|
0bbd50f500
|
Merge branch 'dev/add-asciidocs-backend' of github.com:DS4SD/docling into cau/backend-document-origin
|
2024-10-22 11:04:49 +02:00 |
|
Peter Staar
|
bb3db07836
|
fixed the mypy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
|
2024-10-22 09:44:27 +02:00 |
|
Peter Staar
|
b04f14ec24
|
able to parse the captions and image uri's
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
|
2024-10-22 09:13:08 +02:00 |
|
Peter Staar
|
1c0a766cc5
|
working on asciidocs, struggling with ImageRef
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
|
2024-10-22 07:42:42 +02:00 |
|
Maksym Lysak
|
8c60dfa0e6
|
Fixed example run_md, added origin info to md_backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-21 16:42:18 +02:00 |
|
Maksym Lysak
|
1456a36618
|
Fixes MyPy requirements, and rest of pre-commit
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-21 15:43:39 +02:00 |
|
Maksym Lysak
|
dae366440c
|
Cleaned code, improved logging for MD
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-21 15:11:47 +02:00 |
|
Maksym Lysak
|
ba9beb65e3
|
Added initial docling table support to md_backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-21 15:11:47 +02:00 |
|
Maksym Lysak
|
fa2f8cf236
|
Detecting and assembling tables in markdown in temporary buffers
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-21 15:11:47 +02:00 |
|
Maksym Lysak
|
bef429fee3
|
Improvements in md parsing
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-21 15:11:47 +02:00 |
|
Maksym Lysak
|
534b2203f6
|
md_backend produces docling document with headers, paragraphs, lists
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-21 15:11:47 +02:00 |
|
Maksym Lysak
|
1df89f79ff
|
work in progress on MD backend
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-21 15:11:47 +02:00 |
|
Maksym Lysak
|
5986213cfe
|
Drafting Markdown backend via Marko library
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
|
2024-10-21 15:11:42 +02:00 |
|
Peter Staar
|
c23d049270
|
adding test_02.asciidoc
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
|
2024-10-21 06:01:56 +02:00 |
|
Peter Staar
|
e60c52586b
|
fixed the mypy
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
|
2024-10-19 06:23:35 +02:00 |
|
Peter Staar
|
70b2ae3fab
|
reformatted the code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
|
2024-10-18 16:57:26 +02:00 |
|
Peter Staar
|
5016daeae3
|
first working asciidoc parser
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
|
2024-10-18 16:51:39 +02:00 |
|
Peter Staar
|
1138cae7f1
|
adding tests for asciidocs
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
|
2024-10-18 16:51:39 +02:00 |
|
github-actions[bot]
|
c60c402e15
|
chore: bump version to 2.1.0 [skip ci]
|
2024-10-18 16:51:39 +02:00 |
|
Michele Dolfi
|
006cfb4125
|
feat: add coverage_threshold to skip OCR for small images (#161)
* feat: add coverage_threshold to skip OCR for small images
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* filter individual boxes
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename option
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-18 16:51:39 +02:00 |
|
ABHISHEK FADAKE
|
b6c061093c
|
docs: typo fix (#155)
* Docs: Typo fix
- Corrected spelling of invidual to automatic
Signed-off-by: ABHISHEK FADAKE <31249309+fadkeabhi@users.noreply.github.com>
* add synchronize event for forks
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: ABHISHEK FADAKE <31249309+fadkeabhi@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-18 16:51:39 +02:00 |
|
Panos Vagenas
|
eb154a1c28
|
fix: fix legacy doc ref (#162)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
|
2024-10-18 16:51:39 +02:00 |
|
Michele Dolfi
|
77fa1db3a1
|
ci: run ci also on forks (#160)
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
|
2024-10-18 16:51:39 +02:00 |
|
Christoph Auer
|
63d3704e54
|
Ensure all models work only on valid pages (#158)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-18 16:51:39 +02:00 |
|
github-actions[bot]
|
d5460e2d1f
|
chore: bump version to 2.1.0 [skip ci]
|
2024-10-18 13:21:15 +00:00 |
|
Michele Dolfi
|
b346faf622
|
feat: add coverage_threshold to skip OCR for small images (#161)
* feat: add coverage_threshold to skip OCR for small images
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* filter individual boxes
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* rename option
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-18 13:58:23 +02:00 |
|
ABHISHEK FADAKE
|
f799e777c1
|
docs: typo fix (#155)
* Docs: Typo fix
- Corrected spelling of invidual to automatic
Signed-off-by: ABHISHEK FADAKE <31249309+fadkeabhi@users.noreply.github.com>
* add synchronize event for forks
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: ABHISHEK FADAKE <31249309+fadkeabhi@users.noreply.github.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-18 13:56:48 +02:00 |
|
Panos Vagenas
|
63bef59d9e
|
fix: fix legacy doc ref (#162)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
|
2024-10-18 13:11:20 +02:00 |
|
Michele Dolfi
|
bb7a58d45d
|
ci: run ci also on forks (#160)
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
|
2024-10-18 12:32:27 +02:00 |
|
Christoph Auer
|
a00c937e19
|
Ensure all models work only on valid pages (#158)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-18 08:54:06 +02:00 |
|
Peter Staar
|
c1d9241b39
|
updated the asciidoc backend
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
|
2024-10-18 08:28:02 +02:00 |
|
Peter Staar
|
12033537e3
|
updated the base-model and added the asciidoc_backend
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
|
2024-10-17 19:58:07 +02:00 |
|
Maxim Lysak
|
034a411057
|
docs: add graphical band in readme (#154)
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-17 18:15:40 +02:00 |
|
Michele Dolfi
|
61c092f445
|
docs: add use docling (#150)
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
|
2024-10-17 18:14:48 +02:00 |
|
Michele Dolfi
|
24f949ada2
|
chore: run apt-get update before install (#156)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-17 17:27:16 +02:00 |
|
github-actions[bot]
|
a29c256041
|
chore: bump version to 2.0.0 [skip ci]
|
2024-10-16 19:48:06 +00:00 |
|
Christoph Auer
|
7d3be0edeb
|
feat!: Docling v2 (#117)
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
|
2024-10-16 21:02:03 +02:00 |
|
Panos Vagenas
|
d504432c1e
|
docs: introduce docs site (#141)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
|
2024-10-14 14:13:13 +02:00 |
|
Michele Dolfi
|
2b1e72d327
|
refactor: fix type of tesseractocr options (#140)
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
|
2024-10-14 08:40:22 +02:00 |
|
github-actions[bot]
|
4672b24c1a
|
chore: bump version to 1.20.0 [skip ci]
|
2024-10-11 13:48:02 +00:00 |
|
Christoph Auer
|
5e4944f15f
|
feat: new experimental docling-parse v2 backend (#131)
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-11 15:12:49 +02:00 |
|