Christoph Auer
|
52713f0cf5
|
Optionally produce legacy_doc
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-11 12:57:47 +02:00 |
|
Christoph Auer
|
025983f07b
|
Backend error handling fixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-11 11:18:47 +02:00 |
|
Christoph Auer
|
304d16029a
|
More renaming, design enrichment interface
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-11 10:21:31 +02:00 |
|
Michele Dolfi
|
051beae203
|
use new interface in minimal example
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-11 08:30:09 +02:00 |
|
Christoph Auer
|
7aad3dc946
|
Update test cases for v2
|
2024-10-10 18:51:19 +02:00 |
|
Christoph Auer
|
cd72ea2412
|
Added verify_conversion_result_v2, Regenerate v1 and v2 test data
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-10 18:30:54 +02:00 |
|
Michele Dolfi
|
1bcad334f2
|
pin docling-parse release
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-10 18:30:09 +02:00 |
|
Michele Dolfi
|
3794f8245e
|
add example PNG
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-10 18:29:26 +02:00 |
|
Michele Dolfi
|
a84ba6ddec
|
list all PIL supported mime types
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-10 18:28:56 +02:00 |
|
Michele Dolfi
|
bde8186700
|
update pinning
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-10 17:54:05 +02:00 |
|
Michele Dolfi
|
c31045754d
|
Merge branch 'cau/input-format-abstraction' of github.com:DS4SD/docling into cau/input-format-abstraction
|
2024-10-10 17:41:07 +02:00 |
|
Michele Dolfi
|
50c05b262a
|
pin updates compatible with each other
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-10 17:40:32 +02:00 |
|
Christoph Auer
|
99cfea38d6
|
Added verify_conversion_result_v2, Regenerate v1 and v2 test data
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-10 15:37:59 +02:00 |
|
Christoph Auer
|
7cad290ceb
|
Refactor test data, legacy usage and more
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-10 13:54:44 +02:00 |
|
Maxim Lysak
|
da0700f959
|
Fixes for docx backend
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
|
2024-10-09 16:52:44 +02:00 |
|
Christoph Auer
|
b5a27386c1
|
Merge from main, update OCR model and test cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-09 16:04:19 +02:00 |
|
Christoph Auer
|
0dfbd0b6fc
|
Update examples and test cases
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-09 15:20:27 +02:00 |
|
Panos Vagenas
|
6924999f1f
|
chore: explicitly manage pandas dependency (#134)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
|
2024-10-09 14:50:39 +02:00 |
|
github-actions[bot]
|
0ffc1708d2
|
chore: bump version to 1.19.0 [skip ci]
v1.19.0
|
2024-10-08 17:42:29 +00:00 |
|
Michele Dolfi
|
f96ea86a00
|
feat: add options for choosing OCR engines (#118)
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
|
2024-10-08 19:07:08 +02:00 |
|
Christoph Auer
|
080042d06d
|
Merge from upstream
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-08 16:40:55 +02:00 |
|
Christoph Auer
|
203cf19b1b
|
Lots of improvements
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-08 16:38:42 +02:00 |
|
Maxim Lysak
|
07d952acf9
|
Improved backends
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
|
2024-10-08 16:37:47 +02:00 |
|
Christoph Auer
|
c0447206af
|
Merge from main
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-08 14:42:33 +02:00 |
|
Christoph Auer
|
1d55cbdca9
|
Updates for Powerpoint backend
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-08 13:19:58 +02:00 |
|
Maxim Lysak
|
89e58ca730
|
Added HTML backend implementation, few improvements for other backends
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
|
2024-10-08 11:14:44 +02:00 |
|
Fasal Shah
|
d412c363d7
|
fixed unload pdf backend resources (#129)
Signed-off-by: faisal shah <fashah@redhat.com>
Co-authored-by: faisal shah <fashah@redhat.com>
|
2024-10-08 10:46:43 +02:00 |
|
Maxim Lysak
|
f773d8a621
|
Improved demo code, that saves output mds to files
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
|
2024-10-07 17:25:17 +02:00 |
|
Maxim Lysak
|
bea9fc22af
|
Added mspowerpoint backend first implementation, improvements on msword backend
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
|
2024-10-07 14:55:21 +02:00 |
|
Maxim Lysak
|
1346843301
|
Improved docx parsing
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
|
2024-10-07 13:00:50 +02:00 |
|
Christoph Auer
|
e613f7bc6c
|
Add comments
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-07 12:35:25 +02:00 |
|
Maxim Lysak
|
cefc34e8d8
|
Working on a first version of DOCX native backend
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
|
2024-10-04 18:19:40 +02:00 |
|
github-actions[bot]
|
9b82ae3324
|
chore: bump version to 1.18.0 [skip ci]
v1.18.0
|
2024-10-03 17:16:00 +00:00 |
|
Maxim Lysak
|
2422f706a1
|
feat: new torch-based docling models (#120)
---------
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
|
2024-10-03 18:42:33 +02:00 |
|
github-actions[bot]
|
9ebbbc1245
|
chore: bump version to 1.17.0 [skip ci]
v1.17.0
|
2024-10-03 13:44:52 +00:00 |
|
Rui Dias Gomes
|
dde0aff8bd
|
update examples (#123)
Signed-off-by: rmdg88 <rmdg88@gmail.com>
|
2024-10-03 14:28:25 +02:00 |
|
Michele Dolfi
|
d44c62d7ce
|
feat: windows support (#122)
* feat: windows support
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* add Windows in README
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-10-03 14:23:47 +02:00 |
|
Christoph Auer
|
1fa7cd9855
|
Fundamental refactoring for multi-format support
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-10-01 16:54:09 +02:00 |
|
Christoph Auer
|
cd06d89c2a
|
Merge branch 'cau/experimental-format' of github.com:DS4SD/docling into cau/input-format-abstraction
|
2024-09-30 13:47:57 +02:00 |
|
Christoph Auer
|
0a86529afb
|
Repinning
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-09-30 13:47:22 +02:00 |
|
github-actions[bot]
|
cde671cf34
|
chore: bump version to 1.16.1 [skip ci]
v1.16.1
|
2024-09-27 14:36:40 +00:00 |
|
Michele Dolfi
|
34bd887a7f
|
fix: allow usage of opencv 4.6.x (#110)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
|
2024-09-27 15:51:43 +02:00 |
|
Christoph Auer
|
91ab382129
|
Renaming changes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-09-27 15:20:01 +02:00 |
|
Panos Vagenas
|
c05b692d69
|
docs: document chunking (#111)
[skip ci]
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
|
2024-09-27 11:16:04 +02:00 |
|
Christoph Auer
|
2461b56b84
|
Import rewrites, adapt to changes in docling-core
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-09-27 09:21:15 +02:00 |
|
github-actions[bot]
|
6760571fe1
|
chore: bump version to 1.16.0 [skip ci]
v1.16.0
|
2024-09-27 06:21:15 +00:00 |
|
Christoph Auer
|
d6df76f90b
|
feat: Support tableformer model choice (#90)
* Support tableformer model choice
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update datamodel structure
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update docs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add test unit for table options
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Ensure import backwards-compatibility for PipelineOptions
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update README
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Adjust parameters on custom_convert
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
* Update Dockerfile
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
|
2024-09-26 21:37:08 +02:00 |
|
Christoph Auer
|
9ffd1dc396
|
Merge from main
|
2024-09-26 18:06:08 +02:00 |
|
Christoph Auer
|
0ee82a5e78
|
Bump deepsearch-glm
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-09-25 16:05:54 +02:00 |
|
Christoph Auer
|
ba9d115f64
|
Examples: Don't export experimental output by default
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
|
2024-09-25 15:56:29 +02:00 |
|