Peter Staar
0f172cce2f
added test to verify the cells in the pages (3)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-28 10:58:41 +02:00
Peter Staar
c6440c8911
added test to verify the cells in the pages (2)
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-28 10:58:19 +02:00
Peter Staar
e6ed6f4793
added test to verify the cells in the pages
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-28 10:39:17 +02:00
Peter Staar
f853d0afa1
reformat code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-28 09:10:41 +02:00
Peter Staar
0d4fd90036
added verification of input cells
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-28 09:09:06 +02:00
Peter Staar
3dbd6781df
commented out json verification for now
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-27 17:01:29 +02:00
Christoph Auer
93bdaf063b
Fix backend tests
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-08-27 16:34:21 +02:00
Christoph Auer
f517e63b02
Fix backend tests
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-08-27 16:25:29 +02:00
Michele Dolfi
40d754f03d
ci: avoid duplicate runs
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
2024-08-27 16:25:29 +02:00
Peter Staar
b548687a06
commented out the drawing
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-27 16:13:37 +02:00
Christoph Auer
774704ae8c
Merge branch 'main' of github.com:DS4SD/docling into dev/add-strict-tests
2024-08-27 15:18:51 +02:00
Christoph Auer
e59ea8e04e
Fix backend tests
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-08-27 15:18:35 +02:00
Peter Staar
4980b71185
reformatted code
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-27 13:32:45 +02:00
github-actions[bot]
d0403aaebf
chore: bump version to 1.8.2 [skip ci]
2024-08-27 09:53:15 +00:00
Panos Vagenas
e46a66a176
fix: refine conversion result (#52)
- fields `output` & `assembled` need not be optional
- introduced "synonym" `ConversionResult` for `ConvertedDocument` & deprecated the latter
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-08-27 11:50:43 +02:00
Peter Staar
35bd7b9cff
replaced deprecated json function with model_dump_json
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-26 20:38:42 +02:00
Peter Staar
08364dfa56
replaced deprecated json function with model_dump_json
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-26 20:32:23 +02:00
Peter Staar
24c0b9d4c9
ran pre-commit
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-26 20:22:31 +02:00
Peter Staar
c64489a82c
added first test for json and md output
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-26 20:21:18 +02:00
Peter Staar
64640337a3
added the reference converted documents
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-26 18:01:54 +02:00
Peter Staar
b7debe7250
need to start running all tests successfully
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-26 17:50:39 +02:00
Peter Staar
2c66075390
updated the toplevel function test
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-26 17:49:38 +02:00
Peter Staar
12eea8495f
renamed the test folder and added the toplevel test
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-26 17:00:30 +02:00
Peter Staar
f5eb49a811
add the pytests
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-08-26 16:26:17 +02:00
Michele Dolfi
fe817b11d7
docs: update interface in README (#50)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-26 15:36:39 +02:00
github-actions[bot]
7052bee999
chore: bump version to 1.8.1 [skip ci]
2024-08-26 11:55:37 +00:00
Michele Dolfi
8cc147bc56
fix: align output formats (#49)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-26 13:30:26 +02:00
github-actions[bot]
053eae4bdf
chore: bump version to 1.8.0 [skip ci]
2024-08-23 14:24:04 +00:00
Christoph Auer
a294b7e64a
feat: Page-level error reporting from PDF backend, introduce PARTIAL_SUCCESS status (#47)
* Put safety-checks for failed parse of pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Introduce page-level error checks
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Bump to docling-parse 1.1.1
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Introduce page-level error checks
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-08-23 16:18:41 +02:00
github-actions[bot]
3226b20779
chore: bump version to 1.7.1 [skip ci]
2024-08-23 11:56:02 +00:00
Christoph Auer
8808463cec
fix: Better raise exception when a page fails to parse (#46)
* Put safety-checks for failed parse of pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Bump to docling-parse 1.1.1
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Raise from page backend if page is not correctly parsed
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-08-23 13:51:42 +02:00
Christoph Auer
7e84533299
fix: Upgrade docling-parse to 1.1.1, safety checks for failed parse on pages (#45)
* Put safety-checks for failed parse of pages
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Bump to docling-parse 1.1.1
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-08-23 12:51:02 +02:00
github-actions[bot]
1930f08d4e
chore: bump version to 1.7.0 [skip ci]
2024-08-22 12:00:25 +00:00
Christoph Auer
a8c6b29a67
feat: Upgrade docling-parse PDF backend and interface to use page-by-page parsing (#44)
* Use docling-parse page-by-page
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Propagate document_hash to PDF backends, use docling-parse 1.0.0
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Upgrade lockfile
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* repin after more packages on pypi
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-22 13:49:37 +02:00
github-actions[bot]
f7c50c8b0e
chore: bump version to 1.6.3 [skip ci]
2024-08-22 11:02:35 +00:00
Michele Dolfi
fac5745dc8
fix: usage of bytesio with docling-parse (#43)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-22 12:59:49 +02:00
github-actions[bot]
1347c01a9e
chore: bump version to 1.6.2 [skip ci]
2024-08-22 07:32:54 +00:00
Michele Dolfi
69952682ed
fix: remove [ocr] extra to fix wheel install (#42)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-22 09:25:19 +02:00
github-actions[bot]
47c6dab6d2
chore: bump version to 1.6.1 [skip ci]
2024-08-21 17:41:26 +00:00
Christoph Auer
f19871a5a1
fix: Add scipy as dependency (#40)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-08-21 17:21:02 +02:00
Christoph Auer
4a1ceaf65c
Update docling-ibm-models to v1.1.2 (#39)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-08-21 17:12:38 +02:00
github-actions[bot]
22a5c29c63
chore: bump version to 1.6.0 [skip ci]
2024-08-20 13:34:53 +00:00
Christoph Auer
e94d317c02
feat: Add adaptive OCR, factor out treatment of OCR areas and cell filtering (#38)
* Introduce adaptive OCR
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Factor out BaseOcrModel, add docling-parse backend tests, fixes
* Make easyocr default dep
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2024-08-20 15:28:03 +02:00
github-actions[bot]
47b8ad917e
chore: bump version to 1.5.0 [skip ci]
2024-08-20 11:53:52 +00:00
Michele Dolfi
78347bf679
feat: allow computing page images on-demand with scale and cache them (#36)
* feat: allow computing page images on-demand and cache them
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* feat: expose scale for export of page images and document elements
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* fix comment
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-20 13:27:19 +02:00
Christoph Auer
c253dd743a
Add redbooks to test data, small additions (#35)
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
2024-08-20 12:36:00 +02:00
Michele Dolfi
a13114bafd
docs: add technical paper ref (#37)
* docs: add technical paper ref
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
* use techreport bibtex type
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
---------
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
2024-08-20 12:32:53 +02:00
github-actions[bot]
778e51ef18
chore: bump version to 1.4.0 [skip ci]
2024-08-14 11:46:55 +00:00
Michele Dolfi
349b0e914f
fix: allow newer torch versions (#34)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-14 13:37:36 +02:00
Michele Dolfi
90dd676422
feat: update parser with bytesio interface and set as new default backend (#32)
* update parser with bytesio interface
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* change default backend
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
* update DEFAULT_BACKEND
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
---------
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-14 12:30:00 +02:00