Commit Graph

111 Commits

Author SHA1 Message Date
Michele Dolfi
90dd676422
feat: update parser with bytesio interface and set as new default backend (#32)
* update parser with bytesio interface

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* change default backend

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update DEFAULT_BACKEND

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-14 12:30:00 +02:00
Michele Dolfi
79ef8d2f2f
fix: update (vuln) deps (#29)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-07 17:29:36 +02:00
Maxim Lysak
b8f5e38a8c
feat: introducing docling_backend (#26)
Uses our own docling_parse to reliably get PDF cells
To get page images, this backend uses pypdfium2

Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
2024-08-07 16:22:36 +02:00
Panos Vagenas
d2d9543415
fix: set page number using 1-based indexing (#22)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-07-31 14:28:44 +02:00
Panos Vagenas
d603137383
feat: add simplified single-doc conversion (#20)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-07-26 16:55:33 +02:00
Michele Dolfi
54b3dda141
fix: add easyocr to main deps for valid extra (#19)
* fix: add easyocr to main deps for valid extra

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove group

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-07-24 14:11:26 +02:00
Michele Dolfi
b0725e0aa6
fix: expose ocr as extra (#18)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-07-24 11:14:17 +02:00
Michele Dolfi
7bc20adc16
pin docling-ibm-models 1.1.0 with python 3.10 support (#15)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-07-18 17:27:48 +02:00
Panos Vagenas
eb0b208272
chore: switch to docling-core Markdown export (#14)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-07-18 16:10:05 +02:00
Michele Dolfi
fb72688ff7
feat: enable python 3.12 support by updating glm (#8)
* update deepsearch-glm for python 3.12 support

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* enable python 3.12 in ci tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-07-17 14:03:26 +02:00
Christoph Auer
e2d996753b
Initial commit
2024-07-15 09:42:42 +02:00