Peter Staar
14ab351fdb
chore: add simple convert script
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
2024-09-12 08:38:08 +02:00
Michele Dolfi
9550db8e64
docs: improve examples (#27)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2024-08-07 17:16:35 +02:00
Maxim Lysak
b8f5e38a8c
feat: introducing docling_backend (#26)
Uses our own docling_parse to reliably get PDF cells
To get page images, this backend uses pypdfium2
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
2024-08-07 16:22:36 +02:00
Maxim Lysak
f4bf3d25b9
fix: Correct text extraction for table cells (#21)
* - Fixes for the scaling transformation of table cell bounding boxes when using do_cell_matching = False
  - Corrected examples/convert.py with the appropriate parameter, for a good-quality example conversion
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
* Completed checks
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
---------
Signed-off-by: Maxim Lysak <mly@zurich.ibm.com>
Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>
2024-07-30 14:51:47 +02:00
Christoph Auer
b9dc892385
Update convert.py (#3)
Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
2024-07-15 18:02:42 +02:00
Christoph Auer
e2d996753b
Initial commit
2024-07-15 09:42:42 +02:00