docling

mirror of https://github.com/DS4SD/docling.git synced 2025-07-27 04:24:45 +00:00

Author	SHA1	Message	Date
Peter Staar	14ab351fdb	chore: add simple convert script Signed-off-by: Peter Staar <taa@zurich.ibm.com>	2024-09-12 08:38:08 +02:00
Michele Dolfi	9550db8e64	docs: improve examples (#27) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2024-08-07 17:16:35 +02:00
Maxim Lysak	b8f5e38a8c	feat: introducing docling_backend (#26). Uses our own docling_parse to reliably get PDF cells. To get page images, this backend uses pypdfium2. Signed-off-by: Maxim Lysak <mly@zurich.ibm.com> Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>	2024-08-07 16:22:36 +02:00
Maxim Lysak	f4bf3d25b9	fix: Correct text extraction for table cells (#21). Fixes the scaling transformation for table cell bounding boxes when using do_cell_matching = False. Corrects examples/convert.py with the appropriate parameter, for a good-quality example conversion. Completed checks. Signed-off-by: Maxim Lysak <mly@zurich.ibm.com> Co-authored-by: Maxim Lysak <mly@zurich.ibm.com>	2024-07-30 14:51:47 +02:00
Christoph Auer	b9dc892385	Update convert.py (#3) Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>	2024-07-15 18:02:42 +02:00
Christoph Auer	e2d996753b	Initial commit	2024-07-15 09:42:42 +02:00