* Example of applying OCR to a Docling Document as a post-processing step with "nanonets-ocr2-3b" via LM Studio
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added support for elements with multiple provenances
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* cleaning up
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* improved prompt for nanonets-ocr2-3b
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* cleaning up
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* excluded example from CI
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* updated class name
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Improved usability of the example, added a simple CLI and some helper functions
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Fix api_image_request usage
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix pydantic errors
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Improvements and corrections
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added string sanitization: removes line breaks from remote OCR output and preserves the original text from the JSON
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Added quick and reliable detection of empty image crops (elements, table cells, form items); these are not sent to OCR
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* Example respects ocr_documents.txt; tuned empty crop detection
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
* cleaning up api_image_request
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
---------
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>