* Add DocumentConverter.extract and full extraction pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add DocumentConverter.extract template arg
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add NuExtract model
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add Extraction pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add proper test, support Pydantic class types
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add QR bill example
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add base_extraction_pipeline
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add types
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update typing of ExtractionResult and inner fields
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Factor out extract to DocumentExtractor
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Address mypy issues
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add DocumentExtractor
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Resolve circular import issue
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Clean up imports, remove Optional for template arg
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move new type definitions into datamodel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update comments
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Respect page range, disable test_extraction in CI
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>