data
A new HTML backend that handles styled html (ignores it) as well as images.
2025-05-24 22:29:22 +02:00
data_scanned
fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549)
2025-05-19 15:26:00 +02:00
__init__.py
fix: Add unit tests (#51)
2024-08-30 14:08:20 +02:00
test_backend_asciidoc.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_backend_csv.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_backend_docling_json.py
feat: add Docling JSON ingestion (#783)
2025-01-24 18:05:23 +01:00
test_backend_docling_parse_v2.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_backend_docling_parse_v4.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_backend_docling_parse.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_backend_html.py
A new HTML backend that handles styled html (ignores it) as well as images.
2025-05-24 22:29:22 +02:00
test_backend_jats.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_backend_markdown.py
fix(markdown): handle nested lists (#910)
2025-02-07 12:55:12 +01:00
test_backend_msexcel.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_backend_msword.py
feat: add textbox content extraction in msword_backend (#1538)
2025-05-19 15:01:36 +02:00
test_backend_patent_uspto.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_backend_pdfium.py
fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549)
2025-05-19 15:26:00 +02:00
test_backend_pptx.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_backend_webp.py
feat: support image/webp file type (#1415)
2025-05-14 09:47:28 +02:00
test_cli.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
test_code_formula.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_data_gen_flag.py
fix(markdown): handle nested lists (#910)
2025-02-07 12:55:12 +01:00
test_document_picture_classifier.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_e2e_conversion.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_e2e_ocr_conversion.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_input_doc.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_interfaces.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_invalid_input.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_legacy_format_transform.py
ci: add coverage and ruff (#1383)
2025-04-14 18:01:26 +02:00
test_options.py
feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905)
2025-03-18 10:38:19 +01:00
test_settings_load.py
fix(settings): fix nested settings load via environment variables (#1551)
2025-05-14 13:42:10 +02:00
verify_utils.py
feat: support image/webp file type (#1415)
2025-05-14 09:47:28 +02:00