data
refactor: upgrade BeautifulSoup4 with type hints (#999)
2025-02-18 11:30:47 +01:00
data_scanned
fix: Revise DocTags, fix iterate_items to output content_layer in items (#965)
2025-02-17 14:11:55 +01:00
__init__.py
fix: Add unit tests (#51)
2024-08-30 14:08:20 +02:00
test_backend_asciidoc.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
test_backend_csv.py
test: validate actual docitems in tests (#966)
2025-02-14 17:47:53 +01:00
test_backend_docling_json.py
feat: add Docling JSON ingestion (#783)
2025-01-24 18:05:23 +01:00
test_backend_docling_parse_v2.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
test_backend_docling_parse.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
test_backend_html.py
fix: parse html with omitted body tag (#818)
2025-01-27 16:59:00 +01:00
test_backend_jats.py
feat(xml-jats): parse XML JATS documents (#967)
2025-02-17 10:43:31 +01:00
test_backend_markdown.py
fix(markdown): handle nested lists (#910)
2025-02-07 12:55:12 +01:00
test_backend_msexcel.py
test: validate actual docitems in tests (#966)
2025-02-14 17:47:53 +01:00
test_backend_msword.py
test: validate actual docitems in tests (#966)
2025-02-14 17:47:53 +01:00
test_backend_patent_uspto.py
feat: Add content_layer property to items to address body, furniture and other roles (#735)
2025-02-10 12:07:49 +01:00
test_backend_pdfium.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
test_backend_pptx.py
test: validate actual docitems in tests (#966)
2025-02-14 17:47:53 +01:00
test_cli.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
test_code_formula.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
test_data_gen_flag.py
fix(markdown): handle nested lists (#910)
2025-02-07 12:55:12 +01:00
test_document_picture_classifier.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
test_e2e_conversion.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
test_e2e_ocr_conversion.py
feat: Python 3.13 support (#841)
2025-01-30 17:26:42 +01:00
test_input_doc.py
feat(xml-jats): parse XML JATS documents (#967)
2025-02-17 10:43:31 +01:00
test_interfaces.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
test_invalid_input.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
test_legacy_format_transform.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
test_options.py
fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)
2025-02-07 08:43:31 +01:00
verify_utils.py
test: validate actual docitems in tests (#966)
2025-02-14 17:47:53 +01:00