docling/tests
Cesar Berrospi Ramis 38d622f22c refactor(html): put parsed item in body if doc has no header
If an HTML document does not have any header tag, all parsed items are placed in
the DoclingDocument's body content layer.
HTML paragraphs ('p' tags) are parsed as text items with the paragraph label.
Update test ground truth according to the changes above.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-02-28 18:03:58 +01:00
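
The placement rule described in the commit message above can be illustrated with a small standard-library sketch. This is not Docling's HTML backend: `place_items`, the plain-string labels, and the layer names are hypothetical stand-ins. The commit message only covers the no-header case; the "furniture" layer used for items that precede the first header is an assumption made here for contrast.

```python
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class _ItemCollector(HTMLParser):
    """Collect [tag, text] pairs for header and paragraph tags (toy parser)."""

    def __init__(self):
        super().__init__()
        self.items = []    # list of [tag, text]
        self._open = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS or tag == "p":
            self._open = tag
            self.items.append([tag, ""])

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self.items[-1][1] += data


def place_items(html):
    """Return (label, text, layer) tuples for a toy HTML document."""
    collector = _ItemCollector()
    collector.feed(html)
    collector.close()

    has_header = any(tag in HEADER_TAGS for tag, _ in collector.items)
    # No header tag at all: every item goes to the body layer (per the commit).
    # ASSUMPTION: with a header present, items before it count as furniture.
    layer = "furniture" if has_header else "body"

    placed = []
    for tag, text in collector.items:
        if tag in HEADER_TAGS:
            layer = "body"  # from the first header onwards, content is body
            label = "section_header"
        else:
            label = "paragraph"  # 'p' tags become paragraph-labelled text items
        placed.append((label, text.strip(), layer))
    return placed


if __name__ == "__main__":
    for item in place_items("<p>Hello</p><p>World</p>"):
        print(item)
```

With a header-free input such as the one in the `__main__` block, every returned tuple carries the `"body"` layer, which is the behaviour the updated ground-truth files are expected to reflect.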
data refactor(html): put parsed item in body if doc has no header 2025-02-28 18:03:58 +01:00
data_scanned feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
__init__.py fix: Add unit tests (#51) 2024-08-30 14:08:20 +02:00
test_backend_asciidoc.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_backend_csv.py test: avoid testing exact JSON in CSV backend (#1038) 2025-02-24 08:10:40 +01:00
test_backend_docling_json.py feat: add Docling JSON ingestion (#783) 2025-01-24 18:05:23 +01:00
test_backend_docling_parse_v2.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_backend_docling_parse.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_backend_html.py test(html): add more info if a test case fails 2025-02-28 16:06:11 +01:00
test_backend_jats.py test: avoid testing exact JSON in CSV backend (#1038) 2025-02-24 08:10:40 +01:00
test_backend_markdown.py fix(markdown): handle nested lists (#910) 2025-02-07 12:55:12 +01:00
test_backend_msexcel.py test: avoid testing exact JSON in CSV backend (#1038) 2025-02-24 08:10:40 +01:00
test_backend_msword.py test: avoid testing exact JSON in CSV backend (#1038) 2025-02-24 08:10:40 +01:00
test_backend_patent_uspto.py test: avoid testing exact JSON (#1027) 2025-02-20 16:20:07 +01:00
test_backend_pdfium.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_backend_pptx.py test: avoid testing exact JSON in CSV backend (#1038) 2025-02-24 08:10:40 +01:00
test_cli.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_code_formula.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_data_gen_flag.py fix(markdown): handle nested lists (#910) 2025-02-07 12:55:12 +01:00
test_document_picture_classifier.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_e2e_conversion.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_e2e_ocr_conversion.py feat: Python 3.13 support (#841) 2025-01-30 17:26:42 +01:00
test_input_doc.py feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
test_interfaces.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_invalid_input.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_legacy_format_transform.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
test_options.py fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
verify_utils.py test: avoid testing exact JSON in CSV backend (#1038) 2025-02-24 08:10:40 +01:00