docling/tests
Panos Vagenas 88a0e66adc
feat: add Docling JSON ingestion (#783)
* feat: add Docling JSON ingestion

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* update conversion as per review comments, add tests, revert Docling JSON disambiguation, document intricacies

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

* Update docling/backend/json/docling_json_backend.py

Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>

2025-01-24 18:05:23 +01:00
data feat: Code and equation model for PDF and code blocks in markdown (#752) 2025-01-24 16:54:22 +01:00
data_scanned feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
__init__.py fix: Add unit tests (#51) 2024-08-30 14:08:20 +02:00
test_backend_asciidoc.py feat: Add pipeline timings and toggle visualization, establish debug settings (#183) 2024-10-30 15:04:19 +01:00
test_backend_docling_json.py feat: add Docling JSON ingestion (#783) 2025-01-24 18:05:23 +01:00
test_backend_docling_parse_v2.py chore: make tests lighter (#228) 2024-11-04 14:02:28 +01:00
test_backend_docling_parse.py chore: make tests lighter (#228) 2024-11-04 14:02:28 +01:00
test_backend_html.py fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
test_backend_msexcel.py feat: added excel backend (#334) 2024-11-19 12:21:17 +01:00
test_backend_msword.py fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
test_backend_patent_uspto.py feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
test_backend_pdfium.py chore: make tests lighter (#228) 2024-11-04 14:02:28 +01:00
test_backend_pptx.py feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
test_backend_pubmed.py feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) 2024-12-17 19:27:09 +01:00
test_cli.py test: generate file from CLI in a temporary directory (#618) 2024-12-17 16:35:42 +01:00
test_code_formula.py feat: Code and equation model for PDF and code blocks in markdown (#752) 2025-01-24 16:54:22 +01:00
test_e2e_conversion.py feat: Add pipeline timings and toggle visualization, establish debug settings (#183) 2024-10-30 15:04:19 +01:00
test_e2e_ocr_conversion.py feat: add "auto" language for TesseractOcr (#759) 2025-01-23 12:40:50 +01:00
test_input_doc.py feat: add Docling JSON ingestion (#783) 2025-01-24 18:05:23 +01:00
test_interfaces.py fix: improve handling of disallowed formats (#429) 2024-12-03 12:45:32 +01:00
test_invalid_input.py fix: improve handling of disallowed formats (#429) 2024-12-03 12:45:32 +01:00
test_legacy_format_transform.py fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
test_options.py feat: Introduce support for GPU Accelerators (#593) 2024-12-13 17:45:22 +01:00
verify_utils.py feat(OCR): Introduce the OcrOptions.force_full_page_ocr parameter that forces a full page OCR scanning (#290) 2024-11-12 09:46:14 +01:00