docling

mirror of https://github.com/DS4SD/docling.git synced 2025-07-25 19:44:34 +00:00

Cesar Berrospi Ramis de7b963b09 fix(html): use 'start' attribute when parsing ordered lists from HTML docs (#1062)	2025-02-27 09:46:57 +01:00

* fix(html): use 'start' attribute in ordered lists
  When parsing ordered lists in HTML, take into account the 'start' attribute if it exists.
* chore(html): reduce verbosity in HTML backend

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>

data	fix(html): Parse text in div elements as TextItem (#1041)	2025-02-24 12:38:29 +01:00
data_scanned	feat: Implement new reading-order model (#916)	2025-02-20 17:51:17 +01:00
__init__.py	fix: Add unit tests (#51)	2024-08-30 14:08:20 +02:00
test_backend_asciidoc.py	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)	2025-02-07 08:43:31 +01:00
test_backend_csv.py	test: avoid testing exact JSON in CSV backend (#1038)	2025-02-24 08:10:40 +01:00
test_backend_docling_json.py	feat: add Docling JSON ingestion (#783)	2025-01-24 18:05:23 +01:00
test_backend_docling_parse_v2.py	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)	2025-02-07 08:43:31 +01:00
test_backend_docling_parse.py	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)	2025-02-07 08:43:31 +01:00
test_backend_html.py	fix(html): use 'start' attribute when parsing ordered lists from HTML docs (#1062)	2025-02-27 09:46:57 +01:00
test_backend_jats.py	test: avoid testing exact JSON in CSV backend (#1038)	2025-02-24 08:10:40 +01:00
test_backend_markdown.py	fix(markdown): handle nested lists (#910)	2025-02-07 12:55:12 +01:00
test_backend_msexcel.py	test: avoid testing exact JSON in CSV backend (#1038)	2025-02-24 08:10:40 +01:00
test_backend_msword.py	test: avoid testing exact JSON in CSV backend (#1038)	2025-02-24 08:10:40 +01:00
test_backend_patent_uspto.py	test: avoid testing exact JSON (#1027)	2025-02-20 16:20:07 +01:00
test_backend_pdfium.py	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)	2025-02-07 08:43:31 +01:00
test_backend_pptx.py	test: avoid testing exact JSON in CSV backend (#1038)	2025-02-24 08:10:40 +01:00
test_cli.py	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)	2025-02-07 08:43:31 +01:00
test_code_formula.py	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)	2025-02-07 08:43:31 +01:00
test_data_gen_flag.py	fix(markdown): handle nested lists (#910)	2025-02-07 12:55:12 +01:00
test_document_picture_classifier.py	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)	2025-02-07 08:43:31 +01:00
test_e2e_conversion.py	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)	2025-02-07 08:43:31 +01:00
test_e2e_ocr_conversion.py	feat: Python 3.13 support (#841)	2025-01-30 17:26:42 +01:00
test_input_doc.py	feat(xml-jats): parse XML JATS documents (#967)	2025-02-17 10:43:31 +01:00
test_interfaces.py	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)	2025-02-07 08:43:31 +01:00
test_invalid_input.py	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)	2025-02-07 08:43:31 +01:00
test_legacy_format_transform.py	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)	2025-02-07 08:43:31 +01:00
test_options.py	fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903)	2025-02-07 08:43:31 +01:00
verify_utils.py	test: avoid testing exact JSON in CSV backend (#1038)	2025-02-24 08:10:40 +01:00