docling/tests/data_scanned/groundtruth/docling_v2
Panos Vagenas 7c5614a37a
fix(markdown): fix single-formatted headings & list items (#1820)
* fix(markdown): fix formatting & inline edge cases (show behavior before change)

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* add change and updated test data

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* update lock

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

* improve test case

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
2025-06-25 13:05:06 +02:00
ocr_test_rotated_90.doctags.txt fix(tesseract): initialize df_osd to avoid uninitialized variable error (#1718) 2025-06-10 10:57:45 +02:00
ocr_test_rotated_90.json fix(markdown): fix single-formatted headings & list items (#1820) 2025-06-25 13:05:06 +02:00
ocr_test_rotated_90.md feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
ocr_test_rotated_90.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
ocr_test_rotated_180.doctags.txt fix(tesseract): initialize df_osd to avoid uninitialized variable error (#1718) 2025-06-10 10:57:45 +02:00
ocr_test_rotated_180.json fix(markdown): fix single-formatted headings & list items (#1820) 2025-06-25 13:05:06 +02:00
ocr_test_rotated_180.md feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
ocr_test_rotated_180.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
ocr_test_rotated_270.doctags.txt fix(tesseract): initialize df_osd to avoid uninitialized variable error (#1718) 2025-06-10 10:57:45 +02:00
ocr_test_rotated_270.json fix(markdown): fix single-formatted headings & list items (#1820) 2025-06-25 13:05:06 +02:00
ocr_test_rotated_270.md feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
ocr_test_rotated_270.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
ocr_test.doctags.txt fix(tesseract): initialize df_osd to avoid uninitialized variable error (#1718) 2025-06-10 10:57:45 +02:00
ocr_test.json fix(markdown): fix single-formatted headings & list items (#1820) 2025-06-25 13:05:06 +02:00
ocr_test.md feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
ocr_test.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00