docling/tests/data/groundtruth/docling_v2
Cesar Berrospi Ramis b886e4df31
fix(asciidoc): set default size when missing in image directive (#1769)
The AsciiDoc backend should not create an ImageRef with Size equal to None; it should use default size values instead.
Refactor static methods as such and add the staticmethod decorator.
Extend the regression test for this fix.

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-06-16 10:38:46 +02:00
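
The fix described in the commit message above can be illustrated with a minimal sketch. This is not the code from the commit: `Size` here is a stand-in dataclass, and `ImageDirectiveParser`, `parse_size`, `DEFAULT_WIDTH`, and `DEFAULT_HEIGHT` are hypothetical names and values chosen for illustration. It also handles only named `width=`/`height=` attributes, which is a simplification of the AsciiDoc image macro.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical defaults; the values used by the real backend may differ.
DEFAULT_WIDTH = 64.0
DEFAULT_HEIGHT = 64.0


@dataclass
class Size:
    """Stand-in for an image size type (assumption, not the library's class)."""
    width: float
    height: float


def _to_float(value: str) -> Optional[float]:
    """Convert an attribute value to float, returning None if it is not numeric."""
    try:
        return float(value.strip().strip('"'))
    except ValueError:
        return None


class ImageDirectiveParser:
    """Hypothetical helper showing the 'never return None' idea."""

    @staticmethod
    def parse_size(line: str) -> Size:
        """Read width/height from an image directive, falling back to defaults."""
        match = re.match(r"image::?(?P<target>[^\[]+)\[(?P<attrs>[^\]]*)\]", line.strip())
        attrs = match.group("attrs") if match else ""

        width: Optional[float] = None
        height: Optional[float] = None
        for part in attrs.split(","):
            key, sep, value = part.partition("=")
            if not sep:
                continue  # positional attribute such as the alt text
            if key.strip() == "width":
                width = _to_float(value)
            elif key.strip() == "height":
                height = _to_float(value)

        # A Size is always returned; missing dimensions use the defaults.
        return Size(
            width=width if width is not None else DEFAULT_WIDTH,
            height=height if height is not None else DEFAULT_HEIGHT,
        )
```

Declaring `parse_size` with `@staticmethod` mirrors the refactoring mentioned in the message: the method uses no instance state, so it can be called directly on the class, for example `ImageDirectiveParser.parse_size("image::logo.png[Logo]")`.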
2203.01017v2.doctags.txt fix: prov for merged-elems (#1728) 2025-06-10 11:22:42 +02:00
2203.01017v2.json fix: prov for merged-elems (#1728) 2025-06-10 11:22:42 +02:00
2203.01017v2.md chore: update locked deps (#1239) 2025-03-25 15:48:02 +01:00
2203.01017v2.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
2206.01062.doctags.txt fix: prov for merged-elems (#1728) 2025-06-10 11:22:42 +02:00
2206.01062.json fix: prov for merged-elems (#1728) 2025-06-10 11:22:42 +02:00
2206.01062.md chore: update locked deps (#1239) 2025-03-25 15:48:02 +01:00
2206.01062.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
2305.03393v1-pg9.doctags.txt feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1-pg9.json feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
2305.03393v1-pg9.md feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1-pg9.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
2305.03393v1.doctags.txt feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
2305.03393v1.md feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
amt_handbook_sample.doctags.txt fix: Revise DocTags, fix iterate_items to output content_layer in items (#965) 2025-02-17 14:11:55 +01:00
amt_handbook_sample.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
amt_handbook_sample.md docs: Add example for inspection of picture content (#624) 2025-01-29 10:39:00 +01:00
amt_handbook_sample.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
blocks.md.md fix: Pass tests, update docling-core to 2.22.0 (#1150) 2025-03-13 09:45:55 +01:00
bmj_sample.xml.itxt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
bmj_sample.xml.json feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
bmj_sample.xml.md feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
code_and_formula.doctags.txt chore: update locked deps (#1239) 2025-03-25 15:48:02 +01:00
code_and_formula.json chore: format JSON test files to enable comparison (#1511) 2025-05-02 10:52:18 +02:00
code_and_formula.md chore: update locked deps (#1239) 2025-03-25 15:48:02 +01:00
code_and_formula.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
csv-comma-in-cell.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-comma-in-cell.csv.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
csv-comma-in-cell.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-comma.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-comma.csv.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
csv-comma.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-inconsistent-header.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-inconsistent-header.csv.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
csv-inconsistent-header.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-pipe.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-pipe.csv.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
csv-pipe.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-semicolon.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-semicolon.csv.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
csv-semicolon.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-tab.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-tab.csv.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
csv-tab.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-too-few-columns.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-too-few-columns.csv.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
csv-too-few-columns.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-too-many-columns.csv.itxt feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
csv-too-many-columns.csv.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
csv-too-many-columns.csv.md feat: Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) 2025-02-14 08:55:09 +01:00
duck.md.md fix: fix single newline handling in MD backend (#824) 2025-01-28 19:05:55 +01:00
elife-56337.xml.itxt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
elife-56337.xml.md feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
ending_with_table.md.md fix(markdown): fix parsing if doc ending with table (#873) 2025-02-03 14:38:38 +01:00
equations.docx.itxt fix(docx): Adding new latex symbols, simplifying how equations are added to text (#1295) 2025-04-08 17:11:37 +02:00
equations.docx.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
equations.docx.md fix(docx): Adding new latex symbols, simplifying how equations are added to text (#1295) 2025-04-08 17:11:37 +02:00
example_8.html.itxt feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
example_8.html.json feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
example_8.html.md feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
example_01.html.itxt refactor: add the contentlayer to html-backend (#1040) 2025-03-02 10:37:53 -05:00
example_01.html.json fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
example_01.html.md fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
example_02.html.itxt refactor: add the contentlayer to html-backend (#1040) 2025-03-02 10:37:53 -05:00
example_02.html.json fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
example_02.html.md fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
example_03.html.itxt refactor: add the contentlayer to html-backend (#1040) 2025-03-02 10:37:53 -05:00
example_03.html.json fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
example_03.html.md fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
example_04.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_04.html.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
example_04.html.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
example_05.html.itxt fix: parse html with omitted body tag (#818) 2025-01-27 16:59:00 +01:00
example_05.html.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
example_05.html.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
example_06.html.itxt fix(html): handle address, details, and summary tags (#1436) 2025-04-23 09:30:59 +02:00
example_06.html.json fix(html): handle address, details, and summary tags (#1436) 2025-04-23 09:30:59 +02:00
example_06.html.md fix(html): handle address, details, and summary tags (#1436) 2025-04-23 09:30:59 +02:00
example_07.html.itxt fix(html): handle nested empty lists (#1154) 2025-03-13 16:56:58 +01:00
example_07.html.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
example_07.html.md fix(html): handle nested empty lists (#1154) 2025-03-13 16:56:58 +01:00
example_08.html.itxt test: add missing ground truth files (#1667) 2025-05-28 13:26:49 +02:00
example_08.html.json test: add missing ground truth files (#1667) 2025-05-28 13:26:49 +02:00
example_08.html.md test: add missing ground truth files (#1667) 2025-05-28 13:26:49 +02:00
ipa20180000016.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
ipa20180000016.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
ipa20180000016.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
ipa20200022300.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
ipa20200022300.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
ipa20200022300.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
lorem_ipsum.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
lorem_ipsum.docx.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
lorem_ipsum.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
mixed_without_h1.md.md fix: improve HTML layer detection, various MD fixes (#1241) 2025-03-26 16:07:14 +01:00
mixed.md.md fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
multi_page.doctags.txt fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549) 2025-05-19 15:26:00 +02:00
multi_page.json fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549) 2025-05-19 15:26:00 +02:00
multi_page.md fix(pypdfium): resolve overlapping text when merging bounding boxes (#1549) 2025-05-19 15:26:00 +02:00
multi_page.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
nested.md.md fix(markdown): handle nested lists (#910) 2025-02-07 12:55:12 +01:00
pa20010031492.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
pa20010031492.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
pa20010031492.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
pftaps057006474.itxt fix: Pass tests, update docling-core to 2.22.0 (#1150) 2025-03-13 09:45:55 +01:00
pftaps057006474.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
pftaps057006474.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
pg06442728.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
pg06442728.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
pg06442728.md feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
picture_classification.doctags.txt fix: Revise DocTags, fix iterate_items to output content_layer in items (#965) 2025-02-17 14:11:55 +01:00
picture_classification.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
picture_classification.md feat: New document picture classifier (#805) 2025-01-24 18:05:51 +01:00
picture_classification.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
pnas_sample.xml.itxt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pnas_sample.xml.json feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pnas_sample.xml.md feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pntd.0008301.xml.itxt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pntd.0008301.xml.md feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pone.0234687.xml.itxt feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
pone.0234687.xml.md feat(xml-jats): parse XML JATS documents (#967) 2025-02-17 10:43:31 +01:00
powerpoint_sample.pptx.itxt feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_sample.pptx.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
powerpoint_sample.pptx.md feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_with_image.pptx.itxt feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_with_image.pptx.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
powerpoint_with_image.pptx.md feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
redp5110_sampled.doctags.txt chore: propagate docling-core fix (#1389) 2025-04-15 10:51:47 +02:00
redp5110_sampled.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
redp5110_sampled.md chore: update locked deps (#1239) 2025-03-25 15:48:02 +01:00
redp5110_sampled.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
right_to_left_01.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.json chore: format JSON test files to enable comparison (#1511) 2025-05-02 10:52:18 +02:00
right_to_left_01.md feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
right_to_left_02.doctags.txt chore: update locked deps (#1239) 2025-03-25 15:48:02 +01:00
right_to_left_02.json chore: format JSON test files to enable comparison (#1511) 2025-05-02 10:52:18 +02:00
right_to_left_02.md fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_02.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
right_to_left_03.doctags.txt fix: Revise DocTags, fix iterate_items to output content_layer in items (#965) 2025-02-17 14:11:55 +01:00
right_to_left_03.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
right_to_left_03.md fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_03.pages.json feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it (#1745) 2025-06-13 19:01:55 +02:00
sample_sales_data.xlsm.itxt feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
sample_sales_data.xlsm.json feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
sample_sales_data.xlsm.md feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
tablecell.docx.itxt fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
tablecell.docx.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
tablecell.docx.md fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test_01.asciidoc.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
test_02.asciidoc.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
test_03.asciidoc.md fix(asciidoc): set default size when missing in image directive (#1769) 2025-06-16 10:38:46 +02:00
test_emf_docx.docx.itxt fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test_emf_docx.docx.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
test_emf_docx.docx.md fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test-01.xlsx.itxt fix: added extraction of byte-images in excel (#804) 2025-01-24 18:48:02 +01:00
test-01.xlsx.json feat(xlsx): create a page for each worksheet in XLSX backend (#1332) 2025-04-11 10:29:53 +02:00
test-01.xlsx.md fix: added extraction of byte-images in excel (#804) 2025-01-24 18:48:02 +01:00
textbox.docx.itxt feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
textbox.docx.json feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
textbox.docx.md feat: support xlsm files (#1520) 2025-06-10 16:55:59 +02:00
unit_test_01.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_01.html.json fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
unit_test_01.html.md fix(html): fix HTML parsed heading level (#1244) 2025-03-26 10:30:23 +01:00
unit_test_formatting.docx.itxt feat(docx): add text formatting and hyperlink support (#630) 2025-04-03 15:11:50 +02:00
unit_test_formatting.docx.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
unit_test_formatting.docx.md feat(docx): add text formatting and hyperlink support (#630) 2025-04-03 15:11:50 +02:00
unit_test_headers_numbered.docx.itxt fix(docx): identifying numbered headers (#1231) 2025-03-25 11:41:02 +01:00
unit_test_headers_numbered.docx.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
unit_test_headers_numbered.docx.md fix(docx): identifying numbered headers (#1231) 2025-03-25 11:41:02 +01:00
unit_test_headers.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_headers.docx.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
unit_test_headers.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
unit_test_lists.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
wiki_duck.html.itxt fix: improve HTML layer detection, various MD fixes (#1241) 2025-03-26 16:07:14 +01:00
wiki_duck.html.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
wiki_duck.html.md fix: improve HTML layer detection, various MD fixes (#1241) 2025-03-26 16:07:14 +01:00
wiki.md.md fix: fix single newline handling in MD backend (#824) 2025-01-28 19:05:55 +01:00
word_sample.docx.itxt fix: Fixing images in the input Word files (#330) 2024-11-14 13:33:34 +01:00
word_sample.docx.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
word_sample.docx.md fix: Fixing images in the input Word files (#330) 2024-11-14 13:33:34 +01:00
word_sample.json fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
word_sample.md fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
word_sample.yaml fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
word_tables.docx.html feat(cli): add option for html with split-page mode (#1355) 2025-04-14 08:41:50 +02:00
word_tables.docx.itxt fix(docx): merged table cells not properly converted (#857) 2025-02-03 10:20:03 +01:00
word_tables.docx.json feat(ocr): auto-detect rotated pages in Tesseract (#1167) 2025-05-21 18:12:33 +02:00
word_tables.docx.md fix(docx): merged table cells not properly converted (#857) 2025-02-03 10:20:03 +01:00