docling/tests/data/groundtruth/docling_v2
Panos Vagenas 5aed9f8aeb
fix: fix single newline handling in MD backend (#824)
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-01-28 19:05:55 +01:00
..
2203.01017v2.doctags.txt feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2203.01017v2.json feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2203.01017v2.md feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2203.01017v2.pages.json feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2206.01062.doctags.txt feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2206.01062.json feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2206.01062.md feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2206.01062.pages.json feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2305.03393v1-pg9.doctags.txt fix: Update tests and examples for docling-core 2.5.1 (#449) 2024-11-27 13:07:00 +01:00
2305.03393v1-pg9.json feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2305.03393v1-pg9.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
2305.03393v1-pg9.pages.json feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2305.03393v1.doctags.txt feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2305.03393v1.json feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2305.03393v1.md feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2305.03393v1.pages.json feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
code_and_formula.doctags.txt feat: Code and equation model for PDF and code blocks in markdown (#752) 2025-01-24 16:54:22 +01:00
code_and_formula.json feat: Code and equation model for PDF and code blocks in markdown (#752) 2025-01-24 16:54:22 +01:00
code_and_formula.md feat: Code and equation model for PDF and code blocks in markdown (#752) 2025-01-24 16:54:22 +01:00
code_and_formula.pages.json feat: Code and equation model for PDF and code blocks in markdown (#752) 2025-01-24 16:54:22 +01:00
duck.md.md fix: fix single newline handling in MD backend (#824) 2025-01-28 19:05:55 +01:00
elife-56337.xml.itxt feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) 2024-12-17 19:27:09 +01:00
elife-56337.xml.json feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) 2024-12-17 19:27:09 +01:00
elife-56337.xml.md feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) 2024-12-17 19:27:09 +01:00
example_01.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_01.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_01.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_02.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_02.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_02.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_03.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_03.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_03.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_04.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_04.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_04.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_05.html.itxt fix: parse html with omitted body tag (#818) 2025-01-27 16:59:00 +01:00
example_05.html.json fix: parse html with omitted body tag (#818) 2025-01-27 16:59:00 +01:00
example_05.html.md fix: parse html with omitted body tag (#818) 2025-01-27 16:59:00 +01:00
ipa20180000016.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
ipa20180000016.json feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
ipa20180000016.md feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
ipa20200022300.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
ipa20200022300.json feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
ipa20200022300.md feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
lorem_ipsum.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
lorem_ipsum.docx.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
lorem_ipsum.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
pa20010031492.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
pa20010031492.json feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
pa20010031492.md feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
pftaps057006474.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
pftaps057006474.json feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
pftaps057006474.md feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
pg06442728.itxt feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
pg06442728.json feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
pg06442728.md feat: create a backend to parse USPTO patents into DoclingDocument (#606) 2024-12-17 16:35:23 +01:00
picture_classification.doctags.txt feat: New document picture classifier (#805) 2025-01-24 18:05:51 +01:00
picture_classification.json feat: New document picture classifier (#805) 2025-01-24 18:05:51 +01:00
picture_classification.md feat: New document picture classifier (#805) 2025-01-24 18:05:51 +01:00
picture_classification.pages.json feat: New document picture classifier (#805) 2025-01-24 18:05:51 +01:00
pntd.0008301.xml.itxt feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) 2024-12-17 19:27:09 +01:00
pntd.0008301.xml.json feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) 2024-12-17 19:27:09 +01:00
pntd.0008301.xml.md feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) 2024-12-17 19:27:09 +01:00
pone.0234687.xml.itxt feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) 2024-12-17 19:27:09 +01:00
pone.0234687.xml.json feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) 2024-12-17 19:27:09 +01:00
pone.0234687.xml.md feat: Create a backend to transform PubMed XML files to DoclingDocument (#557) 2024-12-17 19:27:09 +01:00
powerpoint_sample.pptx.itxt feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_sample.pptx.json feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_sample.pptx.md feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_with_image.pptx.itxt feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_with_image.pptx.json feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_with_image.pptx.md feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
redp5110_sampled.doctags.txt feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
redp5110_sampled.json feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
redp5110_sampled.md feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
redp5110_sampled.pages.json feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
tablecell.docx.itxt fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
tablecell.docx.json fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
tablecell.docx.md fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test_01.asciidoc.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
test_02.asciidoc.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
test_emf_docx.docx.itxt fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test_emf_docx.docx.json fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test_emf_docx.docx.md fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test-01.xlsx.itxt fix: added extraction of byte-images in excel (#804) 2025-01-24 18:48:02 +01:00
test-01.xlsx.json fix: added extraction of byte-images in excel (#804) 2025-01-24 18:48:02 +01:00
test-01.xlsx.md fix: added extraction of byte-images in excel (#804) 2025-01-24 18:48:02 +01:00
unit_test_01.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_01.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_01.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_headers.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_headers.docx.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_headers.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
wiki_duck.html.itxt feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
wiki_duck.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
wiki_duck.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
wiki.md.md fix: fix single newline handling in MD backend (#824) 2025-01-28 19:05:55 +01:00
word_sample.docx.itxt fix: Fixing images in the input Word files (#330) 2024-11-14 13:33:34 +01:00
word_sample.docx.json fix: Fixing images in the input Word files (#330) 2024-11-14 13:33:34 +01:00
word_sample.docx.md fix: Fixing images in the input Word files (#330) 2024-11-14 13:33:34 +01:00
word_sample.json fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
word_sample.md fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
word_sample.yaml fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00