docling/tests/data/groundtruth/docling_v2
2024-12-05 13:18:22 +01:00
10-1055-a-2308-2290.nxml.itxt Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-a-2308-2290.nxml.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-a-2308-2290.nxml.md Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-a-2313-0311.nxml.itxt Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-a-2313-0311.nxml.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-a-2313-0311.nxml.md Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-s-0043-1775965.nxml.itxt Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-s-0043-1775965.nxml.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-s-0043-1775965.nxml.md Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-s-0044-1786808.nxml.itxt Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-s-0044-1786808.nxml.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-s-0044-1786808.nxml.md Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-s-0044-1786809.nxml.itxt Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-s-0044-1786809.nxml.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
10-1055-s-0044-1786809.nxml.md Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
1349-7235-63-2593.nxml.itxt Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
1349-7235-63-2593.nxml.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
1349-7235-63-2593.nxml.md Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
1349-7235-63-2595.nxml.itxt Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
1349-7235-63-2595.nxml.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
1349-7235-63-2595.nxml.md Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
1349-7235-63-2621.nxml.itxt Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
1349-7235-63-2621.nxml.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
1349-7235-63-2621.nxml.md Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
1349-7235-63-2651.nxml.itxt Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
1349-7235-63-2651.nxml.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
1349-7235-63-2651.nxml.md Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
2203.01017v2.doctags.txt fix: Update tests and examples for docling-core 2.5.1 (#449) 2024-11-27 13:07:00 +01:00
2203.01017v2.json feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
2203.01017v2.md fix: MD Backend, fixes to properly handle trailing inline text and emphasis in headers (#178) 2024-10-25 18:02:20 +02:00
2203.01017v2.pages.json feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
2206.01062.doctags.txt fix: Update tests and examples for docling-core 2.5.1 (#449) 2024-11-27 13:07:00 +01:00
2206.01062.json feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
2206.01062.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
2206.01062.pages.json feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
2305.03393v1-pg9.doctags.txt fix: Update tests and examples for docling-core 2.5.1 (#449) 2024-11-27 13:07:00 +01:00
2305.03393v1-pg9.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
2305.03393v1-pg9.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
2305.03393v1-pg9.pages.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
2305.03393v1.doctags.txt fix: Update tests and examples for docling-core 2.5.1 (#449) 2024-11-27 13:07:00 +01:00
2305.03393v1.json feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
2305.03393v1.md fix: MD Backend, fixes to properly handle trailing inline text and emphasis in headers (#178) 2024-10-25 18:02:20 +02:00
2305.03393v1.pages.json feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
example_01.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_01.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_01.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_02.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_02.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_02.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_03.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_03.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_03.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_04.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_04.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
example_04.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
lorem_ipsum.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
lorem_ipsum.docx.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
lorem_ipsum.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
PMC4031984-elife-02866.nxml.itxt Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
PMC4031984-elife-02866.nxml.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
PMC4031984-elife-02866.nxml.md Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
powerpoint_sample.pptx.itxt feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_sample.pptx.json feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_sample.pptx.md feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_with_image.pptx.itxt feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_with_image.pptx.json feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
powerpoint_with_image.pptx.md feat: Extracting picture data for raster images found in PPTX (#349) 2024-11-18 15:22:28 +01:00
pubmed-PMC13900.nxml.itxt Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
pubmed-PMC13900.nxml.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
pubmed-PMC13900.nxml.md Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
redp5110_sampled.doctags.txt fix: Update tests and examples for docling-core 2.5.1 (#449) 2024-11-27 13:07:00 +01:00
redp5110_sampled.json chore: make tests lighter (#228) 2024-11-04 14:02:28 +01:00
redp5110_sampled.md chore: make tests lighter (#228) 2024-11-04 14:02:28 +01:00
redp5110_sampled.pages.json fix: Update tests and examples for docling-core 2.5.1 (#449) 2024-11-27 13:07:00 +01:00
research.0509.nxml.itxt Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
research.0509.nxml.json Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
research.0509.nxml.md Create a XML backend for PubMed documents based on the pubmed_parser library 2024-12-05 13:18:22 +01:00
tablecell.docx.itxt fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
tablecell.docx.json fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
tablecell.docx.md fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test_01.asciidoc.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
test_02.asciidoc.md feat: Support AsciiDoc and Markdown input format (#168) 2024-10-23 16:14:26 +02:00
test_emf_docx.docx.itxt fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test_emf_docx.docx.json fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test_emf_docx.docx.md fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
test-01.xlsx.itxt feat: added excel backend (#334) 2024-11-19 12:21:17 +01:00
test-01.xlsx.json feat: added excel backend (#334) 2024-11-19 12:21:17 +01:00
test-01.xlsx.md feat: added excel backend (#334) 2024-11-19 12:21:17 +01:00
unit_test_01.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_01.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_01.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_headers.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_headers.docx.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_headers.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
unit_test_lists.docx.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
wiki_duck.html.itxt fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
wiki_duck.html.json fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
wiki_duck.html.md fix: fix duplicate title and heading + add e2e tests for html and docx (#186) 2024-10-30 13:14:56 +01:00
word_sample.docx.itxt fix: Fixing images in the input Word files (#330) 2024-11-14 13:33:34 +01:00
word_sample.docx.json fix: Fixing images in the input Word files (#330) 2024-11-14 13:33:34 +01:00
word_sample.docx.md fix: Fixing images in the input Word files (#330) 2024-11-14 13:33:34 +01:00
word_sample.json fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
word_sample.md fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00
word_sample.yaml fix: Fixes for wordx (#432) 2024-11-26 14:44:43 +01:00