docling/tests/data/groundtruth/docling_v1
Peter W. J. Staar c0ba88edf1
feat(cli): add option for html with split-page mode (#1355)
* updated the cli to output html in split-page mode

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

* add pin for new docling-core with html split argument

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* relock with fixed html export in docling-core

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update test results

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update more tests

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update example

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update lock with docling-core fixes

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update test results

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add again chunking extras

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
2025-04-14 08:41:50 +02:00
2203.01017v2.doctags.txt feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2203.01017v2.json feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2203.01017v2.md feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2203.01017v2.pages.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
2206.01062.doctags.txt feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2206.01062.json feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2206.01062.md feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2206.01062.pages.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
2305.03393v1-pg9.doctags.txt feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1-pg9.json feat(cli): add option for html with split-page mode (#1355) 2025-04-14 08:41:50 +02:00
2305.03393v1-pg9.md feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1-pg9.pages.json feat(cli): add option for html with split-page mode (#1355) 2025-04-14 08:41:50 +02:00
2305.03393v1.doctags.txt feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1.json feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1.md feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
2305.03393v1.pages.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
amt_handbook_sample.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
amt_handbook_sample.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
amt_handbook_sample.md docs: Add example for inspection of picture content (#624) 2025-01-29 10:39:00 +01:00
amt_handbook_sample.pages.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
code_and_formula.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
code_and_formula.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
code_and_formula.md feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
code_and_formula.pages.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
picture_classification.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
picture_classification.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
picture_classification.md feat: New document picture classifier (#805) 2025-01-24 18:05:51 +01:00
picture_classification.pages.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
redp5110_sampled.doctags.txt feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
redp5110_sampled.json feat: Use new TableFormer model weights and default to accurate model version (#1100) 2025-03-11 10:53:49 +01:00
redp5110_sampled.md fix: Proper handling of orphan IDs in layout postprocessing (#1118) 2025-03-05 14:30:59 +01:00
redp5110_sampled.pages.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
right_to_left_01.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.md feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.pages.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
right_to_left_02.doctags.txt fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_02.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_02.md fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_02.pages.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00
right_to_left_03.doctags.txt fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_03.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_03.md fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_03.pages.json feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) 2025-03-18 10:38:19 +01:00