docling/tests/data/groundtruth/docling_v1
Christoph Auer c93e36988f
feat: Implement new reading-order model (#916)
* Implement new reading-order model, replacing DS GLM model (WIP)

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update reading-order model branch

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update lockfile [skip ci]

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add captions, footnotes and merges [skip ci]

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updates for reading-order implementation

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Updates for reading-order implementation

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update tests and lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fixes, update tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add normalization, update tests again

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update tests with code

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Push final lockfile

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* sanitize text

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Include furniture, update tests with furniture

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Fix content_layer assignment

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: Delete empty file docling/models/ds_glm_model.py

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
2025-02-20 17:51:17 +01:00
2203.01017v2.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2203.01017v2.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2203.01017v2.md feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2203.01017v2.pages.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2206.01062.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2206.01062.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2206.01062.md feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2206.01062.pages.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2305.03393v1-pg9.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2305.03393v1-pg9.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2305.03393v1-pg9.md feat!: Docling v2 (#117) 2024-10-16 21:02:03 +02:00
2305.03393v1-pg9.pages.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2305.03393v1.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2305.03393v1.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
2305.03393v1.md feat: Updated Layout processing with forms and key-value areas (#530) 2024-12-17 17:32:24 +01:00
2305.03393v1.pages.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
amt_handbook_sample.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
amt_handbook_sample.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
amt_handbook_sample.md docs: Add example for inspection of picture content (#624) 2025-01-29 10:39:00 +01:00
amt_handbook_sample.pages.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
code_and_formula.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
code_and_formula.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
code_and_formula.md feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
code_and_formula.pages.json fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
picture_classification.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
picture_classification.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
picture_classification.md feat: New document picture classifier (#805) 2025-01-24 18:05:51 +01:00
picture_classification.pages.json fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
redp5110_sampled.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
redp5110_sampled.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
redp5110_sampled.md feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
redp5110_sampled.pages.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.doctags.txt feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.md feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_01.pages.json fix: Revise DocTags, fix iterate_items to output content_layer in items (#965) 2025-02-17 14:11:55 +01:00
right_to_left_02.doctags.txt fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_02.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_02.md fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_02.pages.json fix: Revise DocTags, fix iterate_items to output content_layer in items (#965) 2025-02-17 14:11:55 +01:00
right_to_left_03.doctags.txt fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_03.json feat: Implement new reading-order model (#916) 2025-02-20 17:51:17 +01:00
right_to_left_03.md fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903) 2025-02-07 08:43:31 +01:00
right_to_left_03.pages.json fix: Revise DocTags, fix iterate_items to output content_layer in items (#965) 2025-02-17 14:11:55 +01:00