docling/tests/data/groundtruth/docling_v2/picture_classification.json
Christoph Auer cf78d5b7b9
feat: Add content_layer property to items to address body, furniture and other roles (#735)
* feat: Pass predicted page-headers and page-footers through to DoclingDocument furniture

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* chore: Update all test GT

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: update all test cases

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* fix: update all test cases again

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update lock

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Update lock to final docling-core

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2025-02-10 12:07:49 +01:00

1 line
11 KiB
JSON

{"schema_name": "DoclingDocument", "version": "1.1.0", "name": "picture_classification", "origin": {"mimetype": "application/pdf", "binary_hash": 6445357065749877499, "filename": "picture_classification.pdf", "uri": null}, "furniture": {"self_ref": "#/furniture", "parent": null, "children": [], "content_layer": "furniture", "name": "_root_", "label": "unspecified"}, "body": {"self_ref": "#/body", "parent": null, "children": [{"cref": "#/texts/0"}, {"cref": "#/texts/1"}, {"cref": "#/texts/2"}, {"cref": "#/pictures/0"}, {"cref": "#/texts/3"}, {"cref": "#/texts/4"}, {"cref": "#/texts/5"}, {"cref": "#/texts/6"}, {"cref": "#/pictures/1"}, {"cref": "#/texts/7"}, {"cref": "#/texts/8"}], "content_layer": "body", "name": "_root_", "label": "unspecified"}, "groups": [], "texts": [{"self_ref": "#/texts/0", "parent": {"cref": "#/body"}, "children": [], "content_layer": "body", "label": "section_header", "prov": [{"page_no": 1, "bbox": {"l": 133.76800537109375, "t": 667.1912231445312, "r": 252.35513305664062, "b": 654.4518432617188, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 15]}], "orig": "Figures Example", "text": "Figures Example", "level": 1}, {"self_ref": "#/texts/1", "parent": {"cref": "#/body"}, "children": [], "content_layer": "body", "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 133.76800537109375, "t": 642.3280639648438, "r": 477.4827575683594, "b": 501.97412109375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 887]}], "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.", "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."}, {"self_ref": "#/texts/2", "parent": {"cref": "#/body"}, "children": [], "content_layer": "body", "label": "caption", "prov": [{"page_no": 1, "bbox": {"l": 226.89100646972656, "t": 262.86505126953125, "r": 384.35479736328125, "b": 254.0182647705078, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 35]}], "orig": "Figure 1: This is an example image.", "text": "Figure 1: This is an example image."}, {"self_ref": "#/texts/3", "parent": {"cref": "#/body"}, "children": [], "content_layer": "body", "label": "text", "prov": [{"page_no": 1, "bbox": {"l": 133.76800537109375, "t": 238.95504760742188, "r": 477.4817199707031, "b": 122.51225280761719, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 747]}], "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.", "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua."}, {"self_ref": "#/texts/4", "parent": {"cref": "#/body"}, "children": [], "content_layer": "furniture", "label": "page_footer", "prov": [{"page_no": 1, "bbox": {"l": 303.13299560546875, "t": 96.27903747558594, "r": 308.1142883300781, "b": 87.43224334716797, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "1", "text": "1"}, {"self_ref": "#/texts/5", "parent": {"cref": "#/body"}, "children": [], "content_layer": "body", "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 664.1490478515625, "r": 477.4817199707031, "b": 523.7951049804688, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 887]}], "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.", "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."}, {"self_ref": "#/texts/6", "parent": {"cref": "#/body"}, "children": [], "content_layer": "body", "label": "caption", "prov": [{"page_no": 2, "bbox": {"l": 226.89100646972656, "t": 268.7890319824219, "r": 384.35479736328125, "b": 259.9422607421875, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 35]}], "orig": "Figure 2: This is an example image.", "text": "Figure 2: This is an example image."}, {"self_ref": "#/texts/7", "parent": {"cref": "#/body"}, "children": [], "content_layer": "body", "label": "text", "prov": [{"page_no": 2, "bbox": {"l": 133.76800537109375, "t": 245.71804809570312, "r": 477.4817199707031, "b": 117.32023620605469, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 804]}], "orig": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum.", "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum."}, {"self_ref": "#/texts/8", "parent": {"cref": "#/body"}, "children": [], "content_layer": "furniture", "label": "page_footer", "prov": [{"page_no": 2, "bbox": {"l": 303.13299560546875, "t": 96.27903747558594, "r": 308.1142883300781, "b": 87.43224334716797, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 1]}], "orig": "2", "text": "2"}], "pictures": [{"self_ref": "#/pictures/0", "parent": {"cref": "#/body"}, "children": [], "content_layer": "body", "label": "picture", "prov": [{"page_no": 1, "bbox": {"l": 134.9200439453125, "t": 487.109375, "r": 475.6635437011719, "b": 281.78173828125, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 35]}], "captions": [{"cref": "#/texts/2"}], "references": [], "footnotes": [], "image": null, "annotations": []}, {"self_ref": "#/pictures/1", "parent": {"cref": "#/body"}, "children": [], "content_layer": "body", "label": "picture", "prov": [{"page_no": 2, "bbox": {"l": 218.8155517578125, "t": 513.984619140625, "r": 391.96246337890625, "b": 283.10589599609375, "coord_origin": "BOTTOMLEFT"}, "charspan": [0, 35]}], "captions": [{"cref": "#/texts/6"}], "references": [], "footnotes": [], "image": null, "annotations": []}], "tables": [], "key_value_items": [], "pages": {"1": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 1}, "2": {"size": {"width": 612.0, "height": 792.0}, "image": null, "page_no": 2}}}