Docling Document
This is an automatically generated API reference of the DoclingDocument type.
doc
Package for models defined by the Document type.
Classes:
- DoclingDocument: DoclingDocument.
- DocumentOrigin: FileSource.
- DocItem: DocItem.
- DocItemLabel: DocItemLabel.
- ProvenanceItem: ProvenanceItem.
- GroupItem: GroupItem.
- GroupLabel: GroupLabel.
- NodeItem: NodeItem.
- PageItem: PageItem.
- FloatingItem: FloatingItem.
- TextItem: TextItem.
- TableItem: TableItem.
- TableCell: TableCell.
- TableData: BaseTableData.
- TableCellLabel: TableCellLabel.
- KeyValueItem: KeyValueItem.
- SectionHeaderItem: SectionItem.
- PictureItem: PictureItem.
- ImageRef: ImageRef.
- PictureClassificationClass: PictureClassificationData.
- PictureClassificationData: PictureClassificationData.
- RefItem: RefItem.
- BoundingBox: BoundingBox.
- CoordOrigin: CoordOrigin.
- ImageRefMode: ImageRefMode.
- Size: Size.
DoclingDocument
Bases: BaseModel
DoclingDocument.
Methods:
- add_code: add_code.
- add_document: Adds the content from the body of a DoclingDocument to this document under a specific parent.
- add_form: add_form.
- add_formula: add_formula.
- add_group: add_group.
- add_heading: add_heading.
- add_inline_group: add_inline_group.
- add_key_values: add_key_values.
- add_list_group: add_list_group.
- add_list_item: add_list_item.
- add_node_items: Adds multiple NodeItems and their children under a parent in this document.
- add_ordered_list: add_ordered_list.
- add_page: add_page.
- add_picture: add_picture.
- add_table: add_table.
- add_text: add_text.
- add_title: add_title.
- add_unordered_list: add_unordered_list.
- append_child_item: Adds an item.
- check_version_is_compatible: Check if this document version is compatible with SDK schema version.
- delete_items: Deletes an item, given its instance or ref, and any children it has.
- delete_items_range: Deletes all NodeItems and their children in the range from the start NodeItem to the end NodeItem.
- export_to_dict: Export to dict.
- export_to_doctags: Exports the document content to a DocumentToken format.
- export_to_document_tokens: Export to DocTags format.
- export_to_element_tree: Export_to_element_tree.
- export_to_html: Serialize to HTML.
- export_to_markdown: Serialize to Markdown.
- export_to_text: export_to_text.
- extract_items_range: Extracts NodeItems and children in the range from the start NodeItem to the end as a new DoclingDocument.
- get_visualization: Get visualization of the document as images by page.
- insert_code: Creates a new CodeItem item and inserts it into the document.
- insert_document: Inserts the content from the body of a DoclingDocument into this document at a specific position.
- insert_form: Creates a new FormItem item and inserts it into the document.
- insert_formula: Creates a new FormulaItem item and inserts it into the document.
- insert_group: Creates a new GroupItem item and inserts it into the document.
- insert_heading: Creates a new SectionHeaderItem item and inserts it into the document.
- insert_inline_group: Creates a new InlineGroup item and inserts it into the document.
- insert_item_after_sibling: Inserts an item, given its node_item instance, after other as a sibling.
- insert_item_before_sibling: Inserts an item, given its node_item instance, before other as a sibling.
- insert_key_values: Creates a new KeyValueItem item and inserts it into the document.
- insert_list_group: Creates a new ListGroup item and inserts it into the document.
- insert_list_item: Creates a new ListItem item and inserts it into the document.
- insert_node_items: Insert multiple NodeItems and their children at a specific position in the document.
- insert_picture: Creates a new PictureItem item and inserts it into the document.
- insert_table: Creates a new TableItem item and inserts it into the document.
- insert_text: Creates a new TextItem item and inserts it into the document.
- insert_title: Creates a new TitleItem item and inserts it into the document.
- iterate_items: Iterate elements with level.
- load_from_doctags: Load Docling document from lists of DocTags and Images.
- load_from_json: load_from_json.
- load_from_yaml: load_from_yaml.
- num_pages: num_pages.
- print_element_tree: Print_element_tree.
- replace_item: Replace item with new item.
- save_as_doctags: Save the document content to DocTags format.
- save_as_document_tokens: Save the document content to a DocumentToken format.
- save_as_html: Save to HTML.
- save_as_json: Save as json.
- save_as_markdown: Save to markdown.
- save_as_yaml: Save as yaml.
- transform_to_content_layer: transform_to_content_layer.
- validate_document: validate_document.
- validate_misplaced_list_items: validate_misplaced_list_items.
- validate_tree: validate_tree.
Attributes:
- body (GroupItem)
- form_items (List[FormItem])
- furniture (Annotated[GroupItem, Field(deprecated=True)])
- groups (List[Union[ListGroup, InlineGroup, GroupItem]])
- key_value_items (List[KeyValueItem])
- name (str)
- origin (Optional[DocumentOrigin])
- pages (Dict[int, PageItem])
- pictures (List[PictureItem])
- schema_name (Literal['DoclingDocument'])
- tables (List[TableItem])
- texts (List[Union[TitleItem, SectionHeaderItem, ListItem, CodeItem, FormulaItem, TextItem]])
- version (Annotated[str, StringConstraints(pattern=VERSION_PATTERN, strict=True)])
form_items
form_items: List[FormItem] = []
furniture
furniture: Annotated[GroupItem, Field(deprecated=True)] = GroupItem(name='_root_', self_ref='#/furniture', content_layer=FURNITURE)
name
name: str
schema_name
schema_name: Literal['DoclingDocument'] = 'DoclingDocument'
texts
texts: List[Union[TitleItem, SectionHeaderItem, ListItem, CodeItem, FormulaItem, TextItem]] = []
version
version: Annotated[str, StringConstraints(pattern=VERSION_PATTERN, strict=True)] = CURRENT_VERSION
add_code
add_code(text: str, code_language: Optional[CodeLanguageLabel] = None, orig: Optional[str] = None, caption: Optional[Union[TextItem, RefItem]] = None, prov: Optional[ProvenanceItem] = None, parent: Optional[NodeItem] = None, content_layer: Optional[ContentLayer] = None, formatting: Optional[Formatting] = None, hyperlink: Optional[Union[AnyUrl, Path]] = None)
add_code.
Parameters:
- text (str)
- code_language (Optional[CodeLanguageLabel], default: None)
- orig (Optional[str], default: None)
- caption (Optional[Union[TextItem, RefItem]], default: None)
- prov (Optional[ProvenanceItem], default: None)
- parent (Optional[NodeItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- formatting (Optional[Formatting], default: None)
- hyperlink (Optional[Union[AnyUrl, Path]], default: None)
add_document
add_document(doc: DoclingDocument, parent: Optional[NodeItem] = None) -> None
Adds the content from the body of a DoclingDocument to this document under a specific parent.
Parameters:
- doc (DoclingDocument): The document whose content will be added.
- parent (Optional[NodeItem], default: None): The parent NodeItem under which new items are added.
Returns:
- None
add_form
add_form(graph: GraphData, prov: Optional[ProvenanceItem] = None, parent: Optional[NodeItem] = None)
add_form.
Parameters:
- graph (GraphData)
- prov (Optional[ProvenanceItem], default: None)
- parent (Optional[NodeItem], default: None)
add_formula
add_formula(text: str, orig: Optional[str] = None, prov: Optional[ProvenanceItem] = None, parent: Optional[NodeItem] = None, content_layer: Optional[ContentLayer] = None, formatting: Optional[Formatting] = None, hyperlink: Optional[Union[AnyUrl, Path]] = None)
add_formula.
Parameters:
- text (str)
- orig (Optional[str], default: None)
- prov (Optional[ProvenanceItem], default: None)
- parent (Optional[NodeItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- formatting (Optional[Formatting], default: None)
- hyperlink (Optional[Union[AnyUrl, Path]], default: None)
add_group
add_group(label: Optional[GroupLabel] = None, name: Optional[str] = None, parent: Optional[NodeItem] = None, content_layer: Optional[ContentLayer] = None) -> GroupItem
add_group.
Parameters:
- label (Optional[GroupLabel], default: None)
- name (Optional[str], default: None)
- parent (Optional[NodeItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
add_heading
add_heading(text: str, orig: Optional[str] = None, level: LevelNumber = 1, prov: Optional[ProvenanceItem] = None, parent: Optional[NodeItem] = None, content_layer: Optional[ContentLayer] = None, formatting: Optional[Formatting] = None, hyperlink: Optional[Union[AnyUrl, Path]] = None)
add_heading.
Parameters:
- text (str)
- orig (Optional[str], default: None)
- level (LevelNumber, default: 1)
- prov (Optional[ProvenanceItem], default: None)
- parent (Optional[NodeItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- formatting (Optional[Formatting], default: None)
- hyperlink (Optional[Union[AnyUrl, Path]], default: None)
add_inline_group
add_inline_group(name: Optional[str] = None, parent: Optional[NodeItem] = None, content_layer: Optional[ContentLayer] = None) -> InlineGroup
add_inline_group.
add_key_values
add_key_values(graph: GraphData, prov: Optional[ProvenanceItem] = None, parent: Optional[NodeItem] = None)
add_key_values.
Parameters:
- graph (GraphData)
- prov (Optional[ProvenanceItem], default: None)
- parent (Optional[NodeItem], default: None)
add_list_group
add_list_group(name: Optional[str] = None, parent: Optional[NodeItem] = None, content_layer: Optional[ContentLayer] = None) -> ListGroup
add_list_group.
add_list_item
add_list_item(text: str, enumerated: bool = False, marker: Optional[str] = None, orig: Optional[str] = None, prov: Optional[ProvenanceItem] = None, parent: Optional[NodeItem] = None, content_layer: Optional[ContentLayer] = None, formatting: Optional[Formatting] = None, hyperlink: Optional[Union[AnyUrl, Path]] = None)
add_list_item.
Parameters:
- text (str)
- enumerated (bool, default: False)
- marker (Optional[str], default: None)
- orig (Optional[str], default: None)
- prov (Optional[ProvenanceItem], default: None)
- parent (Optional[NodeItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- formatting (Optional[Formatting], default: None)
- hyperlink (Optional[Union[AnyUrl, Path]], default: None)
add_node_items
add_node_items(node_items: List[NodeItem], doc: DoclingDocument, parent: Optional[NodeItem] = None) -> None
Adds multiple NodeItems and their children under a parent in this document.
Parameters:
- node_items (List[NodeItem]): The NodeItems to be added.
- doc (DoclingDocument): The document to which the NodeItems and their children belong.
- parent (Optional[NodeItem], default: None): The parent NodeItem under which new items are added.
Returns:
- None
add_ordered_list
add_ordered_list(name: Optional[str] = None, parent: Optional[NodeItem] = None, content_layer: Optional[ContentLayer] = None) -> GroupItem
add_ordered_list.
add_page
add_picture
add_picture(annotations: Optional[List[PictureDataType]] = None, image: Optional[ImageRef] = None, caption: Optional[Union[TextItem, RefItem]] = None, prov: Optional[ProvenanceItem] = None, parent: Optional[NodeItem] = None, content_layer: Optional[ContentLayer] = None)
add_picture.
Parameters:
- annotations (Optional[List[PictureDataType]], default: None)
- image (Optional[ImageRef], default: None)
- caption (Optional[Union[TextItem, RefItem]], default: None)
- prov (Optional[ProvenanceItem], default: None)
- parent (Optional[NodeItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
add_table
add_table(data: TableData, caption: Optional[Union[TextItem, RefItem]] = None, prov: Optional[ProvenanceItem] = None, parent: Optional[NodeItem] = None, label: DocItemLabel = TABLE, content_layer: Optional[ContentLayer] = None, annotations: Optional[list[TableAnnotationType]] = None)
add_table.
Parameters:
- data (TableData)
- caption (Optional[Union[TextItem, RefItem]], default: None)
- prov (Optional[ProvenanceItem], default: None)
- parent (Optional[NodeItem], default: None)
- label (DocItemLabel, default: TABLE)
- content_layer (Optional[ContentLayer], default: None)
- annotations (Optional[list[TableAnnotationType]], default: None)
add_text
add_text(label: DocItemLabel, text: str, orig: Optional[str] = None, prov: Optional[ProvenanceItem] = None, parent: Optional[NodeItem] = None, content_layer: Optional[ContentLayer] = None, formatting: Optional[Formatting] = None, hyperlink: Optional[Union[AnyUrl, Path]] = None)
add_text.
Parameters:
- label (DocItemLabel)
- text (str)
- orig (Optional[str], default: None)
- prov (Optional[ProvenanceItem], default: None)
- parent (Optional[NodeItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- formatting (Optional[Formatting], default: None)
- hyperlink (Optional[Union[AnyUrl, Path]], default: None)
add_title
add_title(text: str, orig: Optional[str] = None, prov: Optional[ProvenanceItem] = None, parent: Optional[NodeItem] = None, content_layer: Optional[ContentLayer] = None, formatting: Optional[Formatting] = None, hyperlink: Optional[Union[AnyUrl, Path]] = None)
add_title.
Parameters:
- text (str)
- orig (Optional[str], default: None)
- prov (Optional[ProvenanceItem], default: None)
- parent (Optional[NodeItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- formatting (Optional[Formatting], default: None)
- hyperlink (Optional[Union[AnyUrl, Path]], default: None)
add_unordered_list
add_unordered_list(name: Optional[str] = None, parent: Optional[NodeItem] = None, content_layer: Optional[ContentLayer] = None) -> GroupItem
add_unordered_list.
append_child_item
Adds an item.
check_version_is_compatible
check_version_is_compatible(v: str) -> str
Check if this document version is compatible with SDK schema version.
delete_items
delete_items(*, node_items: List[NodeItem]) -> None
Deletes an item, given its instance or ref, and any children it has.
delete_items_range
delete_items_range(*, start: NodeItem, end: NodeItem, start_inclusive: bool = True, end_inclusive: bool = True) -> None
Deletes all NodeItems and their children in the range from the start NodeItem to the end NodeItem.
Parameters:
- start (NodeItem): The starting NodeItem of the range.
- end (NodeItem): The ending NodeItem of the range.
- start_inclusive (bool, default: True): If True, the start NodeItem will also be deleted.
- end_inclusive (bool, default: True): If True, the end NodeItem will also be deleted.
Returns:
- None
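The inclusive flags and the "and their children" behaviour can be modelled without the library. The sketch below is a plain-Python illustration only: `Node`, `count_nodes`, `delete_range`, and `make_body` are hypothetical stand-ins, not docling-core code, and it assumes the range is taken over sibling nodes in document order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a NodeItem: a name plus child nodes."""
    name: str
    children: List["Node"] = field(default_factory=list)


def count_nodes(nodes: List[Node]) -> int:
    """Count the given nodes together with all of their descendants."""
    return sum(1 + count_nodes(node.children) for node in nodes)


def delete_range(siblings: List[Node], start: str, end: str,
                 start_inclusive: bool = True,
                 end_inclusive: bool = True) -> int:
    """Remove the siblings from `start` to `end` in place.

    Returns how many nodes disappeared, children included.
    """
    names = [node.name for node in siblings]
    lo = names.index(start) + (0 if start_inclusive else 1)
    hi = names.index(end) + (1 if end_inclusive else 0)  # exclusive stop
    if lo >= hi:
        return 0  # nothing lies inside the requested range
    removed = count_nodes(siblings[lo:hi])
    del siblings[lo:hi]
    return removed


def make_body() -> List[Node]:
    """Build a small example tree: one section with two paragraphs."""
    return [
        Node("title"),
        Node("section", [Node("para 1"), Node("para 2")]),
        Node("table"),
        Node("footer"),
    ]


# Both ends inclusive: the section, its two paragraphs, and the table go.
body = make_body()
removed_all = delete_range(body, "section", "table")
remaining_all = [node.name for node in body]

# Exclusive end: the table survives, the section subtree is still removed.
body = make_body()
removed_open_end = delete_range(body, "section", "table", end_inclusive=False)
remaining_open_end = [node.name for node in body]

print(removed_all, remaining_all)
print(removed_open_end, remaining_open_end)
```

Deleting a parent removes its whole subtree, which is why the first call reports more removed nodes than the number of top-level entries in the range.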
export_to_dict
export_to_dict(mode: str = 'json', by_alias: bool = True, exclude_none: bool = True, coord_precision: Optional[int] = None, confid_precision: Optional[int] = None) -> Dict[str, Any]
Export to dict.
export_to_doctags
export_to_doctags(delim: str = '', from_element: int = 0, to_element: int = maxsize, labels: Optional[set[DocItemLabel]] = None, xsize: int = 500, ysize: int = 500, add_location: bool = True, add_content: bool = True, add_page_index: bool = True, add_table_cell_location: bool = False, add_table_cell_text: bool = True, minified: bool = False, pages: Optional[set[int]] = None) -> str
Exports the document content to the DocTags format.
Operates on a slice of the document's body as defined through arguments from_element and to_element; defaulting to the whole document.
Parameters:
- delim (str, default: ''): Deprecated.
- from_element (int, default: 0)
- to_element (int, default: maxsize)
- labels (Optional[set[DocItemLabel]], default: None)
- xsize (int, default: 500)
- ysize (int, default: 500)
- add_location (bool, default: True)
- add_content (bool, default: True)
- add_page_index (bool, default: True)
- add_table_cell_location (bool, default: False)
- add_table_cell_text (bool, default: True)
- minified (bool, default: False)
- pages (Optional[set[int]], default: None)
Returns:
- str: The content of the document formatted as a DocTags string.
export_to_document_tokens
export_to_document_tokens(*args, **kwargs)
Export to DocTags format.
export_to_element_tree
export_to_element_tree() -> str
Export_to_element_tree.
export_to_html
export_to_html(from_element: int = 0, to_element: int = maxsize, labels: Optional[set[DocItemLabel]] = None, enable_chart_tables: bool = True, image_mode: ImageRefMode = PLACEHOLDER, formula_to_mathml: bool = True, page_no: Optional[int] = None, html_lang: str = 'en', html_head: str = 'null', included_content_layers: Optional[set[ContentLayer]] = None, split_page_view: bool = False, include_annotations: bool = True) -> str
Serialize to HTML.
export_to_markdown
export_to_markdown(delim: str = '\n\n', from_element: int = 0, to_element: int = maxsize, labels: Optional[set[DocItemLabel]] = None, strict_text: bool = False, escape_underscores: bool = True, image_placeholder: str = '<!-- image -->', enable_chart_tables: bool = True, image_mode: ImageRefMode = PLACEHOLDER, indent: int = 4, text_width: int = -1, page_no: Optional[int] = None, included_content_layers: Optional[set[ContentLayer]] = None, page_break_placeholder: Optional[str] = None, include_annotations: bool = True, mark_annotations: bool = False) -> str
Serialize to Markdown.
Operates on a slice of the document's body as defined through arguments from_element and to_element; defaulting to the whole document.
Parameters:
- delim (str, default: '\n\n'): Deprecated.
- from_element (int, default: 0): Body slicing start index (inclusive).
- to_element (int, default: maxsize): Body slicing stop index (exclusive).
- labels (Optional[set[DocItemLabel]], default: None): The set of document labels to include in the export. None falls back to the system-defined default.
- strict_text (bool, default: False): Deprecated.
- escape_underscores (bool, default: True): Whether to escape underscores in the text content of the document.
- image_placeholder (str, default: '<!-- image -->'): The placeholder to include to position images in the markdown.
- enable_chart_tables (bool, default: True)
- image_mode (ImageRefMode, default: PLACEHOLDER): The mode to use for including images in the markdown.
- indent (int, default: 4): The indent in spaces of the nested lists.
- text_width (int, default: -1)
- page_no (Optional[int], default: None)
- included_content_layers (Optional[set[ContentLayer]], default: None): The set of layers to include in the export. None falls back to the system-defined default.
- page_break_placeholder (Optional[str], default: None): The placeholder to include for marking page breaks. None means no page break placeholder will be used.
- include_annotations (bool, default: True): Whether to include annotations in the export.
- mark_annotations (bool, default: False): Whether to mark annotations in the export; only relevant if include_annotations is True.
Returns:
- str: The exported Markdown representation.
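The from_element and to_element bounds read like an ordinary half-open Python slice: the start index is included, the stop index is not, and the default stop of maxsize means "to the end". The sketch below models only that indexing on a plain list; `body_slice` and the sample items are hypothetical and are not part of docling-core, and label or content-layer filtering is deliberately left out.

```python
import sys
from typing import List


def body_slice(items: List[str], from_element: int = 0,
               to_element: int = sys.maxsize) -> List[str]:
    """Return items[from_element:to_element]: inclusive start, exclusive stop."""
    return items[from_element:to_element]


body = ["# Title", "First paragraph.", "Second paragraph.", "Closing note."]

whole = body_slice(body)                  # defaults cover every item
middle = body_slice(body, 1, 3)           # indices 1 and 2, index 3 is excluded
tail = body_slice(body, from_element=2)   # from index 2 to the end

print(whole)
print(middle)
print(tail)
```

Because the stop index is exclusive, passing the same value for both bounds selects nothing.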
export_to_text
export_to_text(delim: str = '\n\n', from_element: int = 0, to_element: int = 1000000, labels: Optional[set[DocItemLabel]] = None) -> str
export_to_text.
extract_items_range
extract_items_range(*, start: NodeItem, end: NodeItem, start_inclusive: bool = True, end_inclusive: bool = True, delete: bool = False) -> DoclingDocument
Extracts NodeItems and children in the range from the start NodeItem to the end as a new DoclingDocument.
Parameters:
- start (NodeItem): The starting NodeItem of the range (must be a direct child of the document body).
- end (NodeItem): The ending NodeItem of the range (must be a direct child of the document body).
- start_inclusive (bool, default: True): If True, the start NodeItem will also be extracted.
- end_inclusive (bool, default: True): If True, the end NodeItem will also be extracted.
- delete (bool, default: False): If True, extracted items are deleted in the original document.
Returns:
- DoclingDocument: A new document containing the extracted NodeItems and their children.
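The difference between copying a range out and moving it out is controlled by the delete flag. A minimal sketch of that distinction on a flat list of direct body children follows; `extract_range` is a hypothetical helper written for illustration and says nothing about how docling-core implements the method.

```python
from typing import List


def extract_range(items: List[str], start: str, end: str,
                  start_inclusive: bool = True,
                  end_inclusive: bool = True,
                  delete: bool = False) -> List[str]:
    """Return the items from `start` to `end` as a new list.

    With delete=True the same items are also removed from `items`.
    """
    lo = items.index(start) + (0 if start_inclusive else 1)
    hi = items.index(end) + (1 if end_inclusive else 0)  # exclusive stop
    extracted = items[lo:hi]  # slicing creates a separate list
    if delete:
        del items[lo:hi]
    return extracted


original = ["title", "intro", "table", "summary"]

# delete=False: the source keeps everything, the result is a copy.
source_copy = list(original)
copied = extract_range(source_copy, "intro", "table")

# delete=True: the extracted items leave the source.
source_move = list(original)
moved = extract_range(source_move, "intro", "table", delete=True)

print(copied, source_copy)
print(moved, source_move)
```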
get_visualization
get_visualization(show_label: bool = True, show_branch_numbering: bool = False) -> dict[Optional[int], Image]
Get visualization of the document as images by page.
insert_code
insert_code(sibling: NodeItem, text: str, code_language: Optional[CodeLanguageLabel] = None, orig: Optional[str] = None, caption: Optional[Union[TextItem, RefItem]] = None, prov: Optional[ProvenanceItem] = None, content_layer: Optional[ContentLayer] = None, formatting: Optional[Formatting] = None, hyperlink: Optional[Union[AnyUrl, Path]] = None, after: bool = True) -> CodeItem
Creates a new CodeItem item and inserts it into the document.
Parameters:
- sibling (NodeItem)
- text (str)
- code_language (Optional[CodeLanguageLabel], default: None)
- orig (Optional[str], default: None)
- caption (Optional[Union[TextItem, RefItem]], default: None)
- prov (Optional[ProvenanceItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- formatting (Optional[Formatting], default: None)
- hyperlink (Optional[Union[AnyUrl, Path]], default: None)
- after (bool, default: True)
Returns:
- CodeItem: The newly created CodeItem item.
insert_document
insert_document(doc: DoclingDocument, sibling: NodeItem, after: bool = True) -> None
Inserts the content from the body of a DoclingDocument into this document at a specific position.
Parameters:
- doc (DoclingDocument): The document whose content will be inserted.
- sibling (NodeItem): The NodeItem after/before which the new items will be inserted.
- after (bool, default: True): If True, insert after the sibling; if False, insert before.
Returns:
- None
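The sibling and after parameters describe a position relative to an existing node. The following sketch shows that positioning on a plain list, assuming the inserted items keep their relative order; `insert_items` is a hypothetical helper for illustration, not a docling-core function.

```python
from typing import List


def insert_items(items: List[str], sibling: str, new_items: List[str],
                 after: bool = True) -> None:
    """Insert `new_items` in place, directly after or before `sibling`."""
    position = items.index(sibling) + (1 if after else 0)
    items[position:position] = new_items  # slice assignment keeps the order


after_result = ["intro", "summary"]
insert_items(after_result, "intro", ["figure", "table"])

before_result = ["intro", "summary"]
insert_items(before_result, "intro", ["figure", "table"], after=False)

print(after_result)
print(before_result)
```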
insert_form
insert_form(sibling: NodeItem, graph: GraphData, prov: Optional[ProvenanceItem] = None, after: bool = True) -> FormItem
Creates a new FormItem item and inserts it into the document.
Parameters:
- sibling (NodeItem)
- graph (GraphData)
- prov (Optional[ProvenanceItem], default: None)
- after (bool, default: True)
Returns:
- FormItem: The newly created FormItem item.
insert_formula
insert_formula(sibling: NodeItem, text: str, orig: Optional[str] = None, prov: Optional[ProvenanceItem] = None, content_layer: Optional[ContentLayer] = None, formatting: Optional[Formatting] = None, hyperlink: Optional[Union[AnyUrl, Path]] = None, after: bool = True) -> FormulaItem
Creates a new FormulaItem item and inserts it into the document.
Parameters:
- sibling (NodeItem)
- text (str)
- orig (Optional[str], default: None)
- prov (Optional[ProvenanceItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- formatting (Optional[Formatting], default: None)
- hyperlink (Optional[Union[AnyUrl, Path]], default: None)
- after (bool, default: True)
Returns:
- FormulaItem: The newly created FormulaItem item.
insert_group
insert_group(sibling: NodeItem, label: Optional[GroupLabel] = None, name: Optional[str] = None, content_layer: Optional[ContentLayer] = None, after: bool = True) -> GroupItem
Creates a new GroupItem item and inserts it into the document.
Parameters:
- sibling (NodeItem)
- label (Optional[GroupLabel], default: None)
- name (Optional[str], default: None)
- content_layer (Optional[ContentLayer], default: None)
- after (bool, default: True)
Returns:
- GroupItem: The newly created GroupItem.
insert_heading
insert_heading(sibling: NodeItem, text: str, orig: Optional[str] = None, level: LevelNumber = 1, prov: Optional[ProvenanceItem] = None, content_layer: Optional[ContentLayer] = None, formatting: Optional[Formatting] = None, hyperlink: Optional[Union[AnyUrl, Path]] = None, after: bool = True) -> SectionHeaderItem
Creates a new SectionHeaderItem item and inserts it into the document.
Parameters:
- sibling (NodeItem)
- text (str)
- orig (Optional[str], default: None)
- level (LevelNumber, default: 1)
- prov (Optional[ProvenanceItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- formatting (Optional[Formatting], default: None)
- hyperlink (Optional[Union[AnyUrl, Path]], default: None)
- after (bool, default: True)
Returns:
- SectionHeaderItem: The newly created SectionHeaderItem item.
insert_inline_group
insert_inline_group(sibling: NodeItem, name: Optional[str] = None, content_layer: Optional[ContentLayer] = None, after: bool = True) -> InlineGroup
Creates a new InlineGroup item and inserts it into the document.
Parameters:
- sibling (NodeItem)
- name (Optional[str], default: None)
- content_layer (Optional[ContentLayer], default: None)
- after (bool, default: True)
Returns:
- InlineGroup: The newly created InlineGroup item.
insert_item_after_sibling
Inserts an item, given its node_item instance, after other as a sibling.
insert_item_before_sibling
Inserts an item, given its node_item instance, before other as a sibling.
insert_key_values
insert_key_values(sibling: NodeItem, graph: GraphData, prov: Optional[ProvenanceItem] = None, after: bool = True) -> KeyValueItem
Creates a new KeyValueItem item and inserts it into the document.
Parameters:
- sibling (NodeItem)
- graph (GraphData)
- prov (Optional[ProvenanceItem], default: None)
- after (bool, default: True)
Returns:
- KeyValueItem: The newly created KeyValueItem item.
insert_list_group
insert_list_group(sibling: NodeItem, name: Optional[str] = None, content_layer: Optional[ContentLayer] = None, after: bool = True) -> ListGroup
Creates a new ListGroup item and inserts it into the document.
Parameters:
- sibling (NodeItem)
- name (Optional[str], default: None)
- content_layer (Optional[ContentLayer], default: None)
- after (bool, default: True)
Returns:
- ListGroup: The newly created ListGroup item.
insert_list_item
insert_list_item(sibling: NodeItem, text: str, enumerated: bool = False, marker: Optional[str] = None, orig: Optional[str] = None, prov: Optional[ProvenanceItem] = None, content_layer: Optional[ContentLayer] = None, formatting: Optional[Formatting] = None, hyperlink: Optional[Union[AnyUrl, Path]] = None, after: bool = True) -> ListItem
Creates a new ListItem item and inserts it into the document.
Parameters:
- sibling (NodeItem)
- text (str)
- enumerated (bool, default: False)
- marker (Optional[str], default: None)
- orig (Optional[str], default: None)
- prov (Optional[ProvenanceItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- formatting (Optional[Formatting], default: None)
- hyperlink (Optional[Union[AnyUrl, Path]], default: None)
- after (bool, default: True)
Returns:
- ListItem: The newly created ListItem item.
insert_node_items
insert_node_items(sibling: NodeItem, node_items: List[NodeItem], doc: DoclingDocument, after: bool = True) -> None
Insert multiple NodeItems and their children at a specific position in the document.
Parameters:
- sibling (NodeItem): The NodeItem after/before which the new items will be inserted.
- node_items (List[NodeItem]): The NodeItems to be inserted.
- doc (DoclingDocument): The document to which the NodeItems and their children belong.
- after (bool, default: True): If True, insert after the sibling; if False, insert before.
Returns:
- None
insert_picture
insert_picture(sibling: NodeItem, annotations: Optional[List[PictureDataType]] = None, image: Optional[ImageRef] = None, caption: Optional[Union[TextItem, RefItem]] = None, prov: Optional[ProvenanceItem] = None, content_layer: Optional[ContentLayer] = None, after: bool = True) -> PictureItem
Creates a new PictureItem item and inserts it into the document.
Parameters:
- sibling (NodeItem)
- annotations (Optional[List[PictureDataType]], default: None)
- image (Optional[ImageRef], default: None)
- caption (Optional[Union[TextItem, RefItem]], default: None)
- prov (Optional[ProvenanceItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- after (bool, default: True)
Returns:
- PictureItem: The newly created PictureItem item.
insert_table
insert_table(sibling: NodeItem, data: TableData, caption: Optional[Union[TextItem, RefItem]] = None, prov: Optional[ProvenanceItem] = None, label: DocItemLabel = TABLE, content_layer: Optional[ContentLayer] = None, annotations: Optional[list[TableAnnotationType]] = None, after: bool = True) -> TableItem
Creates a new TableItem and inserts it into the document before or after the given sibling.
Parameters:
- sibling (NodeItem) – The NodeItem after/before which the new item is inserted.
- data (TableData)
- caption (Optional[Union[TextItem, RefItem]], default: None)
- prov (Optional[ProvenanceItem], default: None)
- label (DocItemLabel, default: DocItemLabel.TABLE)
- content_layer (Optional[ContentLayer], default: None)
- annotations (Optional[list[TableAnnotationType]], default: None)
- after (bool, default: True) – If True, insert after the sibling; if False, insert before.
Returns:
- TableItem – The newly created TableItem.
insert_text
insert_text(sibling: NodeItem, label: DocItemLabel, text: str, orig: Optional[str] = None, prov: Optional[ProvenanceItem] = None, content_layer: Optional[ContentLayer] = None, formatting: Optional[Formatting] = None, hyperlink: Optional[Union[AnyUrl, Path]] = None, after: bool = True) -> TextItem
Creates a new TextItem and inserts it into the document before or after the given sibling.
Parameters:
- sibling (NodeItem) – The NodeItem after/before which the new item is inserted.
- label (DocItemLabel)
- text (str)
- orig (Optional[str], default: None)
- prov (Optional[ProvenanceItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- formatting (Optional[Formatting], default: None)
- hyperlink (Optional[Union[AnyUrl, Path]], default: None)
- after (bool, default: True) – If True, insert after the sibling; if False, insert before.
Returns:
- TextItem – The newly created TextItem.
insert_title
insert_title(sibling: NodeItem, text: str, orig: Optional[str] = None, prov: Optional[ProvenanceItem] = None, content_layer: Optional[ContentLayer] = None, formatting: Optional[Formatting] = None, hyperlink: Optional[Union[AnyUrl, Path]] = None, after: bool = True) -> TitleItem
Creates a new TitleItem and inserts it into the document before or after the given sibling.
Parameters:
- sibling (NodeItem) – The NodeItem after/before which the new item is inserted.
- text (str)
- orig (Optional[str], default: None)
- prov (Optional[ProvenanceItem], default: None)
- content_layer (Optional[ContentLayer], default: None)
- formatting (Optional[Formatting], default: None)
- hyperlink (Optional[Union[AnyUrl, Path]], default: None)
- after (bool, default: True) – If True, insert after the sibling; if False, insert before.
Returns:
- TitleItem – The newly created TitleItem.
iterate_items
iterate_items(root: Optional[NodeItem] = None, with_groups: bool = False, traverse_pictures: bool = False, page_no: Optional[int] = None, included_content_layers: Optional[set[ContentLayer]] = None, _level: int = 0) -> Iterable[Tuple[NodeItem, int]]
Iterate elements with level.
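The return type Iterable[Tuple[NodeItem, int]] pairs each element with its nesting level. A minimal sketch of that idea, assuming a depth-first walk over a tree, is shown here. `Node` and `iterate_with_level` are hypothetical stand-ins; the filtering done by with_groups, traverse_pictures, page_no and included_content_layers is not modeled, and whether the real method yields the root itself is not stated in this reference.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not the library's NodeItem."""
    name: str
    children: List["Node"] = field(default_factory=list)


def iterate_with_level(root: Node, level: int = 0) -> Iterator[Tuple[Node, int]]:
    """Yield (node, level) pairs depth-first, parents before children."""
    yield root, level
    for child in root.children:
        yield from iterate_with_level(child, level + 1)


body = Node("body", [Node("heading", [Node("paragraph")]), Node("table")])
pairs = [(node.name, level) for node, level in iterate_with_level(body)]
print(pairs)
```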
load_from_doctags
load_from_doctags(doctag_document: DocTagsDocument, document_name: str = 'Document') -> DoclingDocument
Load Docling document from lists of DocTags and Images.
load_from_json
load_from_json(filename: Union[str, Path]) -> DoclingDocument
Load a DoclingDocument from a JSON file.
Parameters:
- filename (Union[str, Path]) – The .json file to load a saved DoclingDocument from.
Returns:
- DoclingDocument – The loaded DoclingDocument.
load_from_yaml
load_from_yaml(filename: Union[str, Path]) -> DoclingDocument
Load a DoclingDocument from a YAML file.
Parameters:
- filename (Union[str, Path]) – The filename to load a YAML-serialized DoclingDocument from.
Returns:
- DoclingDocument – The loaded DoclingDocument.
num_pages
num_pages()
Return the number of pages in the document.
print_element_tree
print_element_tree()
Print the element tree of the document.
replace_item
Replace item with new item.
save_as_doctags
save_as_doctags(filename: Union[str, Path], delim: str = '', from_element: int = 0, to_element: int = maxsize, labels: Optional[set[DocItemLabel]] = None, xsize: int = 500, ysize: int = 500, add_location: bool = True, add_content: bool = True, add_page_index: bool = True, add_table_cell_location: bool = False, add_table_cell_text: bool = True, minified: bool = False)
Save the document content to DocTags format.
save_as_document_tokens
save_as_document_tokens(*args, **kwargs)
Save the document content to a DocumentToken format.
save_as_html
save_as_html(filename: Union[str, Path], artifacts_dir: Optional[Path] = None, from_element: int = 0, to_element: int = maxsize, labels: Optional[set[DocItemLabel]] = None, image_mode: ImageRefMode = PLACEHOLDER, formula_to_mathml: bool = True, page_no: Optional[int] = None, html_lang: str = 'en', html_head: str = 'null', included_content_layers: Optional[set[ContentLayer]] = None, split_page_view: bool = False, include_annotations: bool = True)
Save to HTML.
save_as_json
save_as_json(filename: Union[str, Path], artifacts_dir: Optional[Path] = None, image_mode: ImageRefMode = EMBEDDED, indent: int = 2, coord_precision: Optional[int] = None, confid_precision: Optional[int] = None)
Save as json.
save_as_markdown
save_as_markdown(filename: Union[str, Path], artifacts_dir: Optional[Path] = None, delim: str = '\n\n', from_element: int = 0, to_element: int = maxsize, labels: Optional[set[DocItemLabel]] = None, strict_text: bool = False, escaping_underscores: bool = True, image_placeholder: str = '<!-- image -->', image_mode: ImageRefMode = PLACEHOLDER, indent: int = 4, text_width: int = -1, page_no: Optional[int] = None, included_content_layers: Optional[set[ContentLayer]] = None, page_break_placeholder: Optional[str] = None, include_annotations: bool = True)
Save to markdown.
save_as_yaml
save_as_yaml(filename: Union[str, Path], artifacts_dir: Optional[Path] = None, image_mode: ImageRefMode = EMBEDDED, default_flow_style: bool = False, coord_precision: Optional[int] = None, confid_precision: Optional[int] = None)
Save as yaml.
transform_to_content_layer
transform_to_content_layer(data: dict) -> dict
transform_to_content_layer.
validate_misplaced_list_items
validate_misplaced_list_items()
validate_misplaced_list_items.
validate_tree
validate_tree(root) -> bool
validate_tree.
DocumentOrigin
Bases: BaseModel
FileSource.
Methods:
- parse_hex_string – parse_hex_string.
- validate_mimetype – validate_mimetype.
Attributes:
- binary_hash (Uint64)
- filename (str)
- mimetype (str)
- uri (Optional[AnyUrl])
binary_hash
binary_hash: Uint64
filename
filename: str
mimetype
mimetype: str
uri
uri: Optional[AnyUrl] = None
parse_hex_string
parse_hex_string(value)
parse_hex_string.
validate_mimetype
validate_mimetype(v)
validate_mimetype.
DocItem
Bases: NodeItem
DocItem.
Methods:
- get_annotations – Get the annotations of this DocItem.
- get_image – Returns the image of this DocItem.
- get_location_tokens – Get the location string for the BaseCell.
- get_ref – get_ref.
Attributes:
- children (List[RefItem])
- content_layer (ContentLayer)
- label (DocItemLabel)
- model_config
- parent (Optional[RefItem])
- prov (List[ProvenanceItem])
- self_ref (str)
content_layer
content_layer: ContentLayer = BODY
model_config
model_config = ConfigDict(extra='forbid')
self_ref
self_ref: str = Field(pattern=_JSON_POINTER_REGEX)
get_annotations
get_annotations() -> Sequence[BaseAnnotation]
Get the annotations of this DocItem.
get_image
get_image(doc: DoclingDocument, prov_index: int = 0) -> Optional[Image]
Returns the image of this DocItem.
The function returns None if this DocItem has no valid provenance or if a valid image of the page containing this DocItem is not available in doc.
get_location_tokens
get_location_tokens(doc: DoclingDocument, new_line: str = '', xsize: int = 500, ysize: int = 500) -> str
Get the location string for the BaseCell.
DocItemLabel
Bases: str, Enum
DocItemLabel.
Methods:
- get_color – Return the RGB color associated with a given label.
Attributes:
- CAPTION
- CHART
- CHECKBOX_SELECTED
- CHECKBOX_UNSELECTED
- CODE
- DOCUMENT_INDEX
- EMPTY_VALUE
- FOOTNOTE
- FORM
- FORMULA
- GRADING_SCALE
- HANDWRITTEN_TEXT
- KEY_VALUE_REGION
- LIST_ITEM
- PAGE_FOOTER
- PAGE_HEADER
- PARAGRAPH
- PICTURE
- REFERENCE
- SECTION_HEADER
- TABLE
- TEXT
- TITLE
CAPTION
CAPTION = 'caption'
CHART
CHART = 'chart'
CHECKBOX_SELECTED
CHECKBOX_SELECTED = 'checkbox_selected'
CHECKBOX_UNSELECTED
CHECKBOX_UNSELECTED = 'checkbox_unselected'
CODE
CODE = 'code'
DOCUMENT_INDEX
DOCUMENT_INDEX = 'document_index'
EMPTY_VALUE
EMPTY_VALUE = 'empty_value'
FOOTNOTE
FOOTNOTE = 'footnote'
FORM
FORM = 'form'
FORMULA
FORMULA = 'formula'
GRADING_SCALE
GRADING_SCALE = 'grading_scale'
HANDWRITTEN_TEXT
HANDWRITTEN_TEXT = 'handwritten_text'
KEY_VALUE_REGION
KEY_VALUE_REGION = 'key_value_region'
LIST_ITEM
LIST_ITEM = 'list_item'
PAGE_FOOTER
PAGE_FOOTER = 'page_footer'
PAGE_HEADER
PAGE_HEADER = 'page_header'
PARAGRAPH
PARAGRAPH = 'paragraph'
PICTURE
PICTURE = 'picture'
REFERENCE
REFERENCE = 'reference'
SECTION_HEADER
SECTION_HEADER = 'section_header'
TABLE
TABLE = 'table'
TEXT
TEXT = 'text'
TITLE
TITLE = 'title'
get_color
get_color(label: DocItemLabel) -> Tuple[int, int, int]
Return the RGB color associated with a given label.
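Because the label enum derives from both str and Enum, its members behave like their string values. The sketch below uses a hypothetical stand-in enum, `Label`, that copies three of the values listed above so that it runs without the library installed. It shows value comparison, lookup by value, and JSON serialization.

```python
import json
from enum import Enum


class Label(str, Enum):
    """Hypothetical stand-in mirroring three DocItemLabel members."""
    CAPTION = "caption"
    TABLE = "table"
    SECTION_HEADER = "section_header"


# Members compare equal to their plain string values.
is_table = Label.TABLE == "table"

# A member can be looked up from its string value.
looked_up = Label("section_header")

# str-based members serialize as ordinary JSON strings.
serialized = json.dumps({"label": Label.CAPTION})

print(is_table, looked_up.name, serialized)
```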
ProvenanceItem
Bases: BaseModel
ProvenanceItem.
Attributes:
- bbox (BoundingBox)
- charspan (Tuple[int, int])
- page_no (int)
GroupItem
Bases: NodeItem
GroupItem.
Methods:
- get_ref – get_ref.
Attributes:
- children (List[RefItem])
- content_layer (ContentLayer)
- label (GroupLabel)
- model_config
- name (str)
- parent (Optional[RefItem])
- self_ref (str)
content_layer
content_layer: ContentLayer = BODY
model_config
model_config = ConfigDict(extra='forbid')
name
name: str = 'group'
self_ref
self_ref: str = Field(pattern=_JSON_POINTER_REGEX)
GroupLabel
Bases: str, Enum
GroupLabel.
Attributes:
- CHAPTER
- COMMENT_SECTION
- FORM_AREA
- INLINE
- KEY_VALUE_AREA
- LIST
- ORDERED_LIST
- PICTURE_AREA
- SECTION
- SHEET
- SLIDE
- UNSPECIFIED
CHAPTER
CHAPTER = 'chapter'
COMMENT_SECTION
COMMENT_SECTION = 'comment_section'
FORM_AREA
FORM_AREA = 'form_area'
INLINE
INLINE = 'inline'
KEY_VALUE_AREA
KEY_VALUE_AREA = 'key_value_area'
LIST
LIST = 'list'
ORDERED_LIST
ORDERED_LIST = 'ordered_list'
PICTURE_AREA
PICTURE_AREA = 'picture_area'
SECTION
SECTION = 'section'
SHEET
SHEET = 'sheet'
SLIDE
SLIDE = 'slide'
UNSPECIFIED
UNSPECIFIED = 'unspecified'
NodeItem
Bases: BaseModel
NodeItem.
Methods:
- get_ref – get_ref.
Attributes:
- children (List[RefItem])
- content_layer (ContentLayer)
- model_config
- parent (Optional[RefItem])
- self_ref (str)
PageItem
FloatingItem
Bases: DocItem
FloatingItem.
Methods:
- caption_text – Computes the caption as a single text.
- get_annotations – Get the annotations of this DocItem.
- get_image – Returns the image corresponding to this FloatingItem.
- get_location_tokens – Get the location string for the BaseCell.
- get_ref – get_ref.
Attributes:
- captions (List[RefItem])
- children (List[RefItem])
- content_layer (ContentLayer)
- footnotes (List[RefItem])
- image (Optional[ImageRef])
- label (DocItemLabel)
- model_config
- parent (Optional[RefItem])
- prov (List[ProvenanceItem])
- references (List[RefItem])
- self_ref (str)
content_layer
content_layer: ContentLayer = BODY
model_config
model_config = ConfigDict(extra='forbid')
self_ref
self_ref: str = Field(pattern=_JSON_POINTER_REGEX)
get_annotations
get_annotations() -> Sequence[BaseAnnotation]
Get the annotations of this DocItem.
get_image
get_image(doc: DoclingDocument, prov_index: int = 0) -> Optional[Image]
Returns the image corresponding to this FloatingItem.
This function returns the PIL image from self.image if one is available. Otherwise, it uses DocItem.get_image to get an image of this FloatingItem.
In particular, when self.image is None, the function returns None if this FloatingItem has no valid provenance or the doc does not contain a valid image for the required page.
get_location_tokens
get_location_tokens(doc: DoclingDocument, new_line: str = '', xsize: int = 500, ysize: int = 500) -> str
Get the location string for the BaseCell.
TextItem
Bases: DocItem
TextItem.
Methods:
- export_to_doctags – Export text element to document tokens format.
- export_to_document_tokens – Export to DocTags format.
- get_annotations – Get the annotations of this DocItem.
- get_image – Returns the image of this DocItem.
- get_location_tokens – Get the location string for the BaseCell.
- get_ref – get_ref.
Attributes:
- children (List[RefItem])
- content_layer (ContentLayer)
- formatting (Optional[Formatting])
- hyperlink (Optional[Union[AnyUrl, Path]])
- label (Literal[CAPTION, CHECKBOX_SELECTED, CHECKBOX_UNSELECTED, FOOTNOTE, PAGE_FOOTER, PAGE_HEADER, PARAGRAPH, REFERENCE, TEXT, EMPTY_VALUE])
- model_config
- orig (str)
- parent (Optional[RefItem])
- prov (List[ProvenanceItem])
- self_ref (str)
- text (str)
content_layer
content_layer: ContentLayer = BODY
formatting
formatting: Optional[Formatting] = None
hyperlink
hyperlink: Optional[Union[AnyUrl, Path]] = Field(union_mode='left_to_right', default=None)
label
label: Literal[CAPTION, CHECKBOX_SELECTED, CHECKBOX_UNSELECTED, FOOTNOTE, PAGE_FOOTER, PAGE_HEADER, PARAGRAPH, REFERENCE, TEXT, EMPTY_VALUE]
model_config
model_config = ConfigDict(extra='forbid')
orig
orig: str
self_ref
self_ref: str = Field(pattern=_JSON_POINTER_REGEX)
text
text: str
export_to_doctags
export_to_doctags(doc: DoclingDocument, new_line: str = '', xsize: int = 500, ysize: int = 500, add_location: bool = True, add_content: bool = True)
Export text element to document tokens format.
Parameters:
- doc (DoclingDocument)
- new_line (str, default: '') – Deprecated.
- xsize (int, default: 500)
- ysize (int, default: 500)
- add_location (bool, default: True)
- add_content (bool, default: True)
export_to_document_tokens
export_to_document_tokens(*args, **kwargs)
Export to DocTags format.
get_annotations
get_annotations() -> Sequence[BaseAnnotation]
Get the annotations of this DocItem.
get_image
get_image(doc: DoclingDocument, prov_index: int = 0) -> Optional[Image]
Returns the image of this DocItem.
The function returns None if this DocItem has no valid provenance or if a valid image of the page containing this DocItem is not available in doc.
get_location_tokens
get_location_tokens(doc: DoclingDocument, new_line: str = '', xsize: int = 500, ysize: int = 500) -> str
Get the location string for the BaseCell.
TableItem
Bases: FloatingItem
TableItem.
Methods:
- add_annotation – Add an annotation to the table.
- caption_text – Computes the caption as a single text.
- export_to_dataframe – Export the table as a Pandas DataFrame.
- export_to_doctags – Export table to document tokens format.
- export_to_document_tokens – Export to DocTags format.
- export_to_html – Export the table as html.
- export_to_markdown – Export the table as markdown.
- export_to_otsl – Export the table as OTSL.
- get_annotations – Get the annotations of this TableItem.
- get_image – Returns the image corresponding to this FloatingItem.
- get_location_tokens – Get the location string for the BaseCell.
- get_ref – get_ref.
Attributes:
- annotations (List[TableAnnotationType])
- captions (List[RefItem])
- children (List[RefItem])
- content_layer (ContentLayer)
- data (TableData)
- footnotes (List[RefItem])
- image (Optional[ImageRef])
- label (Literal[DOCUMENT_INDEX, TABLE])
- model_config
- parent (Optional[RefItem])
- prov (List[ProvenanceItem])
- references (List[RefItem])
- self_ref (str)
annotations
annotations: List[TableAnnotationType] = []
content_layer
content_layer: ContentLayer = BODY
model_config
model_config = ConfigDict(extra='forbid')
self_ref
self_ref: str = Field(pattern=_JSON_POINTER_REGEX)
add_annotation
add_annotation(annotation: TableAnnotationType) -> None
Add an annotation to the table.
export_to_dataframe
export_to_dataframe() -> DataFrame
Export the table as a Pandas DataFrame.
export_to_doctags
export_to_doctags(doc: DoclingDocument, new_line: str = '', xsize: int = 500, ysize: int = 500, add_location: bool = True, add_cell_location: bool = True, add_cell_text: bool = True, add_caption: bool = True)
Export table to document tokens format.
Parameters:
- doc (DoclingDocument)
- new_line (str, default: '') – Deprecated.
- xsize (int, default: 500)
- ysize (int, default: 500)
- add_location (bool, default: True)
- add_cell_location (bool, default: True)
- add_cell_text (bool, default: True)
- add_caption (bool, default: True)
export_to_document_tokens
export_to_document_tokens(*args, **kwargs)
Export to DocTags format.
export_to_html
export_to_html(doc: Optional[DoclingDocument] = None, add_caption: bool = True) -> str
Export the table as html.
export_to_markdown
export_to_markdown(doc: Optional[DoclingDocument] = None) -> str
Export the table as markdown.
export_to_otsl
export_to_otsl(doc: DoclingDocument, add_cell_location: bool = True, add_cell_text: bool = True, xsize: int = 500, ysize: int = 500) -> str
Export the table as OTSL.
get_annotations
get_annotations() -> Sequence[BaseAnnotation]
Get the annotations of this TableItem.
get_image
get_image(doc: DoclingDocument, prov_index: int = 0) -> Optional[Image]
Returns the image corresponding to this FloatingItem.
This function returns the PIL image from self.image if one is available. Otherwise, it uses DocItem.get_image to get an image of this FloatingItem.
In particular, when self.image is None, the function returns None if this FloatingItem has no valid provenance or the doc does not contain a valid image for the required page.
get_location_tokens
get_location_tokens(doc: DoclingDocument, new_line: str = '', xsize: int = 500, ysize: int = 500) -> str
Get the location string for the BaseCell.
TableCell
Bases: BaseModel
TableCell.
Methods:
- from_dict_format – from_dict_format.
Attributes:
- bbox (Optional[BoundingBox])
- col_span (int)
- column_header (bool)
- end_col_offset_idx (int)
- end_row_offset_idx (int)
- row_header (bool)
- row_section (bool)
- row_span (int)
- start_col_offset_idx (int)
- start_row_offset_idx (int)
- text (str)
col_span
col_span: int = 1
column_header
column_header: bool = False
end_col_offset_idx
end_col_offset_idx: int
end_row_offset_idx
end_row_offset_idx: int
row_header
row_header: bool = False
row_section
row_section: bool = False
row_span
row_span: int = 1
start_col_offset_idx
start_col_offset_idx: int
start_row_offset_idx
start_row_offset_idx: int
text
text: str
from_dict_format
from_dict_format(data: Any) -> Any
from_dict_format.
TableData
Bases: BaseModel
BaseTableData.
Methods:
- add_row – Add a new row to the table from a list of strings.
- add_rows – Add multiple new rows to the table from a list of lists of strings.
- get_column_bounding_boxes – Get the minimal bounding box for each column in the table.
- get_row_bounding_boxes – Get the minimal bounding box for each row in the table.
- insert_row – Insert a new row from a list of strings before/after a specific index in the table.
- insert_rows – Insert multiple new rows from a list of lists of strings before/after a specific index in the table.
- pop_row – Remove and return the last row from the table.
- remove_row – Remove a row from the table by its index.
- remove_rows – Remove rows from the table by their indices.
Attributes:
- grid (List[List[TableCell]]) – grid.
- num_cols (int)
- num_rows (int)
- table_cells (List[TableCell])
num_cols
num_cols: int = 0
num_rows
num_rows: int = 0
add_row
add_row(row: List[str]) -> None
Add a new row to the table from a list of strings.
Parameters:
- row (List[str]) – A list of strings representing the content of the new row.
Returns:
- None
add_rows
add_rows(rows: List[List[str]]) -> None
Add multiple new rows to the table from a list of lists of strings.
Parameters:
- rows (List[List[str]]) – A list of lists, where each inner list represents the content of a new row.
Returns:
- None
get_column_bounding_boxes
get_column_bounding_boxes() -> dict[int, BoundingBox]
Get the minimal bounding box for each column in the table.
Returns: dict[int, BoundingBox]: A dictionary keyed by column index, where each value is the minimal bounding box that encompasses all cells in that column.
get_row_bounding_boxes
get_row_bounding_boxes() -> dict[int, BoundingBox]
Get the minimal bounding box for each row in the table.
Returns: dict[int, BoundingBox]: A dictionary keyed by row index, where each value is the minimal bounding box that encompasses all cells in that row.
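A "minimal bounding box that encompasses all cells" can be computed by taking the extreme coordinates of the individual cell boxes. The following is a sketch under stated assumptions, not the library's implementation. Boxes are plain (left, top, right, bottom) tuples with a top-left origin, `enclosing_box` is a hypothetical helper, and cells without a box are skipped.

```python
from typing import Iterable, Optional, Tuple

# Assumed layout: (left, top, right, bottom) with a top-left coordinate origin.
Box = Tuple[float, float, float, float]


def enclosing_box(boxes: Iterable[Optional[Box]]) -> Optional[Box]:
    """Return the smallest box covering every non-None box, or None if there are none."""
    present = [box for box in boxes if box is not None]
    if not present:
        return None
    return (
        min(box[0] for box in present),  # leftmost edge
        min(box[1] for box in present),  # topmost edge
        max(box[2] for box in present),  # rightmost edge
        max(box[3] for box in present),  # bottommost edge
    )


# Cells of one column; the middle cell has no bounding box.
column_cells = [(10.0, 5.0, 40.0, 15.0), None, (12.0, 20.0, 38.0, 30.0)]
column_box = enclosing_box(column_cells)
print(column_box)
```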
insert_row
insert_row(row_index: int, row: List[str], after: bool = False) -> None
Insert a new row from a list of strings before/after a specific index in the table.
Parameters:
- row_index (int) – The index at which to insert the new row. (Starting from 0)
- row (List[str]) – A list of strings representing the content of the new row.
- after (bool, default: False) – If True, insert the row after the specified index, otherwise before it.
Returns:
- None
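The before/after behaviour of row_index can be sketched on a plain list of rows. `insert_row_sketch` is a hypothetical helper that only models the index arithmetic described above. The real method also has to create TableCell objects and update row offsets and num_rows, none of which is shown here.

```python
from typing import List


def insert_row_sketch(grid: List[List[str]], row_index: int, row: List[str], after: bool = False) -> None:
    """Insert row before row_index (default) or directly after it, in place."""
    position = row_index + 1 if after else row_index
    grid.insert(position, row)


grid = [["name", "qty"], ["apple", "3"]]

# after=True: the new row lands below row 0.
insert_row_sketch(grid, 0, ["pear", "5"], after=True)

# after=False (the default): the new row takes index 0 and pushes the rest down.
insert_row_sketch(grid, 0, ["fruit table", ""])

print(grid)
```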
insert_rows
insert_rows(row_index: int, rows: List[List[str]], after: bool = False) -> None
Insert multiple new rows from a list of lists of strings before/after a specific index in the table.
Parameters:
- row_index (int) – The index at which to insert the new rows. (Starting from 0)
- rows (List[List[str]]) – A list of lists, where each inner list represents the content of a new row.
- after (bool, default: False) – If True, insert the rows after the specified index, otherwise before it.
Returns:
- None
pop_row
pop_row() -> List[TableCell]
Remove and return the last row from the table.
Returns:
- List[TableCell] – A list of TableCell objects representing the popped row.
remove_row
remove_row(row_index: int) -> List[TableCell]
Remove a row from the table by its index.
Parameters:
- row_index (int) – The index of the row to remove. (Starting from 0)
Returns:
- List[TableCell] – A list of TableCell objects representing the removed row.
remove_rows
remove_rows(indices: List[int]) -> List[List[TableCell]]
Remove rows from the table by their indices.
Parameters:
- indices (List[int]) – A list of indices of the rows to remove. (Starting from 0)
Returns:
- List[List[TableCell]] – A list representation of the removed rows as lists of TableCell objects.
TableCellLabel
Bases: str, Enum
TableCellLabel.
Methods:
- get_color – Return the RGB color associated with a given label.
Attributes:
- BODY
- COLUMN_HEADER
- ROW_HEADER
- ROW_SECTION
BODY
BODY = 'body'
COLUMN_HEADER
COLUMN_HEADER = 'col_header'
ROW_HEADER
ROW_HEADER = 'row_header'
ROW_SECTION
ROW_SECTION = 'row_section'
get_color
get_color(label: TableCellLabel) -> Tuple[int, int, int]
Return the RGB color associated with a given label.
KeyValueItem
Bases: FloatingItem
KeyValueItem.
Methods:
- caption_text – Computes the caption as a single text.
- export_to_document_tokens – Export key value item to document tokens format.
- get_annotations – Get the annotations of this DocItem.
- get_image – Returns the image corresponding to this FloatingItem.
- get_location_tokens – Get the location string for the BaseCell.
- get_ref – get_ref.
Attributes:
- captions (List[RefItem])
- children (List[RefItem])
- content_layer (ContentLayer)
- footnotes (List[RefItem])
- graph (GraphData)
- image (Optional[ImageRef])
- label (Literal[KEY_VALUE_REGION])
- model_config
- parent (Optional[RefItem])
- prov (List[ProvenanceItem])
- references (List[RefItem])
- self_ref (str)
content_layer
content_layer: ContentLayer = BODY
graph
graph: GraphData
model_config
model_config = ConfigDict(extra='forbid')
self_ref
self_ref: str = Field(pattern=_JSON_POINTER_REGEX)
export_to_document_tokens
export_to_document_tokens(doc: DoclingDocument, new_line: str = '', xsize: int = 500, ysize: int = 500, add_location: bool = True, add_content: bool = True)
Export key value item to document tokens format.
Parameters:
- doc (DoclingDocument)
- new_line (str, default: '') – Deprecated.
- xsize (int, default: 500)
- ysize (int, default: 500)
- add_location (bool, default: True)
- add_content (bool, default: True)
get_annotations
get_annotations() -> Sequence[BaseAnnotation]
Get the annotations of this DocItem.
get_image
get_image(doc: DoclingDocument, prov_index: int = 0) -> Optional[Image]
Returns the image corresponding to this FloatingItem.
This function returns the PIL image from self.image if one is available. Otherwise, it uses DocItem.get_image to get an image of this FloatingItem.
In particular, when self.image is None, the function returns None if this FloatingItem has no valid provenance or the doc does not contain a valid image for the required page.
get_location_tokens
get_location_tokens(doc: DoclingDocument, new_line: str = '', xsize: int = 500, ysize: int = 500) -> str
Get the location string for the BaseCell.
SectionHeaderItem
Bases: TextItem
SectionItem.
Methods:
- export_to_doctags – Export text element to document tokens format.
- export_to_document_tokens – Export to DocTags format.
- get_annotations – Get the annotations of this DocItem.
- get_image – Returns the image of this DocItem.
- get_location_tokens – Get the location string for the BaseCell.
- get_ref – get_ref.
Attributes:
- children (List[RefItem])
- content_layer (ContentLayer)
- formatting (Optional[Formatting])
- hyperlink (Optional[Union[AnyUrl, Path]])
- label (Literal[SECTION_HEADER])
- level (LevelNumber)
- model_config
- orig (str)
- parent (Optional[RefItem])
- prov (List[ProvenanceItem])
- self_ref (str)
- text (str)
content_layer
content_layer: ContentLayer = BODY
formatting
formatting: Optional[Formatting] = None
hyperlink
hyperlink: Optional[Union[AnyUrl, Path]] = Field(union_mode='left_to_right', default=None)
level
level: LevelNumber = 1
model_config
model_config = ConfigDict(extra='forbid')
orig
orig: str
self_ref
self_ref: str = Field(pattern=_JSON_POINTER_REGEX)
text
text: str
export_to_doctags
export_to_doctags(doc: DoclingDocument, new_line: str = '', xsize: int = 500, ysize: int = 500, add_location: bool = True, add_content: bool = True)
Export text element to document tokens format.
Parameters:
- doc (DoclingDocument)
- new_line (str, default: '') – Deprecated.
- xsize (int, default: 500)
- ysize (int, default: 500)
- add_location (bool, default: True)
- add_content (bool, default: True)
export_to_document_tokens
export_to_document_tokens(*args, **kwargs)
Export to DocTags format.
get_annotations
get_annotations() -> Sequence[BaseAnnotation]
Get the annotations of this DocItem.
get_image
get_image(doc: DoclingDocument, prov_index: int = 0) -> Optional[Image]
Returns the image of this DocItem.
The function returns None if this DocItem has no valid provenance or if a valid image of the page containing this DocItem is not available in doc.
get_location_tokens
get_location_tokens(doc: DoclingDocument, new_line: str = '', xsize: int = 500, ysize: int = 500) -> str
Get the location string for the BaseCell.
PictureItem
Bases: FloatingItem
PictureItem.
Methods:
- caption_text – Computes the caption as a single text.
- export_to_doctags – Export picture to document tokens format.
- export_to_document_tokens – Export to DocTags format.
- export_to_html – Export picture to HTML format.
- export_to_markdown – Export picture to Markdown format.
- get_annotations – Get the annotations of this PictureItem.
- get_image – Returns the image corresponding to this FloatingItem.
- get_location_tokens – Get the location string for the BaseCell.
- get_ref – get_ref.
Attributes:
- annotations (List[PictureDataType])
- captions (List[RefItem])
- children (List[RefItem])
- content_layer (ContentLayer)
- footnotes (List[RefItem])
- image (Optional[ImageRef])
- label (Literal[PICTURE, CHART])
- model_config
- parent (Optional[RefItem])
- prov (List[ProvenanceItem])
- references (List[RefItem])
- self_ref (str)
annotations
annotations: List[PictureDataType] = []
content_layer
content_layer: ContentLayer = BODY
model_config
model_config = ConfigDict(extra='forbid')
self_ref
self_ref: str = Field(pattern=_JSON_POINTER_REGEX)
export_to_doctags
export_to_doctags(doc: DoclingDocument, new_line: str = '', xsize: int = 500, ysize: int = 500, add_location: bool = True, add_caption: bool = True, add_content: bool = True)
Export picture to document tokens format.
Parameters:
- doc (DoclingDocument)
- new_line (str, default: '') – Deprecated.
- xsize (int, default: 500)
- ysize (int, default: 500)
- add_location (bool, default: True)
- add_caption (bool, default: True)
- add_content (bool, default: True)
export_to_document_tokens
export_to_document_tokens(*args, **kwargs)
Export to DocTags format.
export_to_html
export_to_html(doc: DoclingDocument, add_caption: bool = True, image_mode: ImageRefMode = PLACEHOLDER) -> str
Export picture to HTML format.
export_to_markdown
export_to_markdown(doc: DoclingDocument, add_caption: bool = True, image_mode: ImageRefMode = EMBEDDED, image_placeholder: str = '<!-- image -->') -> str
Export picture to Markdown format.
get_annotations
get_annotations() -> Sequence[BaseAnnotation]
Get the annotations of this PictureItem.
get_image
get_image(doc: DoclingDocument, prov_index: int = 0) -> Optional[Image]
Returns the image corresponding to this FloatingItem.
This function returns the PIL image from self.image if one is available. Otherwise, it uses DocItem.get_image to get an image of this FloatingItem.
In particular, when self.image is None, the function returns None if this FloatingItem has no valid provenance or the doc does not contain a valid image for the required page.
get_location_tokens
get_location_tokens(doc: DoclingDocument, new_line: str = '', xsize: int = 500, ysize: int = 500) -> str
Get the location string for the BaseCell.
ImageRef
Bases: BaseModel
ImageRef.
Methods:
- from_pil – Construct ImageRef from a PIL Image.
- validate_mimetype – validate_mimetype.
Attributes:
- dpi (int)
- mimetype (str)
- pil_image (Optional[Image]) – Return the PIL Image.
- size (Size)
- uri (Union[AnyUrl, Path])
dpi
dpi: int
mimetype
mimetype: str
pil_image
pil_image: Optional[Image]
Return the PIL Image.
uri
uri: Union[AnyUrl, Path] = Field(union_mode='left_to_right')
from_pil
from_pil(image: Image, dpi: int) -> Self
Construct ImageRef from a PIL Image.
validate_mimetype
validate_mimetype(v)
validate_mimetype.
PictureClassificationClass
Bases: BaseModel
PictureClassificationData.
Attributes:
- class_name (str)
- confidence (float)
class_name
class_name: str
confidence
confidence: float
PictureClassificationData
Bases: BaseAnnotation
PictureClassificationData.
Attributes:
- kind (Literal['classification'])
- predicted_classes (List[PictureClassificationClass])
- provenance (str)
kind
kind: Literal['classification'] = 'classification'
provenance
provenance: str
RefItem
Bases: BaseModel
RefItem.
Methods:
- get_ref: get_ref.
Attributes:
- cref (str)
- model_config
cref
cref: str = Field(alias='$ref', pattern=_JSON_POINTER_REGEX)
model_config
model_config = ConfigDict(populate_by_name=True)
get_ref
get_ref()
get_ref.
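The `cref` field stores a JSON-pointer-style string and is serialized under the `$ref` alias. As a rough illustration of how such a pointer could be followed through an exported dictionary, here is a minimal standard-library sketch. The helper `resolve_pointer`, the sample dictionary, and the `#/texts/1` pointer shape are assumptions made for illustration; this is not the library's implementation.

```python
from typing import Any


def resolve_pointer(root: dict, cref: str) -> Any:
    """Follow a '#/a/b/0'-style pointer through nested dicts and lists (hypothetical helper)."""
    if not cref.startswith("#"):
        raise ValueError(f"unsupported pointer: {cref!r}")
    node: Any = root
    # Skip the leading '#', then walk each '/'-separated segment.
    for part in cref[1:].split("/"):
        if part == "":
            continue  # '#' on its own refers to the root
        # List segments are treated as indices, dict segments as keys.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


# Assumed sample data, loosely shaped like an exported document dictionary.
sample = {"texts": [{"text": "Title"}, {"text": "First paragraph"}]}
ref = {"$ref": "#/texts/1"}  # serialized form of a reference, using the '$ref' alias

item = resolve_pointer(sample, ref["$ref"])
print(item["text"])
```

The sketch deliberately ignores JSON Pointer escape sequences such as `~0` and `~1`; a production resolver would need to handle them.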
BoundingBox
Bases: BaseModel
BoundingBox.
Methods:
- area: area.
- as_tuple: as_tuple.
- enclosing_bbox: Create a bounding box that covers all of the given boxes.
- expand_by_scale: expand_to_size.
- from_tuple: from_tuple.
- intersection_area_with: Calculate the intersection area with another bounding box.
- intersection_over_self: intersection_over_self.
- intersection_over_union: intersection_over_union.
- is_above: is_above.
- is_horizontally_connected: is_horizontally_connected.
- is_left_of: is_left_of.
- is_strictly_above: is_strictly_above.
- is_strictly_left_of: is_strictly_left_of.
- normalized: normalized.
- overlaps: overlaps.
- overlaps_horizontally: Check if two bounding boxes overlap horizontally.
- overlaps_vertically: Check if two bounding boxes overlap vertically.
- overlaps_vertically_with_iou: overlaps_y_with_iou.
- resize_by_scale: resize_by_scale.
- scale_to_size: scale_to_size.
- scaled: scaled.
- to_bottom_left_origin: to_bottom_left_origin.
- to_top_left_origin: to_top_left_origin.
- union_area_with: Calculates the union area with another bounding box.
- x_overlap_with: Calculates the horizontal overlap with another bounding box.
- x_union_with: Calculates the horizontal union dimension with another bounding box.
- y_overlap_with: Calculates the vertical overlap with another bounding box, respecting coordinate origin.
- y_union_with: Calculates the vertical union dimension with another bounding box, respecting coordinate origin.
Attributes:
- b (float)
- coord_origin (CoordOrigin)
- height: height.
- l (float)
- r (float)
- t (float)
- width: width.
b
b: float
height
height
height.
l
l: float
r
r: float
t
t: float
width
width
width.
area
area() -> float
area.
as_tuple
as_tuple() -> Tuple[float, float, float, float]
as_tuple.
enclosing_bbox
enclosing_bbox(boxes: List[BoundingBox]) -> BoundingBox
Create a bounding box that covers all of the given boxes.
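To make "covers all of the given boxes" concrete, the following standalone sketch computes an enclosing box from plain tuples. It assumes every box uses a top-left origin (so `t < b`); the `Box` type and `enclosing_box` function are hypothetical stand-ins, not the library's classes.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in for a box with a top-left origin (t < b)."""
    l: float
    t: float
    r: float
    b: float


def enclosing_box(boxes: List[Box]) -> Box:
    """Smallest box covering every input box (all boxes assumed to share a top-left origin)."""
    if not boxes:
        raise ValueError("at least one box is required")
    return Box(
        l=min(box.l for box in boxes),  # leftmost edge
        t=min(box.t for box in boxes),  # topmost edge (smallest y with a top-left origin)
        r=max(box.r for box in boxes),  # rightmost edge
        b=max(box.b for box in boxes),  # bottommost edge
    )


boxes = [Box(10, 20, 50, 40), Box(30, 10, 80, 35)]
print(enclosing_box(boxes))
```

With a bottom-left origin the roles of `min` and `max` for `t` and `b` would swap, which is why mixing origins in one call should be avoided.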
from_tuple
from_tuple(coord: Tuple[float, ...], origin: CoordOrigin)
from_tuple.
Parameters:
- coord (Tuple[float, ...])
- origin (CoordOrigin)
intersection_area_with
intersection_area_with(other: BoundingBox) -> float
Calculate the intersection area with another bounding box.
intersection_over_self
intersection_over_self(other: BoundingBox, eps: float = 1e-06) -> float
intersection_over_self.
intersection_over_union
intersection_over_union(other: BoundingBox, eps: float = 1e-06) -> float
intersection_over_union.
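The two ratios documented above can be sketched without the library as follows. This is an illustration under stated assumptions: boxes use a top-left origin, `Box` and the free functions are hypothetical names, and adding `eps` to the denominator is a guess at how the parameter guards against division by zero.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical box with a top-left origin (t < b)."""
    l: float
    t: float
    r: float
    b: float


def area(box: Box) -> float:
    return max(0.0, box.r - box.l) * max(0.0, box.b - box.t)


def intersection_area(a: Box, b: Box) -> float:
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    width = max(0.0, min(a.r, b.r) - max(a.l, b.l))
    height = max(0.0, min(a.b, b.b) - max(a.t, b.t))
    return width * height


def intersection_over_union(a: Box, b: Box, eps: float = 1e-6) -> float:
    inter = intersection_area(a, b)
    union = area(a) + area(b) - inter
    return inter / (union + eps)  # eps placement is an assumption


def intersection_over_self(a: Box, b: Box, eps: float = 1e-6) -> float:
    # Fraction of `a` that is covered by `b`.
    return intersection_area(a, b) / (area(a) + eps)


first = Box(0, 0, 10, 10)
second = Box(5, 0, 15, 10)
print(intersection_over_union(first, second))  # close to 1/3
print(intersection_over_self(first, second))   # close to 1/2
```

Intersection over union is symmetric in its arguments, whereas intersection over self is not, which makes the latter useful for asking whether one box is mostly contained in another.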
is_horizontally_connected
is_horizontally_connected(elem_i: BoundingBox, elem_j: BoundingBox) -> bool
is_horizontally_connected.
is_strictly_above
is_strictly_above(other: BoundingBox, eps: float = 0.001) -> bool
is_strictly_above.
is_strictly_left_of
is_strictly_left_of(other: BoundingBox, eps: float = 0.001) -> bool
is_strictly_left_of.
overlaps_horizontally
overlaps_horizontally(other: BoundingBox) -> bool
Check if two bounding boxes overlap horizontally.
overlaps_vertically
overlaps_vertically(other: BoundingBox) -> bool
Check if two bounding boxes overlap vertically.
overlaps_vertically_with_iou
overlaps_vertically_with_iou(other: BoundingBox, iou: float) -> bool
overlaps_y_with_iou.
resize_by_scale
resize_by_scale(x_scale: float, y_scale: float)
resize_by_scale.
scaled
scaled(scale: float)
scaled.
to_bottom_left_origin
to_bottom_left_origin(page_height: float) -> BoundingBox
to_bottom_left_origin.
Parameters:
- page_height (float)
to_top_left_origin
to_top_left_origin(page_height: float) -> BoundingBox
to_top_left_origin.
Parameters:
- page_height (float)
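Both conversions need `page_height` because switching origins flips the vertical axis. A minimal sketch of that arithmetic, assuming the usual `y' = page_height - y` flip, is shown here. The `Box` dataclass and the two functions are hypothetical stand-ins, and the origin is modelled as a plain string mirroring the `CoordOrigin` values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a bounding box with an explicit origin."""
    l: float
    t: float
    r: float
    b: float
    origin: str  # "TOPLEFT" or "BOTTOMLEFT", mirroring the CoordOrigin values


def to_bottom_left(box: Box, page_height: float) -> Box:
    if box.origin == "BOTTOMLEFT":
        return box  # already in the requested system
    # Flip the vertical axis; horizontal coordinates are unchanged.
    return Box(box.l, page_height - box.t, box.r, page_height - box.b, "BOTTOMLEFT")


def to_top_left(box: Box, page_height: float) -> Box:
    if box.origin == "TOPLEFT":
        return box
    return Box(box.l, page_height - box.t, box.r, page_height - box.b, "TOPLEFT")


page_height = 792.0  # assumed page height in points (US Letter)
top_left = Box(72.0, 100.0, 300.0, 150.0, "TOPLEFT")
bottom_left = to_bottom_left(top_left, page_height)
print(bottom_left)
print(to_top_left(bottom_left, page_height) == top_left)
```

Note that after the flip `t` is numerically larger than `b`, since `t` still names the visually upper edge.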
union_area_with
union_area_with(other: BoundingBox) -> float
Calculates the union area with another bounding box.
x_overlap_with
x_overlap_with(other: BoundingBox) -> float
Calculates the horizontal overlap with another bounding box.
x_union_with
x_union_with(other: BoundingBox) -> float
Calculates the horizontal union dimension with another bounding box.
y_overlap_with
y_overlap_with(other: BoundingBox) -> float
Calculates the vertical overlap with another bounding box, respecting coordinate origin.
y_union_with
y_union_with(other: BoundingBox) -> float
Calculates the vertical union dimension with another bounding box, respecting coordinate origin.
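"Respecting coordinate origin" matters because the numeric order of `t` and `b` depends on the origin: with a top-left origin `t < b`, with a bottom-left origin `t > b`. The sketch below shows one way the vertical overlap and union could be computed for each case. The function names and the string-valued `origin` argument are assumptions for illustration only.

```python
def y_overlap(t1: float, b1: float, t2: float, b2: float, origin: str) -> float:
    """Vertical overlap of two spans; both spans are assumed to share `origin`."""
    if origin == "TOPLEFT":
        # y grows downward, so t < b.
        return max(0.0, min(b1, b2) - max(t1, t2))
    if origin == "BOTTOMLEFT":
        # y grows upward, so t > b.
        return max(0.0, min(t1, t2) - max(b1, b2))
    raise ValueError(f"unknown origin: {origin!r}")


def y_union(t1: float, b1: float, t2: float, b2: float, origin: str) -> float:
    """Vertical extent covered by the two spans together."""
    if origin == "TOPLEFT":
        return max(0.0, max(b1, b2) - min(t1, t2))
    if origin == "BOTTOMLEFT":
        return max(0.0, max(t1, t2) - min(b1, b2))
    raise ValueError(f"unknown origin: {origin!r}")


# The same two boxes expressed in both systems, for an assumed page height of 792.
print(y_overlap(100.0, 150.0, 120.0, 200.0, "TOPLEFT"))
print(y_overlap(692.0, 642.0, 672.0, 592.0, "BOTTOMLEFT"))
print(y_union(100.0, 150.0, 120.0, 200.0, "TOPLEFT"))
```

Both calls to `y_overlap` describe the same geometry, so they should agree; comparing boxes that use different origins would first require converting one of them.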
CoordOrigin
Bases: str, Enum
CoordOrigin.
Attributes:
- BOTTOMLEFT
- TOPLEFT
BOTTOMLEFT
BOTTOMLEFT = 'BOTTOMLEFT'
TOPLEFT
TOPLEFT = 'TOPLEFT'
ImageRefMode
Bases: str, Enum
ImageRefMode.
Attributes:
- EMBEDDED
- PLACEHOLDER
- REFERENCED
EMBEDDED
EMBEDDED = 'embedded'
PLACEHOLDER
PLACEHOLDER = 'placeholder'
REFERENCED
REFERENCED = 'referenced'
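Because `ImageRefMode` derives from both `str` and `Enum`, its members compare equal to their string values and can be built from them. The sketch below mirrors the three documented values locally so it runs standalone, and shows one plausible way an exporter might branch on the mode. The `image_markdown` function and its output format are hypothetical, not the library's export code; only the default placeholder string is taken from the `export_to_markdown` signature.

```python
import base64
from enum import Enum


class ImageRefMode(str, Enum):
    """Local mirror of the documented enum values, so the sketch runs standalone."""
    EMBEDDED = "embedded"
    PLACEHOLDER = "placeholder"
    REFERENCED = "referenced"


def image_markdown(mode: ImageRefMode, uri: str, data: bytes,
                   placeholder: str = "<!-- image -->") -> str:
    """Hypothetical renderer illustrating how the three modes could differ."""
    if mode is ImageRefMode.PLACEHOLDER:
        return placeholder  # no image data in the output at all
    if mode is ImageRefMode.EMBEDDED:
        # Inline the bytes as a base64 data URI (PNG assumed for the example).
        encoded = base64.b64encode(data).decode("ascii")
        return f"![Image](data:image/png;base64,{encoded})"
    return f"![Image]({uri})"  # REFERENCED: point at an external file


mode = ImageRefMode("referenced")  # members can be constructed from their string values
print(mode == "referenced")
print(image_markdown(mode, "figures/plot.png", b""))
print(image_markdown(ImageRefMode.PLACEHOLDER, "figures/plot.png", b""))
```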