mirror of
https://github.com/DS4SD/docling.git
synced 2025-12-08 20:58:11 +00:00
feat(html): Support formatting tags in HTML texts (#2111)
* add parsing for formatting tags in HTML backend

Signed-off-by: Roman Kayan BAZG <roman.kayan@bazg.admin.ch>

fix latest tests + wiki_duck result files.

Signed-off-by: Roman Kayan BAZG <roman.kayan@bazg.admin.ch>

* convert _collect_parent_format_tags to staticmethod

Signed-off-by: Roman Kayan BAZG <roman.kayan@bazg.admin.ch>

---------

Signed-off-by: Roman Kayan BAZG <roman.kayan@bazg.admin.ch>
@@ -27,6 +27,6 @@ HTML
 Docling has three backends for parsing HTML files:

-1. HTMLDocumentBackend Ignores images
-2. HTMLDocumentBackendImagesInline Extracts images inline
-3. HTMLDocumentBackendImagesReferenced Extracts images as references
+1. **HTMLDocumentBackend** Ignores images
+2. **HTMLDocumentBackendImagesInline** Extracts images inline
+3. **HTMLDocumentBackendImagesReferenced** Extracts images as references
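
The commit message describes parsing formatting tags and collecting the format tags of parent elements. As a rough illustration of that idea only, the sketch below tracks which formatting tags enclose each piece of text, using Python's standard `html.parser`. It is not docling's implementation: `FORMAT_TAGS`, `FormattingCollector`, and `collect_runs` are hypothetical names, and the tag-to-style mapping is an assumption.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML formatting tags to style names (an assumption,
# not taken from docling).
FORMAT_TAGS = {
    "b": "bold", "strong": "bold",
    "i": "italic", "em": "italic",
    "u": "underline",
    "s": "strikethrough", "del": "strikethrough",
}


class FormattingCollector(HTMLParser):
    """Collect (text, styles) runs; styles come from the enclosing format tags."""

    def __init__(self):
        super().__init__()
        self._open_tags = []  # stack of currently open formatting tags
        self.runs = []        # list of (text, frozenset of style names)

    def handle_starttag(self, tag, attrs):
        if tag in FORMAT_TAGS:
            self._open_tags.append(tag)

    def handle_endtag(self, tag):
        # Remove the most recently opened matching tag; ignore stray closers.
        if tag in self._open_tags:
            idx = len(self._open_tags) - 1 - self._open_tags[::-1].index(tag)
            del self._open_tags[idx]

    def handle_data(self, data):
        # Skip whitespace-only text; record everything else with its styles.
        if data.strip():
            styles = frozenset(FORMAT_TAGS[t] for t in self._open_tags)
            self.runs.append((data, styles))


def collect_runs(html):
    """Parse an HTML fragment and return its formatted text runs."""
    parser = FormattingCollector()
    parser.feed(html)
    parser.close()
    return parser.runs
```

For example, `collect_runs("<p>plain <b>bold <i>both</i></b></p>")` yields three runs: `"plain "` with no styles, `"bold "` with `bold`, and `"both"` with both `bold` and `italic`. Keeping a stack of open tags means nested formatting accumulates naturally, which is the behaviour a parent-tag lookup is meant to provide.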