Mirror of https://github.com/DS4SD/docling.git, synced 2025-12-08 20:58:11 +00:00
fix(html): preserve code blocks in list items (#2131)
* chore(html): refactor parser to leverage context managers

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* fix(html): parse inline code snippets, also from list items

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

* chore(html): remove hidden tags

Remove tags that are not meant to be displayed.
Add regression tests for code blocks, inline code, and hidden tags.

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>

---------

Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
Committed by GitHub
Parent: c0268416cf
Commit: fa3327e1a6
24
tests/data/groundtruth/docling_v2/html_code_snippets.html.md
vendored
Normal file
@@ -0,0 +1,24 @@
# Code snippets

The Pythagorean theorem can be written as an equation relating the lengths of the sides *a* , *b* and the hypotenuse *c* .

To use Docling, simply install `docling` from your package manager, e.g. pip: `pip install docling`

To convert individual documents with python, use `convert()` , for example:

```
from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
```

The program will output: `## Docling Technical Report[...]`

Prefetch the models:

- Use the `docling-tools models download` utility:
- Alternatively, models can be programmatically downloaded using `docling.utils.model_downloader.download_models()` .
- Also, you can use download-hf-repo parameter to download arbitrary models from HuggingFace by specifying repo id: `$ docling-tools models download-hf-repo ds4sd/SmolDocling-256M-preview Downloading ds4sd/SmolDocling-256M-preview model from HuggingFace...`
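
The list items in the groundtruth above keep their inline code as backtick spans, which is the behavior the commit message describes. As a rough illustration only, the sketch below shows one way a converter could do this using Python's standard `html.parser`. It is not Docling's implementation: the class `ListCodeSketch` and the function `list_items_to_markdown` are hypothetical names, and the sketch assumes flat lists and explicitly closed tags.

```python
from html.parser import HTMLParser


class ListCodeSketch(HTMLParser):
    """Toy converter (hypothetical, not Docling's parser).

    Turns each <li> into a '- ' bullet, wraps <code> text in backticks,
    and skips any element that carries the `hidden` attribute.
    Assumes flat lists and that every tag is explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.lines = []        # finished Markdown lines
        self._parts = None     # fragments of the <li> being read, else None
        self._in_code = False  # True while inside a <code> element
        self._hidden = 0       # nesting depth inside a hidden element

    def handle_starttag(self, tag, attrs):
        if self._hidden or "hidden" in dict(attrs):
            self._hidden += 1
        elif tag == "li":
            self._parts = []
        elif tag == "code":
            self._in_code = True

    def handle_endtag(self, tag):
        if self._hidden:
            self._hidden -= 1
        elif tag == "code":
            self._in_code = False
        elif tag == "li" and self._parts is not None:
            self.lines.append("- " + " ".join(self._parts))
            self._parts = None

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if self._hidden or self._parts is None or not text:
            return
        self._parts.append(f"`{text}`" if self._in_code else text)


def list_items_to_markdown(html: str) -> list[str]:
    """Return one Markdown bullet per visible <li> in `html`."""
    parser = ListCodeSketch()
    parser.feed(html)
    parser.close()
    return parser.lines


if __name__ == "__main__":
    sample = (
        "<ul>"
        "<li>Use the <code>docling-tools models download</code> utility:</li>"
        "<li hidden>not meant to be displayed</li>"
        "</ul>"
    )
    print("\n".join(list_items_to_markdown(sample)))
```

Joining the fragments with a single space keeps the sketch short, but it also puts a space between a code span and any punctuation that follows it.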