Mirror of https://github.com/DS4SD/docling.git, synced 2025-12-08 12:48:28 +00:00
* chore(html): refactor parser to leverage context managers
  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* fix(html): parse inline code snippets, also from list items
  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
* chore(html): remove hidden tags
  Remove tags that are not meant to be displayed. Add regression tests for code blocks, inline code, and hidden tags.
  Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
---------
Signed-off-by: Cesar Berrospi Ramis <ceb@zurich.ibm.com>
1.0 KiB
Vendored
Code snippets
The Pythagorean theorem can be written as an equation relating the lengths of the sides a, b and the hypotenuse c.
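For reference, the equation the sentence above alludes to is the standard statement of the theorem, using the same symbols a, b, and c:

```latex
a^2 + b^2 = c^2
```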
To use Docling, simply install docling from your package manager, e.g. pip: pip install docling
To convert individual documents with Python, use convert(), for example:
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
The program will output: ## Docling Technical Report[...]
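As a minimal sketch of a common follow-up step, the exported Markdown can be written to a file instead of printed. The helper names save_markdown and convert_to_markdown_file are hypothetical and not part of Docling; only DocumentConverter, convert(), and export_to_markdown() come from the snippet above. The Docling import is deferred so that the file-writing helper can be used on its own.

```python
from pathlib import Path


def save_markdown(markdown: str, out_path: str) -> Path:
    """Write a Markdown string to out_path, creating parent folders (hypothetical helper)."""
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path


def convert_to_markdown_file(source: str, out_path: str) -> Path:
    """Convert source with Docling and save the Markdown export (hypothetical helper)."""
    # Imported lazily: Docling is only needed when a conversion is actually requested.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return save_markdown(result.document.export_to_markdown(), out_path)
```

Calling convert_to_markdown_file("https://arxiv.org/pdf/2408.09869", "out/report.md") would, assuming Docling is installed and the URL is reachable, save the same text that the snippet above prints.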
Prefetch the models:
- Use the docling-tools models download utility.
- Alternatively, models can be programmatically downloaded using docling.utils.model_downloader.download_models().
- Also, you can use the download-hf-repo parameter to download arbitrary models from HuggingFace by specifying the repo id:
$ docling-tools models download-hf-repo ds4sd/SmolDocling-256M-preview
Downloading ds4sd/SmolDocling-256M-preview model from HuggingFace...
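The command-line call above can also be driven from a script. The following is a sketch only: build_download_command and prefetch_repo are hypothetical helper names, and the argument list simply mirrors the command shown above rather than any documented Python API.

```python
import shutil
import subprocess
from typing import List


def build_download_command(repo_id: str) -> List[str]:
    """Build the argv for the download-hf-repo call shown above (hypothetical helper)."""
    if not repo_id:
        raise ValueError("repo_id must be a non-empty string")
    return ["docling-tools", "models", "download-hf-repo", repo_id]


def prefetch_repo(repo_id: str) -> subprocess.CompletedProcess:
    """Run the download command, assuming docling-tools is installed and on PATH."""
    command = build_download_command(repo_id)
    if shutil.which(command[0]) is None:
        raise RuntimeError("docling-tools was not found on PATH")
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    return subprocess.run(command, check=True)
```

Keeping the argument construction separate from the subprocess call makes it possible to inspect or log the command without triggering a download.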