mirror of
https://github.com/DS4SD/docling.git
synced 2025-07-26 12:04:31 +00:00
update README
Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
parent
b9fd50e7de
commit
33d5d7d787
26
README.md
@@ -30,19 +30,35 @@ To use Docling, simply install `docling` from your package manager, e.g. pip:
|
|||||||
pip install docling
|
pip install docling
|
||||||
```
|
```
|
||||||
|
|
||||||
> [!NOTE]
|
> [!NOTE]
|
||||||
> Works on macOS and Linux environments. Windows platforms are currently not tested.
|
> Works on macOS and Linux environments. Windows platforms are currently not tested.
|
||||||
|
|
||||||
### Development setup
|
### Development setup
|
||||||
|
|
||||||
To develop for Docling, you need Python 3.10 / 3.11 / 3.12 and Poetry. You can then install from your local clone's root dir:
|
To develop for Docling, you need Python 3.10 / 3.11 / 3.12 and Poetry. You can then install from your local clone's root dir:
|
||||||
```bash
|
```bash
|
||||||
poetry install
|
poetry install --all-extras
|
||||||
```
|
```
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
For basic usage, see the [convert.py](https://github.com/DS4SD/docling/blob/main/examples/convert.py) example module. Run with:
|
### Convert a single document
|
||||||
|
|
||||||
|
To convert individual PDF documents, use `convert_single()`, for example:
|
||||||
|
```python
|
||||||
|
from docling.document_converter import DocumentConverter
|
||||||
|
|
||||||
|
source = "https://arxiv.org/pdf/2206.01062" # PDF path or URL
|
||||||
|
converter = DocumentConverter()
|
||||||
|
doc = converter.convert_single(source)
|
||||||
|
print(doc.export_to_markdown()) # output: "## DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis [...]"
|
||||||
|
```
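
As a minimal sketch of how the `convert_single()` call above could be reused for a handful of sources, the helper below loops over them one at a time and collects each Markdown export. The helper name `convert_each_to_markdown` and its return shape are hypothetical and not part of Docling; only `convert_single()` and `export_to_markdown()` are taken from the example above.

```python
from typing import Dict, Iterable


def convert_each_to_markdown(converter, sources: Iterable[str]) -> Dict[str, str]:
    """Convert each source separately and map it to its Markdown export.

    Hypothetical helper: `converter` is assumed to be a docling
    `DocumentConverter`, but any object exposing a compatible
    `convert_single()` method will work.
    """
    results: Dict[str, str] = {}
    for source in sources:
        # Same call as in the single-document example above.
        doc = converter.convert_single(source)
        results[source] = doc.export_to_markdown()
    return results


# Assumed usage, with `docling` installed:
#   from docling.document_converter import DocumentConverter
#   markdown_by_source = convert_each_to_markdown(
#       DocumentConverter(), ["https://arxiv.org/pdf/2206.01062"]
#   )
```

Passing the converter in as an argument keeps a single `DocumentConverter` instance shared across all sources, rather than constructing a new one per document.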
|
||||||
|
|
||||||
|
### Convert a batch of documents
|
||||||
|
|
||||||
|
For an example of converting multiple documents, see [convert.py](https://github.com/DS4SD/docling/blob/main/examples/convert.py).
|
||||||
|
|
||||||
|
From a local repo clone, you can run it with:
|
||||||
|
|
||||||
```
|
```
|
||||||
python examples/convert.py
|
python examples/convert.py
|
||||||
@@ -58,7 +74,7 @@ You can control if table structure recognition or OCR should be performed by arg
|
|||||||
doc_converter = DocumentConverter(
|
doc_converter = DocumentConverter(
|
||||||
artifacts_path=artifacts_path,
|
artifacts_path=artifacts_path,
|
||||||
pipeline_options=PipelineOptions(
|
pipeline_options=PipelineOptions(
|
||||||
do_table_structure=False, # controls if table structure is recovered
|
do_table_structure=False, # controls if table structure is recovered
|
||||||
do_ocr=True, # controls if OCR is applied (ignores programmatic content)
|
do_ocr=True, # controls if OCR is applied (ignores programmatic content)
|
||||||
),
|
),
|
||||||
)
|
)
|
||||||
@@ -90,7 +106,7 @@ conv_input = DocumentConversionInput.from_paths(
|
|||||||
)
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
### Convert from binary PDF streams
|
### Convert from binary PDF streams
|
||||||
|
|
||||||
You can convert PDFs from a binary stream instead of from the filesystem as follows:
|
You can convert PDFs from a binary stream instead of from the filesystem as follows:
|
||||||
```python
|
```python
|
||||||
|