mirror of https://github.com/DS4SD/docling.git, synced 2025-07-30 14:04:27 +00:00
update github pages
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
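
Every hunk in this commit applies the same substitution: the documentation base URL `https://ds4sd.github.io/docling` becomes `https://docling-project.github.io/docling`. As a rough sketch only (the helper names `rewrite_docs_links` and `rewrite_file` are hypothetical and not part of Docling, and the commit does not say how the edit was actually made), the change could be reproduced like this:

```python
from pathlib import Path

# Old and new documentation base URLs, taken from the hunks in this commit.
OLD_BASE = "https://ds4sd.github.io/docling"
NEW_BASE = "https://docling-project.github.io/docling"


def rewrite_docs_links(text: str) -> str:
    """Return text with every old docs URL prefix replaced by the new one."""
    return text.replace(OLD_BASE, NEW_BASE)


def rewrite_file(path: Path) -> bool:
    """Rewrite one file in place (hypothetical helper); True if it changed."""
    original = path.read_text(encoding="utf-8")
    updated = rewrite_docs_links(original)
    if updated == original:
        return False
    path.write_text(updated, encoding="utf-8")
    return True
```

Only the GitHub Pages host is matched, so repository URLs such as `https://github.com/docling-project/docling` are left untouched.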
parent 7e8288ca01
commit 8ef990bad3
README.md | 18
@@ -11,7 +11,7 @@
 </p>
 
 [](https://arxiv.org/abs/2408.09869)
-[](https://ds4sd.github.io/docling/)
+[](https://docling-project.github.io/docling/)
 [](https://pypi.org/project/docling/)
 [](https://pypi.org/project/docling/)
 [](https://python-poetry.org/)
@@ -51,7 +51,7 @@ pip install docling
 
 Works on macOS, Linux and Windows environments. Both x86_64 and arm64 architectures.
 
-More [detailed installation instructions](https://ds4sd.github.io/docling/installation/) are available in the docs.
+More [detailed installation instructions](https://docling-project.github.io/docling/installation/) are available in the docs.
 
 ## Getting started
 
@@ -66,23 +66,23 @@ result = converter.convert(source)
 print(result.document.export_to_markdown()) # output: "## Docling Technical Report[...]"
 ```
 
-More [advanced usage options](https://ds4sd.github.io/docling/usage/) are available in
+More [advanced usage options](https://docling-project.github.io/docling/usage/) are available in
 the docs.
 
 ## Documentation
 
-Check out Docling's [documentation](https://ds4sd.github.io/docling/), for details on
+Check out Docling's [documentation](https://docling-project.github.io/docling/), for details on
 installation, usage, concepts, recipes, extensions, and more.
 
 ## Examples
 
-Go hands-on with our [examples](https://ds4sd.github.io/docling/examples/),
+Go hands-on with our [examples](https://docling-project.github.io/docling/examples/),
 demonstrating how to address different application use cases with Docling.
 
 ## Integrations
 
 To further accelerate your AI application development, check out Docling's native
-[integrations](https://ds4sd.github.io/docling/integrations/) with popular frameworks
+[integrations](https://docling-project.github.io/docling/integrations/) with popular frameworks
 and tools.
 
 ## Get help and support
@@ -123,6 +123,6 @@ For individual model usage, please refer to the model licenses found in the orig
 
 Docling has been brought to you by IBM.
 
-[supported_formats]: https://ds4sd.github.io/docling/usage/supported_formats/
-[docling_document]: https://ds4sd.github.io/docling/concepts/docling_document/
-[integrations]: https://ds4sd.github.io/docling/integrations/
+[supported_formats]: https://docling-project.github.io/docling/usage/supported_formats/
+[docling_document]: https://docling-project.github.io/docling/concepts/docling_document/
+[integrations]: https://docling-project.github.io/docling/integrations/

@@ -121,7 +121,7 @@ def download(
 "Using the CLI:",
 f"`docling --artifacts-path={output_dir} FILE`",
 "\n",
-"Using Python: see the documentation at <https://ds4sd.github.io/docling/usage>.",
+"Using Python: see the documentation at <https://docling-project.github.io/docling/usage>.",
 )
 
 

@@ -26,7 +26,7 @@ class OcrMacModel(BaseOcrModel):
 "ocrmac is not correctly installed. "
 "Please install it via `pip install ocrmac` to use this OCR engine. "
 "Alternatively, Docling has support for other OCR engines. See the documentation: "
-"https://ds4sd.github.io/docling/installation/"
+"https://docling-project.github.io/docling/installation/"
 )
 try:
 from ocrmac import ocrmac

@@ -31,14 +31,14 @@ class TesseractOcrModel(BaseOcrModel):
 "Note that tesserocr might have to be manually compiled for working with "
 "your Tesseract installation. The Docling documentation provides examples for it. "
 "Alternatively, Docling has support for other OCR engines. See the documentation: "
-"https://ds4sd.github.io/docling/installation/"
+"https://docling-project.github.io/docling/installation/"
 )
 missing_langs_errmsg = (
 "tesserocr is not correctly configured. No language models have been detected. "
 "Please ensure that the TESSDATA_PREFIX envvar points to tesseract languages dir. "
 "You can find more information how to setup other OCR engines in Docling "
 "documentation: "
-"https://ds4sd.github.io/docling/installation/"
+"https://docling-project.github.io/docling/installation/"
 )
 
 try:

@@ -36,7 +36,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"This is an example of using [Docling](https://ds4sd.github.io/docling/) for converting structured data (XML) into a unified document\n",
+"This is an example of using [Docling](https://docling-project.github.io/docling/) for converting structured data (XML) into a unified document\n",
 "representation format, `DoclingDocument`, and leverage its riched structured content for RAG applications.\n",
 "\n",
 "Data used in this example consist of patents from the [United States Patent and Trademark Office (USPTO)](https://www.uspto.gov/) and medical\n",

@@ -103,7 +103,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"> 👉 **NOTE**: As you see above, using the `HybridChunker` can sometimes lead to a warning from the transformers library, however this is a \"false alarm\" — for details check [here](https://ds4sd.github.io/docling/faq/#hybridchunker-triggers-warning-token-indices-sequence-length-is-longer-than-the-specified-maximum-sequence-length-for-this-model)."
+"> 👉 **NOTE**: As you see above, using the `HybridChunker` can sometimes lead to a warning from the transformers library, however this is a \"false alarm\" — for details check [here](https://docling-project.github.io/docling/faq/#hybridchunker-triggers-warning-token-indices-sequence-length-is-longer-than-the-specified-maximum-sequence-length-for-this-model)."
 ]
 },
 {

@@ -36,7 +36,7 @@
 "## A recipe 🧑🍳 🐥 💚\n",
 "\n",
 "This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using:\n",
-"- [Docling](https://ds4sd.github.io/docling/) for document parsing and chunking\n",
+"- [Docling](https://docling-project.github.io/docling/) for document parsing and chunking\n",
 "- [Azure AI Search](https://azure.microsoft.com/products/ai-services/ai-search/?msockid=0109678bea39665431e37323ebff6723) for vector indexing and retrieval\n",
 "- [Azure OpenAI](https://azure.microsoft.com/products/ai-services/openai-service?msockid=0109678bea39665431e37323ebff6723) for embeddings and chat completion\n",
 "\n",

@@ -29,7 +29,7 @@
 "\n",
 "## A recipe 🧑🍳 🐥 💚\n",
 "\n",
-"This is a code recipe that uses [Weaviate](https://weaviate.io/) to perform RAG over PDF documents parsed by [Docling](https://ds4sd.github.io/docling/).\n",
+"This is a code recipe that uses [Weaviate](https://weaviate.io/) to perform RAG over PDF documents parsed by [Docling](https://docling-project.github.io/docling/).\n",
 "\n",
 "In this notebook, we accomplish the following:\n",
 "* Parse the top machine learning papers on [arXiv](https://arxiv.org/) using Docling\n",

@@ -1,5 +1,5 @@
 site_name: Docling
-site_url: https://ds4sd.github.io/docling/
+site_url: https://docling-project.github.io/docling/
 repo_name: docling-project/docling
 repo_url: https://github.com/docling-project/docling
 

@@ -2,7 +2,7 @@
 <html lang="en">
 <head>
 <link rel="icon" type="image/png"
-href="https://ds4sd.github.io/docling/assets/logo.png"/>
+href="https://docling-project.github.io/docling/assets/logo.png"/>
 <meta charset="UTF-8">
 <title>
 Powered by Docling