<a href="https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/hybrid_rag_qdrant.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval with Qdrant

| Step | Tech | Execution |
| --- | --- | --- |
| Embedding | FastEmbed | 💻 Local |
| Vector store | Qdrant | 💻 Local |

## Overview

This example demonstrates using Docling with [Qdrant](https://qdrant.tech/) to perform a hybrid search across your documents using dense and sparse vectors.

We'll chunk the documents using Docling before adding them to a Qdrant collection. Limiting the length of each chunk keeps every vector embedding focused on a single passage, which helps preserve its meaning.
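To make the idea of length-limited chunks concrete, here is a toy sketch in plain Python. It is not how Docling's `HybridChunker` works (that chunker is aware of document structure and tokenization); the `chunk_words` helper, the `max_words` parameter, and the sample sentence are hypothetical and only illustrate capping chunk size.

```python
def chunk_words(text, max_words=8):
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper for illustration only; not part of Docling.
    """
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]


sample = (
    "Docling converts documents and a chunker splits them "
    "into short passages for embedding"
)
chunks = chunk_words(sample, max_words=5)

for chunk in chunks:
    print(chunk)
```

Each chunk stays under the word limit, so an embedding model would see a short, focused passage rather than the whole document at once.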

## Setup

- 👉 The Qdrant client uses [FastEmbed](https://github.com/qdrant/fastembed) to generate vector embeddings. You can install the `fastembed-gpu` package instead if you have a supported GPU.

In [1]:
%pip install --no-warn-conflicts -q qdrant-client docling fastembed

Note: you may need to restart the kernel to use updated packages.


Let's import all the classes we'll be working with.

In [2]:
from qdrant_client import QdrantClient

from docling.chunking import HybridChunker
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter

- For Docling, we'll set the allowed formats to HTML, since we'll only be working with webpages in this tutorial.
- If we set a sparse model, the Qdrant client fuses the dense and sparse results using Reciprocal Rank Fusion (RRF). [Reference](https://qdrant.tech/documentation/tutorials/hybrid-search-fastembed/).
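For intuition about the fusion step, the sketch below shows the general idea behind RRF (reciprocal rank fusion) in plain Python: each document earns `1 / (k + rank)` from every ranked list it appears in, and the sums decide the final order. This is an illustration under assumptions, not the Qdrant client's actual code; the function name, the constant `k=60`, and the sample ids are hypothetical, and Qdrant's own constant may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is an assumed, commonly cited value).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda kv: -kv[1])]


# Hypothetical result lists from a dense and a sparse search.
dense_hits = ["a", "b", "c"]
sparse_hits = ["c", "a", "d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists (here `a` and `c`) rise above documents that appear in only one, which is the behavior hybrid search relies on.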

In [3]:
COLLECTION_NAME = "docling"

doc_converter = DocumentConverter(allowed_formats=[InputFormat.HTML])
client = QdrantClient(location=":memory:")
# The :memory: mode is a Python imitation of Qdrant's APIs for prototyping and CI.
# For production deployments, use the Docker image: docker run -p 6333:6333 qdrant/qdrant
# client = QdrantClient(location="http://localhost:6333")

client.set_model("sentence-transformers/all-MiniLM-L6-v2")
client.set_sparse_model("Qdrant/bm25")



We can now download and chunk the document using Docling. For demonstration, we'll use an article about chunking strategies :)

In [4]:
result = doc_converter.convert(
    "https://www.sagacify.com/news/a-guide-to-chunking-strategies-for-retrieval-augmented-generation-rag"
)
documents, metadatas = [], []
for chunk in HybridChunker().chunk(result.document):
    documents.append(chunk.text)
    metadatas.append(chunk.meta.export_json_dict())

Let's now upload the documents to Qdrant.

- The `add()` method batches the documents and uses FastEmbed to generate vector embeddings on our machine.

In [5]:
_ = client.add(
    collection_name=COLLECTION_NAME,
    documents=documents,
    metadata=metadatas,
    batch_size=64,
)
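To see what `batch_size=64` means in practice, here is a minimal sketch of splitting a sequence into batches. It is only an illustration of the batching idea, not the Qdrant client's internal implementation; `iter_batches` is a hypothetical helper.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 150 hypothetical documents split into batches of 64.
sizes = [len(batch) for batch in iter_batches(range(150), 64)]
print(sizes)
```

Embedding and uploading in batches keeps memory use bounded, and the last batch simply holds whatever is left over.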

## Retrieval

In [6]:
points = client.query(
    collection_name=COLLECTION_NAME,
    query_text="Can I split documents?",
    limit=10,
)

In [7]:
for i, point in enumerate(points):
    print(f"=== {i} ===")
    print(point.document)
    print()

=== 0 ===
Have you ever wondered how we, humans, would chunk? Here's a breakdown of a possible way a human would process a new document:
1. We start at the top of the document, treating the first part as a chunk.
2. We continue down the document, deciding if a new sentence or piece of information belongs with the first chunk or should start a new one.
3. We keep this up until we reach the end of the document.
The ultimate dream? Having an agent do this for you. But slow down! This approach is still being tested and isn't quite ready for the big leagues due to the time it takes to process multiple LLM calls and the cost of those calls. There's no implementation available in public libraries just yet. However, Greg Kamradt has his version available here.

=== 1 ===
Document Specific Chunking is a strategy that respects the document's structure. Rather than using a set number of characters or a recursive process, it creates chunks that align with the logical sections of the d