<a href="https://colab.research.google.com/github/DS4SD/docling/blob/main/docs/examples/hybrid_rag_qdrant.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hybrid RAG with Qdrant

## Overview

This example demonstrates using Docling with [Qdrant](https://qdrant.tech/) to perform a hybrid search across your documents using dense and sparse vectors.
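
To make "dense and sparse vectors" concrete, here is a minimal pure-Python sketch of how each kind of vector is typically scored. Dense embeddings (like the `all-MiniLM-L6-v2` model configured later) are usually compared by cosine similarity. Sparse vectors (like the BM25 model) are stored as index-to-weight maps and compared by a dot product over the indices they share. The vectors and helper names are hypothetical, and this is not Qdrant's internal code.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dense scoring: cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Sparse scoring: dot product over the term indices both vectors share."""
    # Walk the smaller vector; indices missing from either side contribute zero.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return sum(weight * large[idx] for idx, weight in small.items() if idx in large)


# Hypothetical toy vectors, not real model output.
query_dense, doc_dense = [0.1, 0.9, 0.2], [0.2, 0.8, 0.1]
query_sparse = {17: 1.2, 42: 0.7}  # term index -> weight
doc_sparse = {17: 0.9, 99: 1.5}    # shares only term 17 with the query

print(f"dense  (cosine): {cosine_similarity(query_dense, doc_dense):.3f}")
print(f"sparse (dot):    {sparse_dot(query_sparse, doc_sparse):.3f}")
```

A hybrid search runs both kinds of lookup and then merges the two result lists.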

We'll chunk the documents using Docling before adding them to a Qdrant collection. By limiting the length of the chunks, we can preserve the meaning in each vector embedding.
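
As a rough illustration of what "limiting the length" means, the sketch below caps each chunk at a fixed number of words. This naive fixed-size splitter (`chunk_by_words` is a hypothetical helper) is only an assumption for illustration. The `HierarchicalChunker` used in this notebook splits along the document's structure instead of counting words.

```python
from __future__ import annotations


def chunk_by_words(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words words each."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i : i + max_words]) for i in range(0, len(words), max_words)]


text = (
    "Chunking splits long documents into smaller pieces. "
    "Each piece is embedded separately, so a focused chunk yields a focused vector."
)
for piece in chunk_by_words(text, max_words=8):
    print(repr(piece))
```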

## Setup

- 👉 Qdrant client uses [FastEmbed](https://github.com/qdrant/fastembed) to generate vector embeddings. You can install the `fastembed-gpu` package if you've got the hardware to support it.

In [None]:
%pip install --no-warn-conflicts -q qdrant-client docling docling-core fastembed


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


Let's import all the classes we'll be working with.

In [1]:
from docling_core.transforms.chunker import HierarchicalChunker
from qdrant_client import QdrantClient

from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter

- For Docling, we'll set the allowed formats to HTML, since we'll only be working with webpages in this tutorial.
- If we set a sparse model, the Qdrant client will fuse the dense and sparse results using Reciprocal Rank Fusion (RRF). [Reference](https://qdrant.tech/documentation/tutorials/hybrid-search-fastembed/).
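
Reciprocal Rank Fusion uses only each result's rank, not its raw score, so it can merge BM25 scores and cosine similarities that live on different scales. Below is a minimal sketch, assuming the constant `k = 60` used in the original RRF paper by Cormack, Clarke and Büttcher. Qdrant's own implementation may use a different constant, and `reciprocal_rank_fusion` is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Higher fused scores rank first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the dense and the sparse search.
dense_hits = ["a", "b", "c"]
sparse_hits = ["c", "a", "d"]
for doc_id, score in reciprocal_rank_fusion([dense_hits, sparse_hits]):
    print(doc_id, round(score, 5))
```

Documents that rank well in both lists ("a" and "c" here) rise above those found by only one retriever.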

In [2]:
COLLECTION_NAME = "docling"

doc_converter = DocumentConverter(allowed_formats=[InputFormat.HTML])
client = QdrantClient(location=":memory:")
# The :memory: mode is a Python imitation of Qdrant's APIs for prototyping and CI.
# For production deployments, use the Docker image: docker run -p 6333:6333 qdrant/qdrant
# client = QdrantClient(location="http://localhost:6333")

client.set_model("sentence-transformers/all-MiniLM-L6-v2")
client.set_sparse_model("Qdrant/bm25")


We can now download and chunk the document using Docling. For demonstration, we'll use an article about chunking strategies :)

In [3]:
result = doc_converter.convert(
    "https://www.sagacify.com/news/a-guide-to-chunking-strategies-for-retrieval-augmented-generation-rag"
)
documents, metadatas = [], []
for chunk in HierarchicalChunker().chunk(result.document):
    documents.append(chunk.text)
    metadatas.append(chunk.meta.export_json_dict())
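
Conceptually, a hierarchical chunker walks the parsed document in reading order and emits one chunk per content item, remembering which headings that item sits under. The toy version below uses a hypothetical flat list of `(kind, level, text)` items in place of a real `DoclingDocument`. It only illustrates the idea and is not Docling's implementation, which records structural context of this kind in `chunk.meta` (serialized above with `export_json_dict()`).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyChunk:
    text: str
    headings: list[str] = field(default_factory=list)


def toy_hierarchical_chunks(items: list[tuple[str, int, str]]) -> list[ToyChunk]:
    """Emit one chunk per paragraph, tagged with the headings it sits under.

    items are ("heading", level, text) or ("paragraph", 0, text) in reading order.
    """
    stack: list[tuple[int, str]] = []  # currently open headings as (level, text)
    chunks: list[ToyChunk] = []
    for kind, level, text in items:
        if kind == "heading":
            # A heading closes any open heading at the same or a deeper level.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, text))
        else:
            chunks.append(ToyChunk(text=text, headings=[t for _, t in stack]))
    return chunks


toy_doc = [
    ("heading", 1, "Chunking strategies"),
    ("paragraph", 0, "Chunking splits a document into smaller pieces."),
    ("heading", 2, "Fixed-size chunking"),
    ("paragraph", 0, "Cut the text every N characters."),
    ("heading", 2, "Document specific chunking"),
    ("paragraph", 0, "Follow the document's own sections."),
]

for toy_chunk in toy_hierarchical_chunks(toy_doc):
    print(toy_chunk.headings, "->", toy_chunk.text)
```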

Let's now upload the documents to Qdrant.

- The `add()` method batches the documents and uses FastEmbed to generate vector embeddings on our machine.
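
Batching amounts to splitting the document list into groups of at most `batch_size` texts and embedding one group at a time. Here is a minimal sketch of that grouping with a hypothetical `batched` helper. It is not qdrant-client's code. Python 3.12+ also ships `itertools.batched`, which yields tuples instead of lists.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls up to batch_size items; an empty list means we are done.
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Hypothetical: 5 short texts with batch_size=2 are embedded in 3 batches.
texts = ["t1", "t2", "t3", "t4", "t5"]
for i, batch in enumerate(batched(texts, batch_size=2)):
    print(i, batch)
```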

In [4]:
client.add(COLLECTION_NAME, documents=documents, metadata=metadatas, batch_size=64)

['e74ae15be5eb4805858307846318e784',
 'f83f6125b0fa4a0595ae6a0777c9d90d',
 '9cf63c7f30764715bf3804a19db36d7d',
 '007dbe6d355b4b49af3b736cbd63a4d8',
 'e5e31f21f2e84aa68beca0dfc532cbe9',
 '69c10816af204bb28630a1f957d8dd3e',
 'b63546b9b1744063bdb076b234d883ca',
 '90ad15ba8fa6494489e1d3221e30bfcf',
 '13517debb483452ea40fc7aa04c08c50',
 '84ccab5cfab74e27a55acef1c63e3fad',
 'e8aa2ef46d234c5a8a9da64b701d60b4',
 '190bea5ba43c45e792197c50898d1d90',
 'a730319ea65645ca81e735ace0bcc72e',
 '415e7f6f15864e30b836e23ae8d71b43',
 '5569bce4e65541868c762d149c6f491e',
 '74d9b234e9c04ebeb8e4e1ca625789ac',
 '308b1c5006a94a679f4c8d6f2396993c',
 'aaa5ec6d385a418388e660c425bf1dbe',
 '630be8e43e4e4472a9cdb9af9462a43a',
 '643b316224de4770a5349bf69cf93471',
 'da9265e6f6c2485493d15223eefdf411',
 'a916e447d52c4084b5ce81a0c5a65b07',
 '2883c620858e4e728b88e127155a4f2c',
 '2a998f0e9c124af99027060b94027874',
 'be551fbd2b9e42f48ebae0cbf1f481bc',
 '95b7f7608e974ca6847097ee4590fba1',
 '309db4f3863b4e3aaf16d5f346c309f3',
 

## Query Documents

In [5]:
points = client.query(COLLECTION_NAME, query_text="Can I split documents?", limit=10)

print("<=== Retrieved documents ===>")
for point in points:
    print(point.document)

<=== Retrieved documents ===>
Document Specific Chunking is a strategy that respects the document's structure. Rather than using a set number of characters or a recursive process, it creates chunks that align with the logical sections of the document, like paragraphs or subsections. This approach maintains the original author's organization of content and helps keep the text coherent. It makes the retrieved information more relevant and useful, particularly for structured documents with clearly defined sections.
Document Specific Chunking can handle a variety of document formats, such as:
Consequently, there are also splitters available for this purpose.
1. We start at the top of the document, treating the first part as a chunk.
2. We continue down the document, deciding if a new sentence or piece of information belongs with the first chunk or should start a new one.
3. We keep this up until we reach the end of the document.
Have you ever wondered how we, humans, would chu