diff --git a/README.md b/README.md
index ebc5aeb7..176570d1 100644
--- a/README.md
+++ b/README.md
@@ -29,17 +29,20 @@ Docling simplifies document processing, parsing diverse formats — including ad
 
 ## Features
 
-* 🗂️  Parsing of [multiple document formats][supported_formats] incl. PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3, images (PNG, TIFF, JPEG, ...), and more
+* 🗂️ Parsing of [multiple document formats][supported_formats] incl. PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3, images (PNG, TIFF, JPEG, ...), and more
 * 📑 Advanced PDF understanding incl. page layout, reading order, table structure, code, formulas, image classification, and more
 * 🧬 Unified, expressive [DoclingDocument][docling_document] representation format
-* ↪️  Various [export formats][supported_formats] and options, including Markdown, HTML, [DocTags](https://arxiv.org/abs/2503.11576) and lossless JSON
+* ↪️ Various [export formats][supported_formats] and options, including Markdown, HTML, [DocTags](https://arxiv.org/abs/2503.11576) and lossless JSON
 * 🔒 Local execution capabilities for sensitive data and air-gapped environments
 * 🤖 Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
 * 🔍 Extensive OCR support for scanned PDFs and images
 * 👓 Support of several Visual Language Models ([SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview))
-* 🎙️  Support for Audio with Automatic Speech Recognition (ASR) models
+* 🎙️ Audio support with Automatic Speech Recognition (ASR) models
 * 💻 Simple and convenient CLI
 
+### What's new
+* 📤 Structured [information extraction][extraction] \[🧪 beta\]
+
 ### Coming soon
 
 * 📝 Metadata extraction, including title, authors, references & language
@@ -150,3 +153,4 @@ The project was started by the AI for knowledge team at IBM Research Zurich.
 [supported_formats]: https://docling-project.github.io/docling/usage/supported_formats/
 [docling_document]: https://docling-project.github.io/docling/concepts/docling_document/
 [integrations]: https://docling-project.github.io/docling/integrations/
+[extraction]: https://docling-project.github.io/docling/examples/extraction/
diff --git a/docs/examples/dpk-ingest-chunck-tokenize.ipynb b/docs/examples/dpk-ingest-chunk-tokenize.ipynb
similarity index 99%
rename from docs/examples/dpk-ingest-chunck-tokenize.ipynb
rename to docs/examples/dpk-ingest-chunk-tokenize.ipynb
index a25b1e7d..f44cdc2f 100644
--- a/docs/examples/dpk-ingest-chunck-tokenize.ipynb
+++ b/docs/examples/dpk-ingest-chunk-tokenize.ipynb
@@ -5,7 +5,7 @@
    "id": "3f312845",
    "metadata": {},
    "source": [
-    "# 🛡️ Chunking and tokenizing HTML documents using Data Prep Kit and the Docling Transforms\n",
+    "# Chunking & tokenization with Data Prep Kit\n",
     "\n",
     "This notebook demonstrates how to build a sequence of <a href=https://github.com/data-prep-kit/data-prep-kit> <b>DPK transforms</b> </a> for ingesting HTML documents using Docling2Parquet transforms and chunking them using Doc_Chunk transform. Both transforms are based on the <a href=https://docling-project.github.io/docling/> Docling library</a>. \n",
     "\n",
diff --git a/docs/examples/extraction.ipynb b/docs/examples/extraction.ipynb
new file mode 100644
index 00000000..850c6feb
--- /dev/null
+++ b/docs/examples/extraction.ipynb
@@ -0,0 +1,675 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "15674164",
+   "metadata": {},
+   "source": [
+    "# Information extraction"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d796485",
+   "metadata": {},
+   "source": [
+    "> 👉 **NOTE**: The extraction API is currently <i>in beta</i> and may change without prior notice."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "932f12cd",
+   "metadata": {},
+   "source": [
+    "Docling can extract information, i.e. structured data, from unstructured documents.\n",
+    "\n",
+    "The desired data schema, AKA *template*, can be provided as a string, a dictionary, or a Pydantic model (class or instance),\n",
+    "and Docling will return the extracted data as a standardized output, organized by page.\n",
+    "\n",
+    "Check out the subsections below for different usage scenarios."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "f97abf2e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from IPython import display\n",
+    "from pydantic import BaseModel, Field\n",
+    "from rich import print"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cda07006",
+   "metadata": {},
+   "source": [
+    "In this notebook, we will work with an example input image — let's quickly inspect it:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "15846b44",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<img src='https://upload.wikimedia.org/wikipedia/commons/9/9f/Swiss_QR-Bill_example.jpg' height='1000'>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "file_path = (\n",
+    "    \"https://upload.wikimedia.org/wikipedia/commons/9/9f/Swiss_QR-Bill_example.jpg\"\n",
+    ")\n",
+    "display.HTML(f\"<img src='{file_path}' height='1000'>\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dbbc173c",
+   "metadata": {},
+   "source": [
+    "## Defining the extractor"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e9871c8d",
+   "metadata": {},
+   "source": [
+    "Let's first define our extractor:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "7a8c6ff0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from docling.datamodel.base_models import InputFormat\n",
+    "from docling.document_extractor import DocumentExtractor\n",
+    "\n",
+    "extractor = DocumentExtractor(allowed_formats=[InputFormat.IMAGE, InputFormat.PDF])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e2b1933e",
+   "metadata": {},
+   "source": [
+    "Next, we look at different ways to define the data template."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "20e62dfd",
+   "metadata": {},
+   "source": [
+    "## Using a string template"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "4c5119b0",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/pva/work/github.com/DS4SD/docling/docling/document_extractor.py:143: UserWarning: The extract API is currently experimental and may change without prior notice.\n",
+      "Only PDF and image formats are supported.\n",
+      "  return next(all_res)\n",
+      "You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.\n",
+      "The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">[</span>\n",
+       "    <span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ExtractedPageData</span><span style=\"font-weight: bold\">(</span>\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">page_no</span>=<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">extracted_data</span>=<span style=\"font-weight: bold\">{</span><span style=\"color: #008000; text-decoration-color: #008000\">'bill_no'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'3139'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'total'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3949.75</span><span style=\"font-weight: bold\">}</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">raw_text</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'{\"bill_no\": \"3139\", \"total\": 3949.75}'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">errors</span>=<span style=\"font-weight: bold\">[]</span>\n",
+       "    <span style=\"font-weight: bold\">)</span>\n",
+       "<span style=\"font-weight: bold\">]</span>\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m[\u001b[0m\n",
+       "    \u001b[1;35mExtractedPageData\u001b[0m\u001b[1m(\u001b[0m\n",
+       "        \u001b[33mpage_no\u001b[0m=\u001b[1;36m1\u001b[0m,\n",
+       "        \u001b[33mextracted_data\u001b[0m=\u001b[1m{\u001b[0m\u001b[32m'bill_no'\u001b[0m: \u001b[32m'3139'\u001b[0m, \u001b[32m'total'\u001b[0m: \u001b[1;36m3949.75\u001b[0m\u001b[1m}\u001b[0m,\n",
+       "        \u001b[33mraw_text\u001b[0m=\u001b[32m'\u001b[0m\u001b[32m{\u001b[0m\u001b[32m\"bill_no\": \"3139\", \"total\": 3949.75\u001b[0m\u001b[32m}\u001b[0m\u001b[32m'\u001b[0m,\n",
+       "        \u001b[33merrors\u001b[0m=\u001b[1m[\u001b[0m\u001b[1m]\u001b[0m\n",
+       "    \u001b[1m)\u001b[0m\n",
+       "\u001b[1m]\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "result = extractor.extract(\n",
+    "    source=file_path,\n",
+    "    template='{\"bill_no\": \"string\", \"total\": \"float\"}',\n",
+    ")\n",
+    "print(result.pages)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0da85c9c",
+   "metadata": {},
+   "source": [
+    "## Using a dict template"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "e0df82f6",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">[</span>\n",
+       "    <span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ExtractedPageData</span><span style=\"font-weight: bold\">(</span>\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">page_no</span>=<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">extracted_data</span>=<span style=\"font-weight: bold\">{</span><span style=\"color: #008000; text-decoration-color: #008000\">'bill_no'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'3139'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'total'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3949.75</span><span style=\"font-weight: bold\">}</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">raw_text</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'{\"bill_no\": \"3139\", \"total\": 3949.75}'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">errors</span>=<span style=\"font-weight: bold\">[]</span>\n",
+       "    <span style=\"font-weight: bold\">)</span>\n",
+       "<span style=\"font-weight: bold\">]</span>\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m[\u001b[0m\n",
+       "    \u001b[1;35mExtractedPageData\u001b[0m\u001b[1m(\u001b[0m\n",
+       "        \u001b[33mpage_no\u001b[0m=\u001b[1;36m1\u001b[0m,\n",
+       "        \u001b[33mextracted_data\u001b[0m=\u001b[1m{\u001b[0m\u001b[32m'bill_no'\u001b[0m: \u001b[32m'3139'\u001b[0m, \u001b[32m'total'\u001b[0m: \u001b[1;36m3949.75\u001b[0m\u001b[1m}\u001b[0m,\n",
+       "        \u001b[33mraw_text\u001b[0m=\u001b[32m'\u001b[0m\u001b[32m{\u001b[0m\u001b[32m\"bill_no\": \"3139\", \"total\": 3949.75\u001b[0m\u001b[32m}\u001b[0m\u001b[32m'\u001b[0m,\n",
+       "        \u001b[33merrors\u001b[0m=\u001b[1m[\u001b[0m\u001b[1m]\u001b[0m\n",
+       "    \u001b[1m)\u001b[0m\n",
+       "\u001b[1m]\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "result = extractor.extract(\n",
+    "    source=file_path,\n",
+    "    template={\n",
+    "        \"bill_no\": \"string\",\n",
+    "        \"total\": \"float\",\n",
+    "    },\n",
+    ")\n",
+    "print(result.pages)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "925c1804",
+   "metadata": {},
+   "source": [
+    "## Using a Pydantic model template"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "01aee19d",
+   "metadata": {},
+   "source": [
+    "First, we define the Pydantic model we want to use:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "69facb7b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from typing import Optional\n",
+    "\n",
+    "\n",
+    "class Invoice(BaseModel):\n",
+    "    bill_no: str = Field(\n",
+    "        examples=[\"A123\", \"5414\"]\n",
+    "    )  # provide some examples, but no default value\n",
+    "    total: float = Field(\n",
+    "        default=10, examples=[20]\n",
+    "    )  # provide some examples and a default value\n",
+    "    tax_id: Optional[str] = Field(default=None, examples=[\"1234567890\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fbcbce95",
+   "metadata": {},
+   "source": [
+    "The class itself can then be used directly as the template:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "81db63b1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">[</span>\n",
+       "    <span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ExtractedPageData</span><span style=\"font-weight: bold\">(</span>\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">page_no</span>=<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">extracted_data</span>=<span style=\"font-weight: bold\">{</span><span style=\"color: #008000; text-decoration-color: #008000\">'bill_no'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'3139'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'total'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3949.75</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'tax_id'</span>: <span style=\"color: #800080; text-decoration-color: #800080; font-style: italic\">None</span><span style=\"font-weight: bold\">}</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">raw_text</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'{\"bill_no\": \"3139\", \"total\": 3949.75, \"tax_id\": null}'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">errors</span>=<span style=\"font-weight: bold\">[]</span>\n",
+       "    <span style=\"font-weight: bold\">)</span>\n",
+       "<span style=\"font-weight: bold\">]</span>\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m[\u001b[0m\n",
+       "    \u001b[1;35mExtractedPageData\u001b[0m\u001b[1m(\u001b[0m\n",
+       "        \u001b[33mpage_no\u001b[0m=\u001b[1;36m1\u001b[0m,\n",
+       "        \u001b[33mextracted_data\u001b[0m=\u001b[1m{\u001b[0m\u001b[32m'bill_no'\u001b[0m: \u001b[32m'3139'\u001b[0m, \u001b[32m'total'\u001b[0m: \u001b[1;36m3949.75\u001b[0m, \u001b[32m'tax_id'\u001b[0m: \u001b[3;35mNone\u001b[0m\u001b[1m}\u001b[0m,\n",
+       "        \u001b[33mraw_text\u001b[0m=\u001b[32m'\u001b[0m\u001b[32m{\u001b[0m\u001b[32m\"bill_no\": \"3139\", \"total\": 3949.75, \"tax_id\": null\u001b[0m\u001b[32m}\u001b[0m\u001b[32m'\u001b[0m,\n",
+       "        \u001b[33merrors\u001b[0m=\u001b[1m[\u001b[0m\u001b[1m]\u001b[0m\n",
+       "    \u001b[1m)\u001b[0m\n",
+       "\u001b[1m]\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "result = extractor.extract(\n",
+    "    source=file_path,\n",
+    "    template=Invoice,\n",
+    ")\n",
+    "print(result.pages)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2bd8736b",
+   "metadata": {},
+   "source": [
+    "Alternatively, a Pydantic model instance can be passed as the template, which allows overriding the default values.\n",
+    "\n",
+    "This is useful when we have context available that is more relevant than the\n",
+    "default values predefined in the model definition.\n",
+    "\n",
+    "For instance, in the example below:\n",
+    "- `bill_no` and `total` are set from the values extracted from the document,\n",
+    "- there was no `tax_id` to extract, so the updated default we provided was applied."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "b531a20d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">[</span>\n",
+       "    <span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ExtractedPageData</span><span style=\"font-weight: bold\">(</span>\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">page_no</span>=<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">extracted_data</span>=<span style=\"font-weight: bold\">{</span><span style=\"color: #008000; text-decoration-color: #008000\">'bill_no'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'3139'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'total'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3949.75</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'tax_id'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'42'</span><span style=\"font-weight: bold\">}</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">raw_text</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'{\"bill_no\": \"3139\", \"total\": 3949.75, \"tax_id\": \"42\"}'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">errors</span>=<span style=\"font-weight: bold\">[]</span>\n",
+       "    <span style=\"font-weight: bold\">)</span>\n",
+       "<span style=\"font-weight: bold\">]</span>\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m[\u001b[0m\n",
+       "    \u001b[1;35mExtractedPageData\u001b[0m\u001b[1m(\u001b[0m\n",
+       "        \u001b[33mpage_no\u001b[0m=\u001b[1;36m1\u001b[0m,\n",
+       "        \u001b[33mextracted_data\u001b[0m=\u001b[1m{\u001b[0m\u001b[32m'bill_no'\u001b[0m: \u001b[32m'3139'\u001b[0m, \u001b[32m'total'\u001b[0m: \u001b[1;36m3949.75\u001b[0m, \u001b[32m'tax_id'\u001b[0m: \u001b[32m'42'\u001b[0m\u001b[1m}\u001b[0m,\n",
+       "        \u001b[33mraw_text\u001b[0m=\u001b[32m'\u001b[0m\u001b[32m{\u001b[0m\u001b[32m\"bill_no\": \"3139\", \"total\": 3949.75, \"tax_id\": \"42\"\u001b[0m\u001b[32m}\u001b[0m\u001b[32m'\u001b[0m,\n",
+       "        \u001b[33merrors\u001b[0m=\u001b[1m[\u001b[0m\u001b[1m]\u001b[0m\n",
+       "    \u001b[1m)\u001b[0m\n",
+       "\u001b[1m]\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "result = extractor.extract(\n",
+    "    source=file_path,\n",
+    "    template=Invoice(\n",
+    "        bill_no=\"41\",\n",
+    "        total=100,\n",
+    "        tax_id=\"42\",\n",
+    "    ),\n",
+    ")\n",
+    "print(result.pages)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dc38e143",
+   "metadata": {},
+   "source": [
+    "### Advanced Pydantic model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5a1ee898",
+   "metadata": {},
+   "source": [
+    "Besides a flat template, we can in principle use any Pydantic model, which lets us reuse models and capture\n",
+    "hierarchies:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "dca8289a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class Contact(BaseModel):\n",
+    "    name: Optional[str] = Field(default=None, examples=[\"Smith\"])\n",
+    "    address: str = Field(default=\"123 Main St\", examples=[\"456 Elm St\"])\n",
+    "    postal_code: str = Field(default=\"12345\", examples=[\"67890\"])\n",
+    "    city: str = Field(default=\"Anytown\", examples=[\"Othertown\"])\n",
+    "    country: Optional[str] = Field(default=None, examples=[\"Canada\"])\n",
+    "\n",
+    "\n",
+    "class ExtendedInvoice(BaseModel):\n",
+    "    bill_no: str = Field(\n",
+    "        examples=[\"A123\", \"5414\"]\n",
+    "    )  # provide some examples, but not the actual value of the test sample\n",
+    "    total: float = Field(\n",
+    "        default=10, examples=[20]\n",
+    "    )  # provide a default value and some examples\n",
+    "    garden_work_hours: int = Field(default=1, examples=[2])\n",
+    "    sender: Contact = Field(default=Contact(), examples=[Contact()])\n",
+    "    receiver: Contact = Field(default=Contact(), examples=[Contact()])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "5896662d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">[</span>\n",
+       "    <span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ExtractedPageData</span><span style=\"font-weight: bold\">(</span>\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">page_no</span>=<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">extracted_data</span>=<span style=\"font-weight: bold\">{</span>\n",
+       "            <span style=\"color: #008000; text-decoration-color: #008000\">'bill_no'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'3139'</span>,\n",
+       "            <span style=\"color: #008000; text-decoration-color: #008000\">'total'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3949.75</span>,\n",
+       "            <span style=\"color: #008000; text-decoration-color: #008000\">'garden_work_hours'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">28</span>,\n",
+       "            <span style=\"color: #008000; text-decoration-color: #008000\">'sender'</span>: <span style=\"font-weight: bold\">{</span>\n",
+       "                <span style=\"color: #008000; text-decoration-color: #008000\">'name'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'Robert Schneider'</span>,\n",
+       "                <span style=\"color: #008000; text-decoration-color: #008000\">'address'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'Rue du Lac 1268'</span>,\n",
+       "                <span style=\"color: #008000; text-decoration-color: #008000\">'postal_code'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'2501'</span>,\n",
+       "                <span style=\"color: #008000; text-decoration-color: #008000\">'city'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'Biel'</span>,\n",
+       "                <span style=\"color: #008000; text-decoration-color: #008000\">'country'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'Switzerland'</span>\n",
+       "            <span style=\"font-weight: bold\">}</span>,\n",
+       "            <span style=\"color: #008000; text-decoration-color: #008000\">'receiver'</span>: <span style=\"font-weight: bold\">{</span>\n",
+       "                <span style=\"color: #008000; text-decoration-color: #008000\">'name'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'Pia Rutschmann'</span>,\n",
+       "                <span style=\"color: #008000; text-decoration-color: #008000\">'address'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'Marktgasse 28'</span>,\n",
+       "                <span style=\"color: #008000; text-decoration-color: #008000\">'postal_code'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'9400'</span>,\n",
+       "                <span style=\"color: #008000; text-decoration-color: #008000\">'city'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'Rorschach'</span>,\n",
+       "                <span style=\"color: #008000; text-decoration-color: #008000\">'country'</span>: <span style=\"color: #008000; text-decoration-color: #008000\">'Switzerland'</span>\n",
+       "            <span style=\"font-weight: bold\">}</span>\n",
+       "        <span style=\"font-weight: bold\">}</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">raw_text</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'{\"bill_no\": \"3139\", \"total\": 3949.75, \"garden_work_hours\": 28, \"sender\": {\"name\": \"Robert </span>\n",
+       "<span style=\"color: #008000; text-decoration-color: #008000\">Schneider\", \"address\": \"Rue du Lac 1268\", \"postal_code\": \"2501\", \"city\": \"Biel\", \"country\": \"Switzerland\"}, </span>\n",
+       "<span style=\"color: #008000; text-decoration-color: #008000\">\"receiver\": {\"name\": \"Pia Rutschmann\", \"address\": \"Marktgasse 28\", \"postal_code\": \"9400\", \"city\": \"Rorschach\", </span>\n",
+       "<span style=\"color: #008000; text-decoration-color: #008000\">\"country\": \"Switzerland\"}}'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">errors</span>=<span style=\"font-weight: bold\">[]</span>\n",
+       "    <span style=\"font-weight: bold\">)</span>\n",
+       "<span style=\"font-weight: bold\">]</span>\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1m[\u001b[0m\n",
+       "    \u001b[1;35mExtractedPageData\u001b[0m\u001b[1m(\u001b[0m\n",
+       "        \u001b[33mpage_no\u001b[0m=\u001b[1;36m1\u001b[0m,\n",
+       "        \u001b[33mextracted_data\u001b[0m=\u001b[1m{\u001b[0m\n",
+       "            \u001b[32m'bill_no'\u001b[0m: \u001b[32m'3139'\u001b[0m,\n",
+       "            \u001b[32m'total'\u001b[0m: \u001b[1;36m3949.75\u001b[0m,\n",
+       "            \u001b[32m'garden_work_hours'\u001b[0m: \u001b[1;36m28\u001b[0m,\n",
+       "            \u001b[32m'sender'\u001b[0m: \u001b[1m{\u001b[0m\n",
+       "                \u001b[32m'name'\u001b[0m: \u001b[32m'Robert Schneider'\u001b[0m,\n",
+       "                \u001b[32m'address'\u001b[0m: \u001b[32m'Rue du Lac 1268'\u001b[0m,\n",
+       "                \u001b[32m'postal_code'\u001b[0m: \u001b[32m'2501'\u001b[0m,\n",
+       "                \u001b[32m'city'\u001b[0m: \u001b[32m'Biel'\u001b[0m,\n",
+       "                \u001b[32m'country'\u001b[0m: \u001b[32m'Switzerland'\u001b[0m\n",
+       "            \u001b[1m}\u001b[0m,\n",
+       "            \u001b[32m'receiver'\u001b[0m: \u001b[1m{\u001b[0m\n",
+       "                \u001b[32m'name'\u001b[0m: \u001b[32m'Pia Rutschmann'\u001b[0m,\n",
+       "                \u001b[32m'address'\u001b[0m: \u001b[32m'Marktgasse 28'\u001b[0m,\n",
+       "                \u001b[32m'postal_code'\u001b[0m: \u001b[32m'9400'\u001b[0m,\n",
+       "                \u001b[32m'city'\u001b[0m: \u001b[32m'Rorschach'\u001b[0m,\n",
+       "                \u001b[32m'country'\u001b[0m: \u001b[32m'Switzerland'\u001b[0m\n",
+       "            \u001b[1m}\u001b[0m\n",
+       "        \u001b[1m}\u001b[0m,\n",
+       "        \u001b[33mraw_text\u001b[0m=\u001b[32m'\u001b[0m\u001b[32m{\u001b[0m\u001b[32m\"bill_no\": \"3139\", \"total\": 3949.75, \"garden_work_hours\": 28, \"sender\": \u001b[0m\u001b[32m{\u001b[0m\u001b[32m\"name\": \"Robert \u001b[0m\n",
+       "\u001b[32mSchneider\", \"address\": \"Rue du Lac 1268\", \"postal_code\": \"2501\", \"city\": \"Biel\", \"country\": \"Switzerland\"\u001b[0m\u001b[32m}\u001b[0m\u001b[32m, \u001b[0m\n",
+       "\u001b[32m\"receiver\": \u001b[0m\u001b[32m{\u001b[0m\u001b[32m\"name\": \"Pia Rutschmann\", \"address\": \"Marktgasse 28\", \"postal_code\": \"9400\", \"city\": \"Rorschach\", \u001b[0m\n",
+       "\u001b[32m\"country\": \"Switzerland\"\u001b[0m\u001b[32m}\u001b[0m\u001b[32m}\u001b[0m\u001b[32m'\u001b[0m,\n",
+       "        \u001b[33merrors\u001b[0m=\u001b[1m[\u001b[0m\u001b[1m]\u001b[0m\n",
+       "    \u001b[1m)\u001b[0m\n",
+       "\u001b[1m]\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "result = extractor.extract(\n",
+    "    source=file_path,\n",
+    "    template=ExtendedInvoice,\n",
+    ")\n",
+    "print(result.pages)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e873f65d",
+   "metadata": {},
+   "source": [
+    "### Validating and loading the extracted data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "080991f6",
+   "metadata": {},
+   "source": [
+    "The generated response data can be easily validated and loaded via Pydantic:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "a015bf60",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ExtendedInvoice</span><span style=\"font-weight: bold\">(</span>\n",
+       "    <span style=\"color: #808000; text-decoration-color: #808000\">bill_no</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'3139'</span>,\n",
+       "    <span style=\"color: #808000; text-decoration-color: #808000\">total</span>=<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3949.75</span>,\n",
+       "    <span style=\"color: #808000; text-decoration-color: #808000\">garden_work_hours</span>=<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">28</span>,\n",
+       "    <span style=\"color: #808000; text-decoration-color: #808000\">sender</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Contact</span><span style=\"font-weight: bold\">(</span>\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">name</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'Robert Schneider'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">address</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'Rue du Lac 1268'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">postal_code</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'2501'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">city</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'Biel'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">country</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'Switzerland'</span>\n",
+       "    <span style=\"font-weight: bold\">)</span>,\n",
+       "    <span style=\"color: #808000; text-decoration-color: #808000\">receiver</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Contact</span><span style=\"font-weight: bold\">(</span>\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">name</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'Pia Rutschmann'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">address</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'Marktgasse 28'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">postal_code</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'9400'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">city</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'Rorschach'</span>,\n",
+       "        <span style=\"color: #808000; text-decoration-color: #808000\">country</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'Switzerland'</span>\n",
+       "    <span style=\"font-weight: bold\">)</span>\n",
+       "<span style=\"font-weight: bold\">)</span>\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "\u001b[1;35mExtendedInvoice\u001b[0m\u001b[1m(\u001b[0m\n",
+       "    \u001b[33mbill_no\u001b[0m=\u001b[32m'3139'\u001b[0m,\n",
+       "    \u001b[33mtotal\u001b[0m=\u001b[1;36m3949\u001b[0m\u001b[1;36m.75\u001b[0m,\n",
+       "    \u001b[33mgarden_work_hours\u001b[0m=\u001b[1;36m28\u001b[0m,\n",
+       "    \u001b[33msender\u001b[0m=\u001b[1;35mContact\u001b[0m\u001b[1m(\u001b[0m\n",
+       "        \u001b[33mname\u001b[0m=\u001b[32m'Robert Schneider'\u001b[0m,\n",
+       "        \u001b[33maddress\u001b[0m=\u001b[32m'Rue du Lac 1268'\u001b[0m,\n",
+       "        \u001b[33mpostal_code\u001b[0m=\u001b[32m'2501'\u001b[0m,\n",
+       "        \u001b[33mcity\u001b[0m=\u001b[32m'Biel'\u001b[0m,\n",
+       "        \u001b[33mcountry\u001b[0m=\u001b[32m'Switzerland'\u001b[0m\n",
+       "    \u001b[1m)\u001b[0m,\n",
+       "    \u001b[33mreceiver\u001b[0m=\u001b[1;35mContact\u001b[0m\u001b[1m(\u001b[0m\n",
+       "        \u001b[33mname\u001b[0m=\u001b[32m'Pia Rutschmann'\u001b[0m,\n",
+       "        \u001b[33maddress\u001b[0m=\u001b[32m'Marktgasse 28'\u001b[0m,\n",
+       "        \u001b[33mpostal_code\u001b[0m=\u001b[32m'9400'\u001b[0m,\n",
+       "        \u001b[33mcity\u001b[0m=\u001b[32m'Rorschach'\u001b[0m,\n",
+       "        \u001b[33mcountry\u001b[0m=\u001b[32m'Switzerland'\u001b[0m\n",
+       "    \u001b[1m)\u001b[0m\n",
+       "\u001b[1m)\u001b[0m\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "invoice = ExtendedInvoice.model_validate(result.pages[0].extracted_data)\n",
+    "print(invoice)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ae593926",
+   "metadata": {},
+   "source": [
+    "This takes us from unstructured input to a structured, developer-friendly representation:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "32844e40",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Invoice #<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3139</span> was sent by Robert Schneider to Pia Rutschmann at Rue du Lac <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1268</span>.\n",
+       "</pre>\n"
+      ],
+      "text/plain": [
+       "Invoice #\u001b[1;36m3139\u001b[0m was sent by Robert Schneider to Pia Rutschmann at Rue du Lac \u001b[1;36m1268\u001b[0m.\n"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "print(\n",
+    "    f\"Invoice #{invoice.bill_no} was sent by {invoice.sender.name} \"\n",
+    "    f\"to {invoice.receiver.name} at {invoice.sender.address}.\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6c1dbe41",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/examples/index.md b/docs/examples/index.md
index b70b4b12..938b91e7 100644
--- a/docs/examples/index.md
+++ b/docs/examples/index.md
@@ -4,6 +4,7 @@ Here some of our picks to get you started:
 
 - 🔀 conversion examples ranging from [simple conversion to Markdown](./minimal.py) and export of [figures](./export_figures.py) & [tables](./export_tables.py), to [VLM](./minimal_vlm_pipeline.py) and [audio](./minimal_asr_pipeline.py) pipelines
 - 💬 various RAG examples, e.g. based on [LangChain](./rag_langchain.ipynb), [LlamaIndex](./rag_llamaindex.ipynb), or [Haystack](./rag_haystack.ipynb), including [visual grounding](./visual_grounding.ipynb), and using different vector stores like [Milvus](./rag_milvus.ipynb), [Weaviate](./rag_weaviate.ipynb), or [Qdrant](./retrieval_qdrant.ipynb)
+- 📤 [{==\[:fontawesome-solid-flask:{ title="beta feature" } beta\]==} structured data extraction](./extraction.ipynb)
 - examples for ✍️ [serialization](./serialization.ipynb) and ✂️ [chunking](./hybrid_chunking.ipynb), including [user-defined customizations](./advanced_chunking_and_serialization.ipynb)
 - 🖼️ [picture annotations](./pictures_description.ipynb) and [enrichments](./enrich_doclingdocument.py)
 
diff --git a/mkdocs.yml b/mkdocs.yml
index 5b421daa..a687fdc3 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -37,6 +37,7 @@ theme:
     - content.tabs.link
     - content.code.annotate
     - content.code.copy
+    - content.tooltips
     - announce.dismiss
     - navigation.footer
     - navigation.tabs
@@ -99,8 +100,8 @@ nav:
       - examples/serialization.ipynb
       - examples/hybrid_chunking.ipynb
       - examples/advanced_chunking_and_serialization.ipynb
-    - ✂️ Data Preparation and Embedding Pipeline:
-      - examples/dpk-ingest-chunck-tokenize.ipynb
+    - 📤 Information extraction:
+      - examples/extraction.ipynb
     - 🤖 RAG with AI dev frameworks:
       - examples/rag_haystack.ipynb
       - examples/rag_langchain.ipynb
@@ -114,6 +115,7 @@ nav:
       - "Formula enrichment": examples/develop_formula_understanding.py
       - "Enrich a DoclingDocument": examples/enrich_doclingdocument.py
     - 🗂️ More examples:
+      - examples/dpk-ingest-chunk-tokenize.ipynb
       - examples/rag_milvus.ipynb
       - examples/rag_weaviate.ipynb
       - RAG with Granite [↗]: https://github.com/ibm-granite-community/granite-snack-cookbook/blob/main/recipes/RAG/Granite_Docling_RAG.ipynb
@@ -155,6 +157,7 @@ nav:
       - CLI reference: reference/cli.md
 
 markdown_extensions:
+  - pymdownx.critic
   - pymdownx.superfences
   - pymdownx.tabbed:
       alternate_style: true