Introduction to parsing HTML files with Docling

Docling simplifies document processing: it parses diverse formats, including HTML, and provides seamless integrations with the generative AI ecosystem.

Supported file formats

Docling supports multiple file formats, including PDF, DOCX, PPTX, XLSX, HTML, Markdown, and images.

Three backends for handling HTML files

Docling has three backends for parsing HTML files:

  1. HTMLDocumentBackend: ignores images
  2. HTMLDocumentBackendImagesInline: extracts images inline
  3. HTMLDocumentBackendImagesReferenced: extracts images as references