This commit transitions the Actor from using the full Docling CLI package to the more lightweight docling-serve API. Key changes include:
- Redesign Dockerfile to use docling-serve as base image
- Update actor.sh to communicate with API instead of running CLI commands
- Improve content type handling for various output formats
- Update input schema to align with API parameters
- Reduce Docker image size from ~6GB to ~600MB
- Update documentation and changelog to reflect architectural changes
The image size reduction will make the Actor more cost-effective for users while maintaining all existing functionality including OCR capabilities.
Issue: No official docling-serve Docker image is currently available, which will be addressed in a future commit.
Signed-off-by: Václav Vančura <commit@vancura.dev>
# Changelog
All notable changes to the Docling Actor will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## [1.1.0] - 2025-03-15
### Changed
- Switched from full Docling CLI to docling-serve API
- Dramatically reduced Docker image size (from ~6GB to ~600MB)
- Improved API compatibility with docling-serve
- Better content type handling for different output formats
- Updated error handling to align with API responses
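The content type handling mentioned above can be illustrated with a minimal sketch. It assumes a simple lookup from the Actor's output-format names (md, json, html, text, doctags) to HTTP Content-Type values. The mapping values and the `content_type_for` helper are hypothetical and do not reproduce the Actor's own code. Treating doctags as plain text is also an assumption.

```python
# Hypothetical mapping from output-format names to Content-Type values.
# The real Actor may use different values; "doctags" as text/plain is assumed.
CONTENT_TYPES = {
    "md": "text/markdown",
    "json": "application/json",
    "html": "text/html",
    "text": "text/plain",
    "doctags": "text/plain",
}


def content_type_for(output_format: str) -> str:
    """Return the Content-Type for an output format, rejecting unknown ones."""
    try:
        return CONTENT_TYPES[output_format.lower()]
    except KeyError:
        # Fail loudly instead of silently storing output with a wrong type.
        raise ValueError(f"unsupported output format: {output_format!r}") from None
```

For example, `content_type_for("md")` returns `"text/markdown"`. Rejecting unknown formats early keeps a typo in the input from producing a mislabeled file in storage.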
### Technical Details
- Actor Specification v1
- Using ds4sd/docling-serve:latest base image
- Node.js 20.x for Apify CLI
- Eliminated Python dependencies
- Simplified Docker build process
## [1.0.0] - 2025-02-07
### Added
- Initial release of Docling Actor
- Support for multiple document formats (PDF, DOCX, images)
- OCR capabilities for scanned documents
- Multiple output formats (md, json, html, text, doctags)
- Comprehensive error handling and logging
- Dataset records with processing status
- Memory monitoring and resource optimization
- Security features including non-root user execution
### Technical Details
- Actor Specification v1
- Docling v2.17.0
- Python 3.11
- Node.js 20.x
- Comprehensive error codes:
  - 10: Invalid input
  - 11: URL inaccessible
  - 12: Docling processing failed
  - 13: Output file missing
  - 14: Storage operation failed
  - 15: OCR processing failed
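The error codes listed above can be decoded by a caller with a small sketch like the one below. It assumes the Actor reports one of these numeric codes on failure, for example as a process exit status. The numeric values and descriptions come from this changelog. The enum member names and the `describe_error_code` helper are hypothetical.

```python
from enum import IntEnum


class ActorErrorCode(IntEnum):
    # Values come from the 1.0.0 changelog; member names are hypothetical.
    INVALID_INPUT = 10
    URL_INACCESSIBLE = 11
    DOCLING_PROCESSING_FAILED = 12
    OUTPUT_FILE_MISSING = 13
    STORAGE_OPERATION_FAILED = 14
    OCR_PROCESSING_FAILED = 15


# Human-readable descriptions, copied from the changelog list.
DESCRIPTIONS = {
    ActorErrorCode.INVALID_INPUT: "Invalid input",
    ActorErrorCode.URL_INACCESSIBLE: "URL inaccessible",
    ActorErrorCode.DOCLING_PROCESSING_FAILED: "Docling processing failed",
    ActorErrorCode.OUTPUT_FILE_MISSING: "Output file missing",
    ActorErrorCode.STORAGE_OPERATION_FAILED: "Storage operation failed",
    ActorErrorCode.OCR_PROCESSING_FAILED: "OCR processing failed",
}


def describe_error_code(code: int) -> str:
    """Translate a numeric error code into its documented description."""
    try:
        return DESCRIPTIONS[ActorErrorCode(code)]
    except ValueError:
        # Codes outside 10-15 are not documented in this changelog.
        return f"Undocumented error code {code}"
```

An `IntEnum` keeps the codes comparable to plain integers. This lets a wrapper pass a raw status value straight to `describe_error_code` when logging a failed run.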