docling/.actor/input_schema.json
Václav Vančura 9f86971fad Actor: Replace Docling CLI with docling-serve API
This commit transitions the Actor from using the full Docling CLI package to the more lightweight docling-serve API. Key changes include:

- Redesign Dockerfile to use docling-serve as base image
- Update actor.sh to communicate with API instead of running CLI commands
- Improve content type handling for various output formats
- Update input schema to align with API parameters
- Reduce Docker image size from ~6GB to ~600MB
- Update documentation and changelog to reflect architectural changes

The image size reduction will make the Actor more cost-effective for users while maintaining all existing functionality including OCR capabilities.

Issue: No official docling-serve Docker image is currently available, which will be addressed in a future commit.
Signed-off-by: Václav Vančura <commit@vancura.dev>
2025-03-13 10:39:22 +01:00

{
    "title": "Docling Actor Input",
    "description": "Options for processing documents with Docling via the docling-serve API.",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "documentUrl": {
            "title": "Document URL",
            "type": "string",
            "description": "URL of the document to process. Supported formats: PDF, DOCX, PPTX, XLSX, HTML, MD, XML, images, and more.",
            "prefill": "https://arxiv.org/pdf/2408.09869.pdf",
            "editor": "textfield"
        },
        "outputFormat": {
            "title": "Output Format",
            "type": "string",
            "description": "Desired output format after processing the document.",
            "enum": ["md", "json", "html", "text", "doctags"],
            "default": "md",
            "editor": "select"
        },
        "ocr": {
            "title": "Enable OCR",
            "type": "boolean",
            "description": "If enabled, OCR will be applied to scanned documents for text recognition.",
            "default": true
        }
    },
    "required": ["documentUrl"]
}
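
As a rough illustration of how the schema above constrains an Actor run, the following Python sketch fills in the declared defaults and then checks the required field, the types, and the `outputFormat` enum. It is not part of the Actor: `SCHEMA` is a hand-copied subset of the file, and `resolve_input` is a hypothetical helper name introduced only for this example. How the platform itself applies defaults and validation may differ.

```python
# Hand-copied subset of the input schema above (assumption: only the
# fields relevant to validation are reproduced here).
SCHEMA = {
    "properties": {
        "documentUrl": {"type": "string"},
        "outputFormat": {
            "type": "string",
            "enum": ["md", "json", "html", "text", "doctags"],
            "default": "md",
        },
        "ocr": {"type": "boolean", "default": True},
    },
    "required": ["documentUrl"],
}

# Map the two JSON types used by this schema to Python types.
_PY_TYPES = {"string": str, "boolean": bool}


def resolve_input(raw: dict) -> dict:
    """Hypothetical helper: apply schema defaults, then validate the result."""
    resolved = dict(raw)

    # Fill in defaults for optional fields the caller left out.
    for name, spec in SCHEMA["properties"].items():
        if name not in resolved and "default" in spec:
            resolved[name] = spec["default"]

    # Every required field must be present.
    for name in SCHEMA["required"]:
        if name not in resolved:
            raise ValueError(f"missing required field: {name}")

    # Check each field against its declared type and enum, if any.
    for name, value in resolved.items():
        spec = SCHEMA["properties"].get(name)
        if spec is None:
            raise ValueError(f"unknown field: {name}")
        if not isinstance(value, _PY_TYPES[spec["type"]]):
            raise ValueError(f"{name} must be of type {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name} must be one of {spec['enum']}")

    return resolved


if __name__ == "__main__":
    # Only the required field is given; outputFormat and ocr fall back
    # to the defaults declared in the schema.
    print(resolve_input({"documentUrl": "https://arxiv.org/pdf/2408.09869.pdf"}))
```

Under these assumptions, an input that supplies only `documentUrl` resolves to Markdown output with OCR enabled, while an input that omits `documentUrl` or sets `outputFormat` to a value outside the enum is rejected.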