Merge branch 'docling-project:main' into main

Commit 34c3a395fe by Ulan Yisaev on 2025-03-19 13:27:17 +02:00, committed via GitHub.
156 changed files with 2733 additions and 1473 deletions

.actor/.dockerignore (new file)

@@ -0,0 +1,11 @@
**/__pycache__
**/*.pyc
**/*.pyo
**/*.pyd
.git
.gitignore
.env
.venv
*.log
.pytest_cache
.coverage

.actor/CHANGELOG.md (new file)

@@ -0,0 +1,69 @@
# Changelog
All notable changes to the Docling Actor will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.1.0] - 2025-03-09
### Changed
- Switched from full Docling CLI to docling-serve API
- Using the official quay.io/ds4sd/docling-serve-cpu Docker image
- Reduced Docker image size (from ~6GB to ~4GB)
- Implemented multi-stage Docker build to handle dependencies
- Improved Docker build process to ensure compatibility with docling-serve-cpu image
- Added new Python processor script for reliable API communication and content extraction
- Enhanced response handling with better content extraction logic
- Fixed ES modules compatibility issue with Apify CLI
- Added explicit tmpfs volume for temporary files
- Fixed environment variables format in actor.json
- Created optimized dependency installation approach
- Improved API compatibility with docling-serve
- Updated endpoint from custom `/convert` to standard `/v1alpha/convert/source`
- Revised JSON payload structure to match docling-serve API format
- Added proper output field parsing based on format
- Enhanced startup process with health checks
- Added configurable API host and port through environment variables
- Better content type handling for different output formats
- Updated error handling to align with API responses
### Fixed
- Fixed actor input file conflict in get_actor_input(): now checks for and removes an existing /tmp/actor-input/INPUT directory if found, ensuring valid JSON input parsing.
### Technical Details
- Actor Specification v1
- Using quay.io/ds4sd/docling-serve-cpu:latest base image
- Node.js 20.x for Apify CLI
- Eliminated Python dependencies
- Simplified Docker build process
## [1.0.0] - 2025-02-07
### Added
- Initial release of Docling Actor
- Support for multiple document formats (PDF, DOCX, images)
- OCR capabilities for scanned documents
- Multiple output formats (md, json, html, text, doctags)
- Comprehensive error handling and logging
- Dataset records with processing status
- Memory monitoring and resource optimization
- Security features including non-root user execution
### Technical Details
- Actor Specification v1
- Docling v2.17.0
- Python 3.11
- Node.js 20.x
- Comprehensive error codes:
- 10: Invalid input
- 11: URL inaccessible
- 12: Docling processing failed
- 13: Output file missing
- 14: Storage operation failed
- 15: OCR processing failed

.actor/Dockerfile (new file)

@@ -0,0 +1,87 @@
# Build stage for installing dependencies
FROM node:20-slim AS builder
# Install necessary tools and prepare dependencies environment in one layer
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
&& rm -rf /var/lib/apt/lists/* \
&& mkdir -p /build/bin /build/lib/node_modules \
&& cp /usr/local/bin/node /build/bin/
# Set working directory
WORKDIR /build
# Create package.json and install Apify CLI in one layer
RUN echo '{"name":"docling-actor-dependencies","version":"1.0.0","description":"Dependencies for Docling Actor","private":true,"type":"module","engines":{"node":">=18"}}' > package.json \
&& npm install apify-cli@latest \
&& cp -r node_modules/* lib/node_modules/ \
&& echo '#!/bin/sh\n/tmp/docling-tools/bin/node /tmp/docling-tools/lib/node_modules/apify-cli/bin/run "$@"' > bin/actor \
&& chmod +x bin/actor \
# Clean up npm cache to reduce image size
&& npm cache clean --force
# Final stage with docling-serve-cpu
FROM quay.io/ds4sd/docling-serve-cpu:latest
LABEL maintainer="Vaclav Vancura <@vancura>" \
description="Apify Actor for document processing using Docling" \
version="1.1.0"
# Set only essential environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
DOCLING_SERVE_HOST=0.0.0.0 \
DOCLING_SERVE_PORT=5001
# Switch to root temporarily to set up directories and permissions
USER root
WORKDIR /app
# Install required tools and create directories in a single layer
RUN dnf install -y \
jq \
&& dnf clean all \
&& mkdir -p /build-files \
/tmp \
/tmp/actor-input \
/tmp/actor-output \
/tmp/actor-storage \
/tmp/apify_input \
/apify_input \
/opt/app-root/src/.EasyOCR/user_network \
/tmp/easyocr-models \
&& chown 1000:1000 /build-files \
&& chown -R 1000:1000 /opt/app-root/src/.EasyOCR \
&& chmod 1777 /tmp \
&& chmod 1777 /tmp/easyocr-models \
&& chmod 777 /tmp/actor-input /tmp/actor-output /tmp/actor-storage /tmp/apify_input /apify_input \
# Fix for uv_os_get_passwd error in Node.js
&& echo "docling:x:1000:1000:Docling User:/app:/bin/sh" >> /etc/passwd
# Set environment variable to tell EasyOCR to use a writable location for models
ENV EASYOCR_MODULE_PATH=/tmp/easyocr-models
# Copy only required files
COPY --chown=1000:1000 .actor/actor.sh .actor/actor.sh
COPY --chown=1000:1000 .actor/actor.json .actor/actor.json
COPY --chown=1000:1000 .actor/input_schema.json .actor/input_schema.json
COPY --chown=1000:1000 .actor/docling_processor.py .actor/docling_processor.py
RUN chmod +x .actor/actor.sh
# Copy the build files from builder
COPY --from=builder --chown=1000:1000 /build /build-files
# Switch to non-root user
USER 1000
# Set up TMPFS for temporary files
VOLUME ["/tmp"]
# Create additional volumes for OCR models persistence
VOLUME ["/tmp/easyocr-models"]
# Expose the docling-serve API port
EXPOSE 5001
# Run the actor script
ENTRYPOINT [".actor/actor.sh"]

.actor/README.md (new file)

@@ -0,0 +1,314 @@
# Docling Actor on Apify
[![Docling Actor](https://apify.com/actor-badge?actor=vancura/docling?fpr=docling)](https://apify.com/vancura/docling)
This Actor (specification v1) wraps the [Docling project](https://ds4sd.github.io/docling/) to provide serverless document processing in the cloud. It can process complex documents (PDF, DOCX, images) and convert them into structured formats (Markdown, JSON, HTML, Text, or DocTags) with optional OCR support.
## What are Actors?
[Actors](https://docs.apify.com/platform/actors?fpr=docling) are serverless microservices running on the [Apify Platform](https://apify.com/?fpr=docling). They are based on the [Actor SDK](https://docs.apify.com/sdk/js?fpr=docling) and can be found in the [Apify Store](https://apify.com/store?fpr=docling). Learn more about Actors in the [Apify Whitepaper](https://whitepaper.actor?fpr=docling).
## Table of Contents
1. [Features](#features)
2. [Usage](#usage)
3. [Input Parameters](#input-parameters)
4. [Output](#output)
5. [Performance & Resources](#performance--resources)
6. [Troubleshooting](#troubleshooting)
7. [Local Development](#local-development)
8. [Architecture](#architecture)
9. [License](#license)
10. [Acknowledgments](#acknowledgments)
11. [Security Considerations](#security-considerations)
## Features
- Leverages the official docling-serve-cpu Docker image for efficient document processing
- Processes multiple document formats:
- PDF documents (scanned or digital)
- Microsoft Office files (DOCX, XLSX, PPTX)
- Images (PNG, JPG, TIFF)
- Other text-based formats
- Provides OCR capabilities for scanned documents
- Exports to multiple formats:
- Markdown
- JSON
- HTML
- Plain Text
- DocTags (structured format)
- No local setup needed—just provide input via a simple JSON config
## Usage
### Using Apify Console
1. Go to the Apify Actor page.
2. Click "Run".
3. In the input form, fill in:
- The URL of the document.
- Output format (`md`, `json`, `html`, `text`, or `doctags`).
- OCR boolean toggle.
4. The Actor will run and produce its outputs in the default key-value store under the key `OUTPUT`.
### Using Apify API
```bash
curl --request POST \
--url "https://api.apify.com/v2/acts/vancura~docling/run" \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer YOUR_API_TOKEN' \
--data '{
"options": {
"to_formats": ["md", "json", "html", "text", "doctags"]
},
"http_sources": [
{"url": "https://vancura.dev/assets/actor-test/facial-hairstyles-and-filtering-facepiece-respirators.pdf"},
{"url": "https://arxiv.org/pdf/2408.09869"}
]
}'
```
### Using Apify CLI
```bash
apify call vancura/docling --input='{
"options": {
"to_formats": ["md", "json", "html", "text", "doctags"]
},
"http_sources": [
{"url": "https://vancura.dev/assets/actor-test/facial-hairstyles-and-filtering-facepiece-respirators.pdf"},
{"url": "https://arxiv.org/pdf/2408.09869"}
]
}'
```
## Input Parameters
The Actor accepts JSON input matching the schema in `.actor/input_schema.json`. Below is a summary of the fields:
| Field | Type | Required | Default | Description |
|----------------|---------|----------|----------|-------------------------------------------------------------------------------|
| `http_sources` | array | Yes | None | Documents to process, as `{"url": ...}` objects; see the [docling-serve URL endpoint docs](https://github.com/DS4SD/docling-serve?tab=readme-ov-file#url-endpoint) |
| `options` | object | Yes | None | Processing options passed through to docling-serve; see the [common parameters docs](https://github.com/DS4SD/docling-serve?tab=readme-ov-file#common-parameters) |
### Example Input
```json
{
"options": {
"to_formats": ["md", "json", "html", "text", "doctags"]
},
"http_sources": [
{"url": "https://vancura.dev/assets/actor-test/facial-hairstyles-and-filtering-facepiece-respirators.pdf"},
{"url": "https://arxiv.org/pdf/2408.09869"}
]
}
```
## Output
The Actor provides three types of outputs:
1. **Processed Documents in a ZIP** - The Actor prints the direct URL to your result in the run log:
```text
You can find your results at: 'https://api.apify.com/v2/key-value-stores/[YOUR_STORE_ID]/records/OUTPUT'
```
2. **Processing Log** - Available in the key-value store as `DOCLING_LOG`
3. **Dataset Record** - Contains processing metadata with:
- Direct link to the processed output zip file
- Processing status
You can access the results in several ways:
1. **Direct URL** (shown in Actor run logs):
```text
https://api.apify.com/v2/key-value-stores/[STORE_ID]/records/OUTPUT
```
2. **Programmatically** via Apify CLI:
```bash
apify key-value-stores get-value OUTPUT
```
3. **Dataset** - Check the "Dataset" tab in the Actor run details to see processing metadata
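For non-interactive pipelines, the `OUTPUT` record can also be fetched over plain HTTP using the URL pattern shown above. A minimal Python sketch, assuming the store ID from your run log (the output file name is illustrative):

```python
import requests

STORE_ID = "YOUR_STORE_ID"  # shown in the Actor run log; placeholder
url = f"https://api.apify.com/v2/key-value-stores/{STORE_ID}/records/OUTPUT"

# The OUTPUT record is the ZIP archive described above.
response = requests.get(url, timeout=60)
response.raise_for_status()

with open("docling-output.zip", "wb") as f:
    f.write(response.content)
```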
### Example Outputs
#### Markdown (md)
```markdown
# Document Title
## Section 1
Content of section 1...
## Section 2
Content of section 2...
```
#### JSON
```json
{
"title": "Document Title",
"sections": [
{
"level": 1,
"title": "Section 1",
"content": "Content of section 1..."
}
]
}
```
#### HTML
```html
<h1>Document Title</h1>
<h2>Section 1</h2>
<p>Content of section 1...</p>
```
### Processing Logs (`DOCLING_LOG`)
The Actor maintains detailed processing logs including:
- API request and response details
- Processing steps and timing
- Error messages and stack traces
- Input validation results
Access logs via:
```bash
apify key-value-stores get-record DOCLING_LOG
```
## Performance & Resources
- **Docker Image Size**: ~4GB
- **Memory Requirements**:
- Minimum: 2 GB RAM
- Recommended: 4 GB RAM for large or complex documents
- **Processing Time**:
- Simple documents: 15-30 seconds
- Complex PDFs with OCR: 1-3 minutes
- Large documents (100+ pages): 3-10 minutes
## Troubleshooting
Common issues and solutions:
1. **Document URL Not Accessible**
- Ensure the URL is publicly accessible
- Check if the document requires authentication
- Verify the URL leads directly to the document
2. **OCR Processing Fails**
- Verify the document is not password-protected
- Check if the image quality is sufficient
- Try processing with OCR disabled
3. **API Response Issues**
- Check the logs for detailed error messages
- Ensure the document format is supported
- Verify the URL is correctly formatted
4. **Output Format Issues**
- Verify the output format is supported
- Check if the document structure is compatible
- Review the `DOCLING_LOG` for specific errors
### Error Handling
The Actor implements comprehensive error handling:
- Detailed error messages in `DOCLING_LOG`
- Proper exit codes for different failure scenarios
- Automatic cleanup on failure
- Dataset records with processing status
## Local Development
If you wish to develop or modify this Actor locally:
1. Clone the repository.
2. Ensure Docker is installed.
3. The Actor files are located in the `.actor` directory:
- `Dockerfile` - Defines the container environment
- `actor.json` - Actor configuration and metadata
- `actor.sh` - Main execution script that starts the docling-serve API and orchestrates document processing
- `input_schema.json` - Input parameter definitions
- `dataset_schema.json` - Dataset output format definition
- `CHANGELOG.md` - Change log documenting all notable changes
- `README.md` - This documentation
4. Run the Actor locally using:
```bash
apify run
```
### Actor Structure
```text
.actor/
├── Dockerfile # Container definition
├── actor.json # Actor metadata
├── actor.sh # Execution script (also starts docling-serve API)
├── input_schema.json # Input parameters
├── dataset_schema.json # Dataset output format definition
├── docling_processor.py # Python script for API communication
├── CHANGELOG.md # Version history and changes
└── README.md # This documentation
```
## Architecture
This Actor uses a lightweight architecture based on the official `quay.io/ds4sd/docling-serve-cpu` Docker image:
- **Base Image**: `quay.io/ds4sd/docling-serve-cpu:latest` (~4GB)
- **Multi-Stage Build**: Uses a multi-stage Docker build to include only necessary tools
- **API Communication**: Uses the RESTful API provided by docling-serve
- **Request Flow**:
1. The actor script starts the docling-serve API on port 5001
2. Performs health checks to ensure the API is running
3. Processes the input parameters
4. Creates a JSON payload for the docling-serve API with proper format:
```json
{
"options": {
"to_formats": ["md"],
"do_ocr": true
},
"http_sources": [{"url": "https://example.com/document.pdf"}]
}
```
5. Makes a POST request to the `/v1alpha/convert/source` endpoint
6. Processes the response and stores it in the key-value store (a minimal Python sketch of this flow follows this list)
- **Dependencies**:
- Node.js for Apify CLI
- Essential tools (curl, jq, etc.) copied from build stage
- **Security**: Runs as a non-root user for enhanced security
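For reference, the request flow above can be reproduced outside the Actor. Below is a minimal Python sketch of steps 2 and 4-6; it assumes docling-serve is already starting on localhost:5001, and it injects `return_as_file` the same way `actor.sh` does (the file name and timeouts are illustrative):

```python
import socket
import time

import requests

# Step 2: wait until the docling-serve port accepts connections
# (the Actor script probes the TCP port the same way).
for _ in range(30):
    try:
        with socket.create_connection(("localhost", 5001), timeout=1):
            break
    except OSError:
        time.sleep(1)
else:
    raise RuntimeError("docling-serve did not come up on port 5001")

# Steps 4-5: build the payload and POST it to the convert endpoint.
payload = {
    "options": {"to_formats": ["md"], "do_ocr": True, "return_as_file": True},
    "http_sources": [{"url": "https://example.com/document.pdf"}],
}
resp = requests.post(
    "http://localhost:5001/v1alpha/convert/source", json=payload, timeout=600
)
resp.raise_for_status()

# Step 6: with return_as_file, the response body is a ZIP archive.
with open("output.zip", "wb") as f:
    f.write(resp.content)
```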
## License
This wrapper project is under the MIT License, matching the original Docling license. See [LICENSE](../LICENSE) for details.
## Acknowledgments
- [Docling](https://ds4sd.github.io/docling/) and [docling-serve-cpu](https://quay.io/repository/ds4sd/docling-serve-cpu) by IBM
- [Apify](https://apify.com/?fpr=docling) for the serverless actor environment
## Security Considerations
- Actor runs under a non-root user for enhanced security
- Input URLs are validated before processing
- Temporary files are securely managed and cleaned up
- Process isolation through Docker containerization
- Secure handling of processing artifacts

.actor/actor.json (new file)

@@ -0,0 +1,11 @@
{
"actorSpecification": 1,
"name": "docling",
"version": "0.0",
"environmentVariables": {},
"dockerFile": "./Dockerfile",
"input": "./input_schema.json",
"scripts": {
"run": "./actor.sh"
}
}

.actor/actor.sh (new executable file)

@@ -0,0 +1,419 @@
#!/bin/bash
export PATH=$PATH:/build-files/node_modules/.bin
# Function to upload content to the key-value store
upload_to_kvs() {
local content_file="$1"
local key_name="$2"
local content_type="$3"
local description="$4"
# Find the Apify CLI command
find_apify_cmd
local apify_cmd="$FOUND_APIFY_CMD"
if [ -n "$apify_cmd" ]; then
echo "Uploading $description to key-value store (key: $key_name)..."
# Create a temporary home directory with write permissions
setup_temp_environment
# Use the --no-update-notifier flag if available
if $apify_cmd --help | grep -q "\--no-update-notifier"; then
if $apify_cmd --no-update-notifier actor:set-value "$key_name" --contentType "$content_type" < "$content_file"; then
echo "Successfully uploaded $description to key-value store"
local url="https://api.apify.com/v2/key-value-stores/${APIFY_DEFAULT_KEY_VALUE_STORE_ID}/records/$key_name"
echo "$description available at: $url"
cleanup_temp_environment
return 0
fi
else
# Fall back to regular command if flag isn't available
if $apify_cmd actor:set-value "$key_name" --contentType "$content_type" < "$content_file"; then
echo "Successfully uploaded $description to key-value store"
local url="https://api.apify.com/v2/key-value-stores/${APIFY_DEFAULT_KEY_VALUE_STORE_ID}/records/$key_name"
echo "$description available at: $url"
cleanup_temp_environment
return 0
fi
fi
echo "ERROR: Failed to upload $description to key-value store"
cleanup_temp_environment
return 1
else
echo "ERROR: Apify CLI not found for $description upload"
return 1
fi
}
# Function to find Apify CLI command
find_apify_cmd() {
FOUND_APIFY_CMD=""
for cmd in "apify" "actor" "/usr/local/bin/apify" "/usr/bin/apify" "/opt/apify/cli/bin/apify"; do
if command -v "$cmd" &> /dev/null; then
FOUND_APIFY_CMD="$cmd"
break
fi
done
}
# Function to set up temporary environment for Apify CLI
setup_temp_environment() {
export TMPDIR="/tmp/apify-home-${RANDOM}"
mkdir -p "$TMPDIR"
export APIFY_DISABLE_VERSION_CHECK=1
export NODE_OPTIONS="--no-warnings"
export HOME="$TMPDIR" # Override home directory to writable location
}
# Function to clean up temporary environment
cleanup_temp_environment() {
rm -rf "$TMPDIR" 2>/dev/null || true
}
# Function to push data to Apify dataset
push_to_dataset() {
# Example usage: push_to_dataset "$RESULT_URL" "$OUTPUT_SIZE" "zip"
local result_url="$1"
local size="$2"
local format="$3"
# Find Apify CLI command
find_apify_cmd
local apify_cmd="$FOUND_APIFY_CMD"
if [ -n "$apify_cmd" ]; then
echo "Adding record to dataset..."
setup_temp_environment
# Use the --no-update-notifier flag if available
if $apify_cmd --help | grep -q "\--no-update-notifier"; then
if $apify_cmd --no-update-notifier actor:push-data "{\"output_file\": \"${result_url}\", \"format\": \"${format}\", \"size\": \"${size}\", \"status\": \"success\"}"; then
echo "Successfully added record to dataset"
else
echo "Warning: Failed to add record to dataset"
fi
else
# Fall back to regular command
if $apify_cmd actor:push-data "{\"output_file\": \"${result_url}\", \"format\": \"${format}\", \"size\": \"${size}\", \"status\": \"success\"}"; then
echo "Successfully added record to dataset"
else
echo "Warning: Failed to add record to dataset"
fi
fi
cleanup_temp_environment
fi
}
# --- Setup logging and error handling ---
LOG_FILE="/tmp/docling.log"
touch "$LOG_FILE" || {
echo "Fatal: Cannot create log file at $LOG_FILE"
exit 1
}
# Log to both console and file
exec 1> >(tee -a "$LOG_FILE")
exec 2> >(tee -a "$LOG_FILE" >&2)
# Exit codes
readonly ERR_API_UNAVAILABLE=15
readonly ERR_INVALID_INPUT=16
# --- Debug environment ---
echo "Date: $(date)"
echo "Python version: $(python --version 2>&1)"
echo "Docling-serve path: $(which docling-serve 2>/dev/null || echo 'Not found')"
echo "Working directory: $(pwd)"
# --- Get input ---
echo "Getting Apify Actor Input"
INPUT=$(apify actor get-input 2>/dev/null)
# --- Setup tools ---
echo "Setting up tools..."
TOOLS_DIR="/tmp/docling-tools"
mkdir -p "$TOOLS_DIR"
# Copy tools if available
if [ -d "/build-files" ]; then
echo "Copying tools from /build-files..."
cp -r /build-files/* "$TOOLS_DIR/"
export PATH="$TOOLS_DIR/bin:$PATH"
else
echo "Warning: No build files directory found. Some tools may be unavailable."
fi
# Copy Python processor script to tools directory
PYTHON_SCRIPT_PATH="$(dirname "$0")/docling_processor.py"
if [ -f "$PYTHON_SCRIPT_PATH" ]; then
echo "Copying Python processor script to tools directory..."
cp "$PYTHON_SCRIPT_PATH" "$TOOLS_DIR/"
chmod +x "$TOOLS_DIR/docling_processor.py"
else
echo "ERROR: Python processor script not found at $PYTHON_SCRIPT_PATH"
exit 1
fi
# Check OCR directories and ensure they're writable
echo "Checking OCR directory permissions..."
OCR_DIR="/opt/app-root/src/.EasyOCR"
if [ -d "$OCR_DIR" ]; then
# Test if we can write to the directory
if touch "$OCR_DIR/test_write" 2>/dev/null; then
echo "[✓] OCR directory is writable"
rm "$OCR_DIR/test_write"
else
echo "[✗] OCR directory is not writable, setting up alternative in /tmp"
# Create alternative in /tmp (which is writable)
mkdir -p "/tmp/.EasyOCR/user_network"
export EASYOCR_MODULE_PATH="/tmp/.EasyOCR"
fi
else
echo "OCR directory not found, creating in /tmp"
mkdir -p "/tmp/.EasyOCR/user_network"
export EASYOCR_MODULE_PATH="/tmp/.EasyOCR"
fi
# --- Starting the API ---
echo "Starting docling-serve API..."
# Create a dedicated working directory in /tmp (writable)
API_DIR="/tmp/docling-api"
mkdir -p "$API_DIR"
cd "$API_DIR"
echo "API working directory: $(pwd)"
# Find docling-serve executable
DOCLING_SERVE_PATH=$(which docling-serve)
echo "Docling-serve executable: $DOCLING_SERVE_PATH"
# Start the API with minimal parameters to avoid any issues
echo "Starting docling-serve API..."
"$DOCLING_SERVE_PATH" run --host 0.0.0.0 --port 5001 > "$API_DIR/docling-serve.log" 2>&1 &
API_PID=$!
echo "Started docling-serve API with PID: $API_PID"
# A more reliable wait for API startup
echo "Waiting for API to initialize..."
MAX_TRIES=30
tries=0
started=false
while [ $tries -lt $MAX_TRIES ]; do
tries=$((tries + 1))
# Check if process is still running
if ! ps -p $API_PID > /dev/null; then
echo "ERROR: docling-serve API process terminated unexpectedly after $tries seconds"
break
fi
# Check log for startup completion or errors
if grep -q "Application startup complete" "$API_DIR/docling-serve.log" 2>/dev/null; then
echo "[✓] API startup completed successfully after $tries seconds"
started=true
break
fi
if grep -q "Permission denied\|PermissionError" "$API_DIR/docling-serve.log" 2>/dev/null; then
echo "ERROR: Permission errors detected in API startup"
break
fi
# Sleep and check again
sleep 1
# Output a progress indicator every 5 seconds
if [ $((tries % 5)) -eq 0 ]; then
echo "Still waiting for API startup... ($tries/$MAX_TRIES seconds)"
fi
done
# Show log content regardless of outcome
echo "docling-serve log output so far:"
tail -n 20 "$API_DIR/docling-serve.log"
# Verify the API is running
if ! ps -p $API_PID > /dev/null; then
echo "ERROR: docling-serve API failed to start"
if [ -f "$API_DIR/docling-serve.log" ]; then
echo "Full log output:"
cat "$API_DIR/docling-serve.log"
fi
exit $ERR_API_UNAVAILABLE
fi
if [ "$started" != "true" ]; then
echo "WARNING: API process is running but startup completion was not detected"
echo "Will attempt to continue anyway..."
fi
# Try to verify API is responding at this point
echo "Verifying API responsiveness..."
(python -c "
import sys, time, socket
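# Probe port 5001 up to 5 times, one second apart; exit 0 on the first successful connect.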
for i in range(5):
try:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(1)
result = s.connect_ex(('localhost', 5001))
if result == 0:
s.close()
print('Port 5001 is open and accepting connections')
sys.exit(0)
s.close()
except Exception as e:
pass
time.sleep(1)
print('Could not connect to API port after 5 attempts')
sys.exit(1)
" && echo "API verification succeeded") || echo "API verification failed, but continuing anyway"
# Define API endpoint
DOCLING_API_ENDPOINT="http://localhost:5001/v1alpha/convert/source"
# --- Processing document ---
echo "Starting document processing..."
echo "Reading input from Apify..."
echo "Input content:" >&2
echo "$INPUT" >&2 # Send the raw input to stderr for debugging
echo "$INPUT" # Send the clean JSON to stdout for processing
# Create the request JSON
REQUEST_JSON=$(echo "$INPUT" | jq '.options += {"return_as_file": true}')
echo "Creating request JSON:" >&2
echo "$REQUEST_JSON" >&2
echo "$REQUEST_JSON" > "$API_DIR/request.json"
# Send the conversion request using our Python script
#echo "Sending conversion request to docling-serve API..."
#python "$TOOLS_DIR/docling_processor.py" \
# --api-endpoint "$DOCLING_API_ENDPOINT" \
# --request-json "$API_DIR/request.json" \
# --output-dir "$API_DIR" \
# --output-format "$OUTPUT_FORMAT"
echo "Curl the Docling API"
curl -s -H "content-type: application/json" -X POST --data-binary @"$API_DIR/request.json" -o "$API_DIR/output.zip" "$DOCLING_API_ENDPOINT"
CURL_EXIT_CODE=$?
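# Note: success is judged below by the presence of output.zip, not by curl's exit code.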
# --- Check for various potential output files ---
echo "Checking for output files..."
if [ -f "$API_DIR/output.zip" ]; then
echo "Conversion completed successfully! Output file found."
# Get content from the converted file
OUTPUT_SIZE=$(wc -c < "$API_DIR/output.zip")
echo "Output file found with size: $OUTPUT_SIZE bytes"
# Calculate the access URL for result display
RESULT_URL="https://api.apify.com/v2/key-value-stores/${APIFY_DEFAULT_KEY_VALUE_STORE_ID}/records/OUTPUT"
echo "=============================="
echo "PROCESSING COMPLETE!"
echo "Output size: ${OUTPUT_SIZE} bytes"
echo "=============================="
# Set the output content type based on format
CONTENT_TYPE="application/zip"
# Upload the document content using our function
upload_to_kvs "$API_DIR/output.zip" "OUTPUT" "$CONTENT_TYPE" "Document content"
# Only proceed with dataset record if document upload succeeded
if [ $? -eq 0 ]; then
echo "Your document is available at: ${RESULT_URL}"
echo "=============================="
# Push data to dataset
push_to_dataset "$RESULT_URL" "$OUTPUT_SIZE" "zip"
fi
else
echo "ERROR: No converted output file found at $API_DIR/output.zip"
# Create error metadata
ERROR_METADATA="{\"status\":\"error\",\"error\":\"No converted output file found\",\"documentUrl\":\"$DOCUMENT_URL\"}"
echo "$ERROR_METADATA" > "/tmp/actor-output/OUTPUT"
chmod 644 "/tmp/actor-output/OUTPUT"
echo "Error information has been saved to /tmp/actor-output/OUTPUT"
fi
# --- Verify output files for debugging ---
echo "=== Final Output Verification ==="
echo "Files in /tmp/actor-output:"
ls -la /tmp/actor-output/ 2>/dev/null || echo "Cannot list /tmp/actor-output/"
echo "All operations completed. The output should be available in the default key-value store."
echo "Content URL: ${RESULT_URL:-No URL available}"
# --- Cleanup function ---
cleanup() {
echo "Running cleanup..."
# Stop the API process
if [ -n "$API_PID" ]; then
echo "Stopping docling-serve API (PID: $API_PID)..."
kill $API_PID 2>/dev/null || true
fi
# Export log file to KVS if it exists
# DO THIS BEFORE REMOVING TOOLS DIRECTORY
if [ -f "$LOG_FILE" ]; then
if [ -s "$LOG_FILE" ]; then
echo "Log file is not empty, pushing to key-value store (key: LOG)..."
# Upload log using our function
upload_to_kvs "$LOG_FILE" "LOG" "text/plain" "Log file"
else
echo "Warning: log file exists but is empty"
fi
else
echo "Warning: No log file found"
fi
# Clean up temporary files AFTER log is uploaded
echo "Cleaning up temporary files..."
if [ -d "$API_DIR" ]; then
echo "Removing API working directory: $API_DIR"
rm -rf "$API_DIR" 2>/dev/null || echo "Warning: Failed to remove $API_DIR"
fi
if [ -d "$TOOLS_DIR" ]; then
echo "Removing tools directory: $TOOLS_DIR"
rm -rf "$TOOLS_DIR" 2>/dev/null || echo "Warning: Failed to remove $TOOLS_DIR"
fi
# Keep log file until the very end
echo "Script execution completed at $(date)"
echo "Actor execution completed"
}
# Register cleanup
trap cleanup EXIT

.actor/dataset_schema.json (new file)

@@ -0,0 +1,31 @@
{
"title": "Docling Actor Dataset",
"description": "Records of document processing results from the Docling Actor",
"type": "object",
"schemaVersion": 1,
"properties": {
"url": {
"title": "Document URL",
"type": "string",
"description": "URL of the processed document"
},
"output_file": {
"title": "Result URL",
"type": "string",
"description": "Direct URL to the processed result in key-value store"
},
"status": {
"title": "Processing Status",
"type": "string",
"description": "Status of the document processing",
"enum": ["success", "error"]
},
"error": {
"title": "Error Details",
"type": "string",
"description": "Error message if processing failed",
"optional": true
}
},
"required": ["url", "output_file", "status"]
}

.actor/input_schema.json (new file)

@@ -0,0 +1,27 @@
{
"title": "Docling Actor Input",
"description": "Options for processing documents with Docling via the docling-serve API.",
"type": "object",
"schemaVersion": 1,
"properties": {
"http_sources": {
"title": "Document URLs",
"type": "array",
"description": "URLs of documents to process. Supported formats: PDF, DOCX, PPTX, XLSX, HTML, MD, XML, images, and more.",
"editor": "json",
"prefill": [
{ "url": "https://vancura.dev/assets/actor-test/facial-hairstyles-and-filtering-facepiece-respirators.pdf" }
]
},
"options": {
"title": "Processing Options",
"type": "object",
"description": "Document processing configuration options",
"editor": "json",
"prefill": {
"to_formats": ["md"]
}
}
},
"required": ["options", "http_sources"]
}

CHANGELOG.md

@@ -1,3 +1,22 @@
## [v2.27.0](https://github.com/docling-project/docling/releases/tag/v2.27.0) - 2025-03-18
### Feature
* Add factory for ocr engines via plugins ([#1010](https://github.com/docling-project/docling/issues/1010)) ([`6eaae3c`](https://github.com/docling-project/docling/commit/6eaae3cba034599020dc06ebdad3bc3ff0b5a8eb))
* Add DoclingParseV4 backend, using high-level docling-parse API ([#905](https://github.com/docling-project/docling/issues/905)) ([`3960b19`](https://github.com/docling-project/docling/commit/3960b199d63d0e9d660aeb0cbced02b38bb0b593))
* **actor:** Docling Actor on Apify infrastructure ([#875](https://github.com/docling-project/docling/issues/875)) ([`772487f`](https://github.com/docling-project/docling/commit/772487f9c91ad2ee53c591c314c72443f9cbfd23))
* Equations to latex in MSWord backend (with inline groups) ([#1114](https://github.com/docling-project/docling/issues/1114)) ([`6eb718f`](https://github.com/docling-project/docling/commit/6eb718f8493038d1b4b6ae836df5a24aa13cd17e))
### Fix
* **html:** Handle nested empty lists ([#1154](https://github.com/docling-project/docling/issues/1154)) ([`f94da44`](https://github.com/docling-project/docling/commit/f94da44ec5c7a8c92b9dd60e4df5dc945ed6d1ea))
* Use first table row as col headers ([#1156](https://github.com/docling-project/docling/issues/1156)) ([`0945973`](https://github.com/docling-project/docling/commit/0945973b79d67b74281aba5102ee985ac1de74ea))
* Pass tests, update docling-core to 2.22.0 ([#1150](https://github.com/docling-project/docling/issues/1150)) ([`aa92a57`](https://github.com/docling-project/docling/commit/aa92a57fa9e7228e894efb9050a0cdb9f287ebfd))
### Documentation
* Fix spelling of picture in usage ([#1165](https://github.com/docling-project/docling/issues/1165)) ([`7e01798`](https://github.com/docling-project/docling/commit/7e01798417c424c05685e0ff5f6f89f70dc3bfcd))
## [v2.26.0](https://github.com/docling-project/docling/releases/tag/v2.26.0) - 2025-03-11
### Feature

CODE_OF_CONDUCT.md

@@ -1,129 +1,3 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement using
[deepsearch-core@zurich.ibm.com](mailto:deepsearch-core@zurich.ibm.com).
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
[https://www.contributor-covenant.org/version/2/0/code_of_conduct.html](https://www.contributor-covenant.org/version/2/0/code_of_conduct.html).
Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).
Homepage: [https://www.contributor-covenant.org](https://www.contributor-covenant.org)
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq](https://www.contributor-covenant.org/faq). Translations are available at
[https://www.contributor-covenant.org/translations](https://www.contributor-covenant.org/translations).
This project adheres to the [Docling - Code of Conduct and Covenant](https://github.com/docling-project/community/blob/main/CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code.

CONTRIBUTING.md

@@ -2,85 +2,7 @@
Our project welcomes external contributions. If you have an itch, please feel
free to scratch it.
To contribute code or documentation, please submit a [pull request](https://github.com/docling-project/docling/pulls).
A good way to familiarize yourself with the codebase and contribution process is
to look for and tackle low-hanging fruit in the [issue tracker](https://github.com/docling-project/docling/issues).
Before embarking on a more ambitious contribution, please quickly [get in touch](#communication) with us.
For general questions or support requests, please refer to the [discussion section](https://github.com/docling-project/docling/discussions).
**Note: We appreciate your effort and want to avoid situations where a contribution
requires extensive rework (by you or by us), sits in the backlog for a long time, or
cannot be accepted at all!**
### Proposing New Features
If you would like to implement a new feature, please [raise an issue](https://github.com/docling-project/docling/issues)
before sending a pull request so the feature can be discussed. This is to avoid
you spending valuable time working on a feature that the project developers
are not interested in accepting into the codebase.
### Fixing Bugs
If you would like to fix a bug, please [raise an issue](https://github.com/docling-project/docling/issues) before sending a
pull request so it can be tracked.
### Merge Approval
The project maintainers use LGTM (Looks Good To Me) in comments on the code
review to indicate acceptance. A change requires LGTMs from two of the
maintainers of each component affected.
For a list of the maintainers, see the [MAINTAINERS.md](MAINTAINERS.md) page.
## Legal
Each source file must include a license header for the MIT
Software. Using the SPDX format is the simplest approach,
e.g.
```
/*
Copyright IBM Inc. All rights reserved.
SPDX-License-Identifier: MIT
*/
```
We have tried to make it as easy as possible to make contributions. This
applies to how we handle the legal aspects of contribution. We use the
same approach - the [Developer's Certificate of Origin 1.1 (DCO)](https://github.com/hyperledger/fabric/blob/master/docs/source/DCO1.1.txt) - that the Linux® Kernel [community](https://elinux.org/Developer_Certificate_Of_Origin)
uses to manage code contributions.
We simply ask that when submitting a patch for review, the developer
must include a sign-off statement in the commit message.
Here is an example Signed-off-by line, which indicates that the
submitter accepts the DCO:
```
Signed-off-by: John Doe <john.doe@example.com>
```
You can include this automatically when you commit a change to your
local git repository using the following command:
```
git commit -s
```
### New dependencies
This project strictly adheres to using dependencies that are compatible with the MIT license to ensure maximum flexibility and permissiveness in its usage and distribution. As a result, dependencies licensed under restrictive terms such as GPL, LGPL, AGPL, or similar are explicitly excluded. These licenses impose additional requirements and limitations that are incompatible with the MIT license's minimal restrictions, potentially affecting derivative works and redistribution. By maintaining this policy, the project ensures simplicity and freedom for both developers and users, avoiding conflicts with stricter copyleft provisions.
## Communication
Please feel free to connect with us using the [discussion section](https://github.com/docling-project/docling/discussions).
For more details on the contributing guidelines head to the Docling Project [community repository](https://github.com/docling-project/community).
## Developing

MAINTAINERS.md

@@ -2,9 +2,6 @@
- Christoph Auer - [@cau-git](https://github.com/cau-git)
- Michele Dolfi - [@dolfim-ibm](https://github.com/dolfim-ibm)
- Maxim Lysak - [@maxmnemonic](https://github.com/maxmnemonic)
- Nikos Livathinos - [@nikos-livathinos](https://github.com/nikos-livathinos)
- Ahmed Nassar - [@nassarofficial](https://github.com/nassarofficial)
- Panos Vagenas - [@vagenas](https://github.com/vagenas)
- Peter Staar - [@PeterStaar-IBM](https://github.com/PeterStaar-IBM)

README.md

@@ -21,6 +21,8 @@
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![License MIT](https://img.shields.io/github/license/docling-project/docling)](https://opensource.org/licenses/MIT)
[![PyPI Downloads](https://static.pepy.tech/badge/docling/month)](https://pepy.tech/projects/docling)
[![Docling Actor](https://apify.com/actor-badge?actor=vancura/docling?fpr=docling)](https://apify.com/vancura/docling)
[![LF AI & Data](https://img.shields.io/badge/LF%20AI%20%26%20Data-003778?logo=linuxfoundation&logoColor=fff&color=0094ff&labelColor=003778)](https://lfaidata.foundation/projects/)
Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
@@ -33,12 +35,12 @@ Docling simplifies document processing, parsing diverse formats — including ad
* 🔒 Local execution capabilities for sensitive data and air-gapped environments
* 🤖 Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
* 🔍 Extensive OCR support for scanned PDFs and images
* 🥚 Support of Visual Language Models ([SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview))
* 💻 Simple and convenient CLI
### Coming soon
* 📝 Metadata extraction, including title, authors, references & language
* 📝 Inclusion of Visual Language Models ([SmolDocling](https://huggingface.co/blog/smolervlm#smoldocling))
* 📝 Chart understanding (Barchart, Piechart, LinePlot, etc)
* 📝 Complex chemistry understanding (Molecular structures)
@@ -119,9 +121,13 @@ If you use Docling in your projects, please consider citing the following:
The Docling codebase is under MIT license.
For individual model usage, please refer to the model licenses found in the original packages.
## IBM ❤️ Open Source AI
## LF AI & Data
Docling has been brought to you by IBM.
Docling is hosted as a project in the [LF AI & Data Foundation](https://lfaidata.foundation/projects/).
### IBM ❤️ Open Source AI
The project was started by the AI for knowledge team at IBM Research Zurich.
[supported_formats]: https://docling-project.github.io/docling/usage/supported_formats/
[docling_document]: https://docling-project.github.io/docling/concepts/docling_document/

docling/backend/docling_parse_backend.py

@@ -6,12 +6,12 @@ from typing import Iterable, List, Optional, Union
import pypdfium2 as pdfium
from docling_core.types.doc import BoundingBox, CoordOrigin, Size
from docling_core.types.doc.page import BoundingRectangle, SegmentedPdfPage, TextCell
from docling_parse.pdf_parsers import pdf_parser_v1
from PIL import Image, ImageDraw
from pypdfium2 import PdfPage
from docling.backend.pdf_backend import PdfDocumentBackend, PdfPageBackend
from docling.datamodel.base_models import Cell
from docling.datamodel.document import InputDocument
_log = logging.getLogger(__name__)
@@ -68,8 +68,11 @@ class DoclingParsePageBackend(PdfPageBackend):
return text_piece
def get_text_cells(self) -> Iterable[Cell]:
cells: List[Cell] = []
def get_segmented_page(self) -> Optional[SegmentedPdfPage]:
return None
def get_text_cells(self) -> Iterable[TextCell]:
cells: List[TextCell] = []
cell_counter = 0
if not self.valid:
@@ -91,19 +94,24 @@ class DoclingParsePageBackend(PdfPageBackend):
text_piece = self._dpage["cells"][i]["content"]["rnormalized"]
cells.append(
Cell(
id=cell_counter,
TextCell(
index=cell_counter,
text=text_piece,
bbox=BoundingBox(
orig=text_piece,
from_ocr=False,
rect=BoundingRectangle.from_bounding_box(
BoundingBox(
# l=x0, b=y0, r=x1, t=y1,
l=x0 * page_size.width / parser_width,
b=y0 * page_size.height / parser_height,
r=x1 * page_size.width / parser_width,
t=y1 * page_size.height / parser_height,
coord_origin=CoordOrigin.BOTTOMLEFT,
)
).to_top_left_origin(page_size.height),
)
)
cell_counter += 1
def draw_clusters_and_cells():
@@ -112,7 +120,7 @@ class DoclingParsePageBackend(PdfPageBackend):
) # make new image to avoid drawing on the saved ones
draw = ImageDraw.Draw(image)
for c in cells:
x0, y0, x1, y1 = c.bbox.as_tuple()
x0, y0, x1, y1 = c.rect.to_bounding_box().as_tuple()
cell_color = (
random.randint(30, 140),
random.randint(30, 140),

docling/backend/docling_parse_v2_backend.py

@@ -6,12 +6,13 @@ from typing import TYPE_CHECKING, Iterable, List, Optional, Union
import pypdfium2 as pdfium
from docling_core.types.doc import BoundingBox, CoordOrigin
from docling_core.types.doc.page import BoundingRectangle, SegmentedPdfPage, TextCell
from docling_parse.pdf_parsers import pdf_parser_v2
from PIL import Image, ImageDraw
from pypdfium2 import PdfPage
from docling.backend.pdf_backend import PdfDocumentBackend, PdfPageBackend
from docling.datamodel.base_models import Cell, Size
from docling.datamodel.base_models import Size
from docling.utils.locks import pypdfium2_lock
if TYPE_CHECKING:
@@ -78,8 +79,11 @@ class DoclingParseV2PageBackend(PdfPageBackend):
return text_piece
def get_text_cells(self) -> Iterable[Cell]:
cells: List[Cell] = []
def get_segmented_page(self) -> Optional[SegmentedPdfPage]:
return None
def get_text_cells(self) -> Iterable[TextCell]:
cells: List[TextCell] = []
cell_counter = 0
if not self.valid:
@@ -106,16 +110,20 @@ class DoclingParseV2PageBackend(PdfPageBackend):
text_piece = cell_data[cells_header.index("text")]
cells.append(
Cell(
id=cell_counter,
TextCell(
index=cell_counter,
text=text_piece,
bbox=BoundingBox(
orig=text_piece,
from_ocr=False,
rect=BoundingRectangle.from_bounding_box(
BoundingBox(
# l=x0, b=y0, r=x1, t=y1,
l=x0 * page_size.width / parser_width,
b=y0 * page_size.height / parser_height,
r=x1 * page_size.width / parser_width,
t=y1 * page_size.height / parser_height,
coord_origin=CoordOrigin.BOTTOMLEFT,
)
).to_top_left_origin(page_size.height),
)
)

docling/backend/docling_parse_v4_backend.py (new file)

@@ -0,0 +1,192 @@
import logging
import random
from io import BytesIO
from pathlib import Path
from typing import TYPE_CHECKING, Iterable, List, Optional, Union
import pypdfium2 as pdfium
from docling_core.types.doc import BoundingBox, CoordOrigin
from docling_core.types.doc.page import SegmentedPdfPage, TextCell
from docling_parse.pdf_parser import DoclingPdfParser, PdfDocument
from PIL import Image, ImageDraw
from pypdfium2 import PdfPage
from docling.backend.pdf_backend import PdfDocumentBackend, PdfPageBackend
from docling.datamodel.base_models import Size
from docling.utils.locks import pypdfium2_lock
if TYPE_CHECKING:
from docling.datamodel.document import InputDocument
_log = logging.getLogger(__name__)
class DoclingParseV4PageBackend(PdfPageBackend):
def __init__(self, parsed_page: SegmentedPdfPage, page_obj: PdfPage):
self._ppage = page_obj
self._dpage = parsed_page
self.valid = parsed_page is not None
def is_valid(self) -> bool:
return self.valid
def get_text_in_rect(self, bbox: BoundingBox) -> str:
# Find intersecting cells on the page
text_piece = ""
page_size = self.get_size()
scale = (
1 # FIX - Replace with param in get_text_in_rect across backends (optional)
)
for i, cell in enumerate(self._dpage.textline_cells):
cell_bbox = (
cell.rect.to_bounding_box()
.to_top_left_origin(page_height=page_size.height)
.scaled(scale)
)
overlap_frac = cell_bbox.intersection_area_with(bbox) / cell_bbox.area()
if overlap_frac > 0.5:
if len(text_piece) > 0:
text_piece += " "
text_piece += cell.text
return text_piece
def get_segmented_page(self) -> Optional[SegmentedPdfPage]:
return self._dpage
def get_text_cells(self) -> Iterable[TextCell]:
page_size = self.get_size()
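# Normalize all text line cells to top-left origin in place; the comprehension's result list is intentionally discarded.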
[tc.to_top_left_origin(page_size.height) for tc in self._dpage.textline_cells]
# for cell in self._dpage.textline_cells:
# rect = cell.rect
#
# assert (
# rect.to_bounding_box().l <= rect.to_bounding_box().r
# ), f"left is > right on bounding box {rect.to_bounding_box()} of rect {rect}"
# assert (
# rect.to_bounding_box().t <= rect.to_bounding_box().b
# ), f"top is > bottom on bounding box {rect.to_bounding_box()} of rect {rect}"
return self._dpage.textline_cells
def get_bitmap_rects(self, scale: float = 1) -> Iterable[BoundingBox]:
AREA_THRESHOLD = 0 # 32 * 32
images = self._dpage.bitmap_resources
for img in images:
cropbox = img.rect.to_bounding_box().to_top_left_origin(
self.get_size().height
)
if cropbox.area() > AREA_THRESHOLD:
cropbox = cropbox.scaled(scale=scale)
yield cropbox
def get_page_image(
self, scale: float = 1, cropbox: Optional[BoundingBox] = None
) -> Image.Image:
page_size = self.get_size()
if not cropbox:
cropbox = BoundingBox(
l=0,
r=page_size.width,
t=0,
b=page_size.height,
coord_origin=CoordOrigin.TOPLEFT,
)
padbox = BoundingBox(
l=0, r=0, t=0, b=0, coord_origin=CoordOrigin.BOTTOMLEFT
)
else:
padbox = cropbox.to_bottom_left_origin(page_size.height).model_copy()
padbox.r = page_size.width - padbox.r
padbox.t = page_size.height - padbox.t
with pypdfium2_lock:
image = (
self._ppage.render(
scale=scale * 1.5,
rotation=0, # no additional rotation
crop=padbox.as_tuple(),
)
.to_pil()
.resize(
size=(round(cropbox.width * scale), round(cropbox.height * scale))
)
) # We resize the image from 1.5x the given scale to make it sharper.
return image
def get_size(self) -> Size:
with pypdfium2_lock:
return Size(width=self._ppage.get_width(), height=self._ppage.get_height())
# TODO: Take width and height from docling-parse.
# return Size(
# width=self._dpage.dimension.width,
# height=self._dpage.dimension.height,
# )
def unload(self):
self._ppage = None
self._dpage = None
class DoclingParseV4DocumentBackend(PdfDocumentBackend):
def __init__(self, in_doc: "InputDocument", path_or_stream: Union[BytesIO, Path]):
super().__init__(in_doc, path_or_stream)
with pypdfium2_lock:
self._pdoc = pdfium.PdfDocument(self.path_or_stream)
self.parser = DoclingPdfParser(loglevel="fatal")
self.dp_doc: PdfDocument = self.parser.load(path_or_stream=self.path_or_stream)
success = self.dp_doc is not None
if not success:
raise RuntimeError(
f"docling-parse v4 could not load document {self.document_hash}."
)
def page_count(self) -> int:
# return len(self._pdoc) # To be replaced with docling-parse API
len_1 = len(self._pdoc)
len_2 = self.dp_doc.number_of_pages()
if len_1 != len_2:
_log.error(f"Inconsistent number of pages: {len_1}!={len_2}")
return len_2
def load_page(
self, page_no: int, create_words: bool = True, create_textlines: bool = True
) -> DoclingParseV4PageBackend:
with pypdfium2_lock:
return DoclingParseV4PageBackend(
self.dp_doc.get_page(
page_no + 1,
create_words=create_words,
create_textlines=create_textlines,
),
self._pdoc[page_no],
)
def is_valid(self) -> bool:
return self.page_count() > 0
def unload(self):
super().unload()
self.dp_doc.unload()
with pypdfium2_lock:
self._pdoc.close()
self._pdoc = None

docling/backend/msword_backend.py

@@ -275,8 +275,10 @@ class MsWordDocumentBackend(DeclarativeDocumentBackend):
only_equations.append(latex_equation)
texts_and_equations.append(latex_equation)
if "".join(only_texts) != text:
return text
if "".join(only_texts).strip() != text.strip():
# If we are not able to reconstruct the initial raw text
# do not try to parse equations and return the original
return text, []
return "".join(texts_and_equations), only_equations
@@ -365,6 +367,7 @@ class MsWordDocumentBackend(DeclarativeDocumentBackend):
for eq in equations:
if len(text_tmp) == 0:
break
pre_eq_text = text_tmp.split(eq, maxsplit=1)[0]
text_tmp = text_tmp.split(eq, maxsplit=1)[1]
if len(pre_eq_text) > 0:

docling/backend/pdf_backend.py

@@ -4,10 +4,11 @@ from pathlib import Path
from typing import Iterable, Optional, Set, Union
from docling_core.types.doc import BoundingBox, Size
from docling_core.types.doc.page import SegmentedPdfPage, TextCell
from PIL import Image
from docling.backend.abstract_backend import PaginatedDocumentBackend
from docling.datamodel.base_models import Cell, InputFormat
from docling.datamodel.base_models import InputFormat
from docling.datamodel.document import InputDocument
@@ -17,7 +18,11 @@ class PdfPageBackend(ABC):
pass
@abstractmethod
def get_text_cells(self) -> Iterable[Cell]:
def get_segmented_page(self) -> Optional[SegmentedPdfPage]:
pass
@abstractmethod
def get_text_cells(self) -> Iterable[TextCell]:
pass
@abstractmethod

docling/backend/pypdfium2_backend.py

@@ -7,12 +7,12 @@ from typing import TYPE_CHECKING, Iterable, List, Optional, Union
import pypdfium2 as pdfium
import pypdfium2.raw as pdfium_c
from docling_core.types.doc import BoundingBox, CoordOrigin, Size
from docling_core.types.doc.page import BoundingRectangle, SegmentedPdfPage, TextCell
from PIL import Image, ImageDraw
from pypdfium2 import PdfTextPage
from pypdfium2._helpers.misc import PdfiumError
from docling.backend.pdf_backend import PdfDocumentBackend, PdfPageBackend
from docling.datamodel.base_models import Cell
from docling.utils.locks import pypdfium2_lock
if TYPE_CHECKING:
@@ -68,7 +68,10 @@ class PyPdfiumPageBackend(PdfPageBackend):
return text_piece
def get_text_cells(self) -> Iterable[Cell]:
def get_segmented_page(self) -> Optional[SegmentedPdfPage]:
return None
def get_text_cells(self) -> Iterable[TextCell]:
with pypdfium2_lock:
if not self.text_page:
self.text_page = self._ppage.get_textpage()
@@ -84,11 +87,19 @@ class PyPdfiumPageBackend(PdfPageBackend):
text_piece = self.text_page.get_text_bounded(*rect)
x0, y0, x1, y1 = rect
cells.append(
Cell(
id=cell_counter,
TextCell(
index=cell_counter,
text=text_piece,
bbox=BoundingBox(
l=x0, b=y0, r=x1, t=y1, coord_origin=CoordOrigin.BOTTOMLEFT
orig=text_piece,
from_ocr=False,
rect=BoundingRectangle.from_bounding_box(
BoundingBox(
l=x0,
b=y0,
r=x1,
t=y1,
coord_origin=CoordOrigin.BOTTOMLEFT,
)
).to_top_left_origin(page_size.height),
)
)
@@ -97,51 +108,56 @@ class PyPdfiumPageBackend(PdfPageBackend):
# PyPdfium2 produces very fragmented cells, with sub-word level boundaries, in many PDFs.
# The cell merging code below is to clean this up.
def merge_horizontal_cells(
cells: List[Cell],
cells: List[TextCell],
horizontal_threshold_factor: float = 1.0,
vertical_threshold_factor: float = 0.5,
) -> List[Cell]:
) -> List[TextCell]:
if not cells:
return []
def group_rows(cells: List[Cell]) -> List[List[Cell]]:
def group_rows(cells: List[TextCell]) -> List[List[TextCell]]:
rows = []
current_row = [cells[0]]
row_top = cells[0].bbox.t
row_bottom = cells[0].bbox.b
row_height = cells[0].bbox.height
row_top = cells[0].rect.to_bounding_box().t
row_bottom = cells[0].rect.to_bounding_box().b
row_height = cells[0].rect.to_bounding_box().height
for cell in cells[1:]:
vertical_threshold = row_height * vertical_threshold_factor
if (
abs(cell.bbox.t - row_top) <= vertical_threshold
and abs(cell.bbox.b - row_bottom) <= vertical_threshold
abs(cell.rect.to_bounding_box().t - row_top)
<= vertical_threshold
and abs(cell.rect.to_bounding_box().b - row_bottom)
<= vertical_threshold
):
current_row.append(cell)
row_top = min(row_top, cell.bbox.t)
row_bottom = max(row_bottom, cell.bbox.b)
row_top = min(row_top, cell.rect.to_bounding_box().t)
row_bottom = max(row_bottom, cell.rect.to_bounding_box().b)
row_height = row_bottom - row_top
else:
rows.append(current_row)
current_row = [cell]
row_top = cell.bbox.t
row_bottom = cell.bbox.b
row_height = cell.bbox.height
row_top = cell.rect.to_bounding_box().t
row_bottom = cell.rect.to_bounding_box().b
row_height = cell.rect.to_bounding_box().height
if current_row:
rows.append(current_row)
return rows
def merge_row(row: List[Cell]) -> List[Cell]:
def merge_row(row: List[TextCell]) -> List[TextCell]:
merged = []
current_group = [row[0]]
for cell in row[1:]:
prev_cell = current_group[-1]
avg_height = (prev_cell.bbox.height + cell.bbox.height) / 2
avg_height = (
prev_cell.rect.height + cell.rect.to_bounding_box().height
) / 2
if (
cell.bbox.l - prev_cell.bbox.r
cell.rect.to_bounding_box().l
- prev_cell.rect.to_bounding_box().r
<= avg_height * horizontal_threshold_factor
):
current_group.append(cell)
@@ -154,24 +170,30 @@ class PyPdfiumPageBackend(PdfPageBackend):
return merged
def merge_group(group: List[Cell]) -> Cell:
def merge_group(group: List[TextCell]) -> TextCell:
if len(group) == 1:
return group[0]
merged_text = "".join(cell.text for cell in group)
merged_bbox = BoundingBox(
l=min(cell.bbox.l for cell in group),
t=min(cell.bbox.t for cell in group),
r=max(cell.bbox.r for cell in group),
b=max(cell.bbox.b for cell in group),
l=min(cell.rect.to_bounding_box().l for cell in group),
t=min(cell.rect.to_bounding_box().t for cell in group),
r=max(cell.rect.to_bounding_box().r for cell in group),
b=max(cell.rect.to_bounding_box().b for cell in group),
)
return TextCell(
index=group[0].index,
text=merged_text,
orig=merged_text,
rect=BoundingRectangle.from_bounding_box(merged_bbox),
from_ocr=False,
)
return Cell(id=group[0].id, text=merged_text, bbox=merged_bbox)
rows = group_rows(cells)
merged_cells = [cell for row in rows for cell in merge_row(row)]
for i, cell in enumerate(merged_cells, 1):
cell.id = i
cell.index = i
return merged_cells
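A worked example of the horizontal merge criterion above, with hypothetical numbers: two fragments on the same row are joined when the gap between them is no larger than their average height times horizontal_threshold_factor.

# Hypothetical fragments "Hel" and "lo" on one row:
prev_r = 102.0                      # right edge of "Hel"
cell_l = 103.5                      # left edge of "lo"
avg_height = (10.0 + 10.0) / 2      # average of the two cell heights
horizontal_threshold_factor = 1.0
print(cell_l - prev_r <= avg_height * horizontal_threshold_factor)  # True -> merged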
@ -181,7 +203,7 @@ class PyPdfiumPageBackend(PdfPageBackend):
) # make new image to avoid drawing on the saved ones
draw = ImageDraw.Draw(image)
for c in cells:
x0, y0, x1, y1 = c.bbox.as_tuple()
x0, y0, x1, y1 = c.rect.to_bounding_box().as_tuple()
cell_color = (
random.randint(30, 140),
random.randint(30, 140),

View File

@ -9,6 +9,7 @@ import warnings
from pathlib import Path
from typing import Annotated, Dict, Iterable, List, Optional, Type
import rich.table
import typer
from docling_core.types.doc import ImageRefMode
from docling_core.utils.file import resolve_source_to_path
@ -16,6 +17,7 @@ from pydantic import TypeAdapter
from docling.backend.docling_parse_backend import DoclingParseDocumentBackend
from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend
from docling.backend.docling_parse_v4_backend import DoclingParseV4DocumentBackend
from docling.backend.pdf_backend import PdfDocumentBackend
from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import (
@ -29,18 +31,14 @@ from docling.datamodel.pipeline_options import (
AcceleratorDevice,
AcceleratorOptions,
EasyOcrOptions,
OcrEngine,
OcrMacOptions,
OcrOptions,
PdfBackend,
PdfPipelineOptions,
RapidOcrOptions,
TableFormerMode,
TesseractCliOcrOptions,
TesseractOcrOptions,
)
from docling.datamodel.settings import settings
from docling.document_converter import DocumentConverter, FormatOption, PdfFormatOption
from docling.models.factories import get_ocr_factory
warnings.filterwarnings(action="ignore", category=UserWarning, module="pydantic|torch")
warnings.filterwarnings(action="ignore", category=FutureWarning, module="easyocr")
@ -48,8 +46,11 @@ warnings.filterwarnings(action="ignore", category=FutureWarning, module="easyocr
_log = logging.getLogger(__name__)
from rich.console import Console
console = Console()
err_console = Console(stderr=True)
ocr_factory_internal = get_ocr_factory(allow_external_plugins=False)
ocr_engines_enum_internal = ocr_factory_internal.get_enum()
app = typer.Typer(
name="Docling",
@ -77,6 +78,24 @@ def version_callback(value: bool):
raise typer.Exit()
def show_external_plugins_callback(value: bool):
if value:
ocr_factory_all = get_ocr_factory(allow_external_plugins=True)
table = rich.table.Table(title="Available OCR engines")
table.add_column("Name", justify="right")
table.add_column("Plugin")
table.add_column("Package")
for meta in ocr_factory_all.registered_meta.values():
if not meta.module.startswith("docling."):
table.add_row(
f"[bold]{meta.kind}[/bold]",
meta.plugin_name,
meta.module.split(".")[0],
)
rich.print(table)
raise typer.Exit()
def export_documents(
conv_results: Iterable[ConversionResult],
output_dir: Path,
@ -195,8 +214,16 @@ def convert(
),
] = False,
ocr_engine: Annotated[
OcrEngine, typer.Option(..., help="The OCR engine to use.")
] = OcrEngine.EASYOCR,
str,
typer.Option(
...,
help=(
f"The OCR engine to use. When --allow-external-plugins is *not* set, the available values are: "
f"{', '.join((o.value for o in ocr_engines_enum_internal))}. "
f"Use the option --show-external-plugins to see the options allowed with external plugins."
),
),
] = EasyOcrOptions.kind,
ocr_lang: Annotated[
Optional[str],
typer.Option(
@ -240,6 +267,21 @@ def convert(
..., help="Must be enabled when using models connecting to remote services."
),
] = False,
allow_external_plugins: Annotated[
bool,
typer.Option(
..., help="Must be enabled for loading modules from third-party plugins."
),
] = False,
show_external_plugins: Annotated[
bool,
typer.Option(
...,
help="List the third-party plugins which are available when the option --allow-external-plugins is set.",
callback=show_external_plugins_callback,
is_eager=True,
),
] = False,
abort_on_error: Annotated[
bool,
typer.Option(
@ -367,18 +409,11 @@ def convert(
export_txt = OutputFormat.TEXT in to_formats
export_doctags = OutputFormat.DOCTAGS in to_formats
if ocr_engine == OcrEngine.EASYOCR:
ocr_options: OcrOptions = EasyOcrOptions(force_full_page_ocr=force_ocr)
elif ocr_engine == OcrEngine.TESSERACT_CLI:
ocr_options = TesseractCliOcrOptions(force_full_page_ocr=force_ocr)
elif ocr_engine == OcrEngine.TESSERACT:
ocr_options = TesseractOcrOptions(force_full_page_ocr=force_ocr)
elif ocr_engine == OcrEngine.OCRMAC:
ocr_options = OcrMacOptions(force_full_page_ocr=force_ocr)
elif ocr_engine == OcrEngine.RAPIDOCR:
ocr_options = RapidOcrOptions(force_full_page_ocr=force_ocr)
else:
raise RuntimeError(f"Unexpected OCR engine type {ocr_engine}")
ocr_factory = get_ocr_factory(allow_external_plugins=allow_external_plugins)
ocr_options: OcrOptions = ocr_factory.create_options( # type: ignore
kind=ocr_engine,
force_full_page_ocr=force_ocr,
)
ocr_lang_list = _split_list(ocr_lang)
if ocr_lang_list is not None:
@ -386,6 +421,7 @@ def convert(
accelerator_options = AcceleratorOptions(num_threads=num_threads, device=device)
pipeline_options = PdfPipelineOptions(
allow_external_plugins=allow_external_plugins,
enable_remote_services=enable_remote_services,
accelerator_options=accelerator_options,
do_ocr=ocr,
@ -412,12 +448,15 @@ def convert(
if artifacts_path is not None:
pipeline_options.artifacts_path = artifacts_path
backend: Type[PdfDocumentBackend]
if pdf_backend == PdfBackend.DLPARSE_V1:
backend: Type[PdfDocumentBackend] = DoclingParseDocumentBackend
backend = DoclingParseDocumentBackend
elif pdf_backend == PdfBackend.DLPARSE_V2:
backend = DoclingParseV2DocumentBackend
elif pdf_backend == PdfBackend.DLPARSE_V4:
backend = DoclingParseV4DocumentBackend # type: ignore
elif pdf_backend == PdfBackend.PYPDFIUM2:
backend = PyPdfiumDocumentBackend
backend = PyPdfiumDocumentBackend # type: ignore
else:
raise RuntimeError(f"Unexpected PDF backend type {pdf_backend}")

View File

@ -9,6 +9,7 @@ from docling_core.types.doc import (
Size,
TableCell,
)
from docling_core.types.doc.page import SegmentedPdfPage, TextCell
from docling_core.types.io import (  # DO NOT REMOVE; explicitly exposed from this location
DocumentStream,
)
@ -123,14 +124,10 @@ class ErrorItem(BaseModel):
error_message: str
class Cell(BaseModel):
id: int
text: str
bbox: BoundingBox
class OcrCell(Cell):
confidence: float
# class Cell(BaseModel):
# id: int
# text: str
# bbox: BoundingBox
class Cluster(BaseModel):
@ -138,7 +135,7 @@ class Cluster(BaseModel):
label: DocItemLabel
bbox: BoundingBox
confidence: float = 1.0
cells: List[Cell] = []
cells: List[TextCell] = []
children: List["Cluster"] = [] # Add child cluster support
@ -226,7 +223,8 @@ class Page(BaseModel):
page_no: int
# page_hash: Optional[str] = None
size: Optional[Size] = None
cells: List[Cell] = []
cells: List[TextCell] = []
parsed_page: Optional[SegmentedPdfPage] = None
predictions: PagePredictions = PagePredictions()
assembled: Optional[AssembledUnit] = None
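With OcrCell removed, OCR provenance and confidence live directly on TextCell, so code that previously needed isinstance checks reduces to field reads. A short sketch, assuming `page` is a populated Page:

# Sketch: OCR provenance is now a field, not a subclass.
ocr_cells = [c for c in page.cells if c.from_ocr]         # was: isinstance(c, OcrCell)
low_conf = [c for c in ocr_cells if c.confidence < 0.5]   # hypothetical threshold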

View File

@ -1,10 +1,9 @@
import logging
import os
import re
import warnings
from enum import Enum
from pathlib import Path
from typing import Annotated, Any, Dict, List, Literal, Optional, Union
from typing import Any, ClassVar, Dict, List, Literal, Optional, Union
from pydantic import (
AnyUrl,
@ -13,13 +12,8 @@ from pydantic import (
Field,
field_validator,
model_validator,
validator,
)
from pydantic_settings import (
BaseSettings,
PydanticBaseSettingsSource,
SettingsConfigDict,
)
from pydantic_settings import BaseSettings, SettingsConfigDict
from typing_extensions import deprecated
_log = logging.getLogger(__name__)
@ -83,6 +77,12 @@ class AcceleratorOptions(BaseSettings):
return data
class BaseOptions(BaseModel):
"""Base class for options."""
kind: ClassVar[str]
class TableFormerMode(str, Enum):
"""Modes for the TableFormer model."""
@ -102,10 +102,9 @@ class TableStructureOptions(BaseModel):
mode: TableFormerMode = TableFormerMode.ACCURATE
class OcrOptions(BaseModel):
class OcrOptions(BaseOptions):
"""OCR options."""
kind: str
lang: List[str]
force_full_page_ocr: bool = False # If enabled a full page OCR is always applied
bitmap_area_threshold: float = (
@ -116,7 +115,7 @@ class OcrOptions(BaseModel):
class RapidOcrOptions(OcrOptions):
"""Options for the RapidOCR engine."""
kind: Literal["rapidocr"] = "rapidocr"
kind: ClassVar[Literal["rapidocr"]] = "rapidocr"
# English and Chinese are the most commonly used models and have been tested with RapidOCR.
lang: List[str] = [
@ -155,7 +154,7 @@ class RapidOcrOptions(OcrOptions):
class EasyOcrOptions(OcrOptions):
"""Options for the EasyOCR engine."""
kind: Literal["easyocr"] = "easyocr"
kind: ClassVar[Literal["easyocr"]] = "easyocr"
lang: List[str] = ["fr", "de", "es", "en"]
use_gpu: Optional[bool] = None
@ -175,7 +174,7 @@ class EasyOcrOptions(OcrOptions):
class TesseractCliOcrOptions(OcrOptions):
"""Options for the TesseractCli engine."""
kind: Literal["tesseract"] = "tesseract"
kind: ClassVar[Literal["tesseract"]] = "tesseract"
lang: List[str] = ["fra", "deu", "spa", "eng"]
tesseract_cmd: str = "tesseract"
path: Optional[str] = None
@ -188,7 +187,7 @@ class TesseractCliOcrOptions(OcrOptions):
class TesseractOcrOptions(OcrOptions):
"""Options for the Tesseract engine."""
kind: Literal["tesserocr"] = "tesserocr"
kind: ClassVar[Literal["tesserocr"]] = "tesserocr"
lang: List[str] = ["fra", "deu", "spa", "eng"]
path: Optional[str] = None
@ -200,7 +199,7 @@ class TesseractOcrOptions(OcrOptions):
class OcrMacOptions(OcrOptions):
"""Options for the Mac OCR engine."""
kind: Literal["ocrmac"] = "ocrmac"
kind: ClassVar[Literal["ocrmac"]] = "ocrmac"
lang: List[str] = ["fr-FR", "de-DE", "es-ES", "en-US"]
recognition: str = "accurate"
framework: str = "vision"
@ -210,8 +209,7 @@ class OcrMacOptions(OcrOptions):
)
class PictureDescriptionBaseOptions(BaseModel):
kind: str
class PictureDescriptionBaseOptions(BaseOptions):
batch_size: int = 8
scale: float = 2
@ -221,7 +219,7 @@ class PictureDescriptionBaseOptions(BaseModel):
class PictureDescriptionApiOptions(PictureDescriptionBaseOptions):
kind: Literal["api"] = "api"
kind: ClassVar[Literal["api"]] = "api"
url: AnyUrl = AnyUrl("http://localhost:8000/v1/chat/completions")
headers: Dict[str, str] = {}
@ -233,7 +231,7 @@ class PictureDescriptionApiOptions(PictureDescriptionBaseOptions):
class PictureDescriptionVlmOptions(PictureDescriptionBaseOptions):
kind: Literal["vlm"] = "vlm"
kind: ClassVar[Literal["vlm"]] = "vlm"
repo_id: str
prompt: str = "Describe this image in a few sentences."
@ -301,9 +299,11 @@ class PdfBackend(str, Enum):
PYPDFIUM2 = "pypdfium2"
DLPARSE_V1 = "dlparse_v1"
DLPARSE_V2 = "dlparse_v2"
DLPARSE_V4 = "dlparse_v4"
# Define an enum for the ocr engines
@deprecated("Use ocr_factory.registered_enum")
class OcrEngine(str, Enum):
"""Enum of valid OCR engines."""
@ -323,6 +323,7 @@ class PipelineOptions(BaseModel):
document_timeout: Optional[float] = None
accelerator_options: AcceleratorOptions = AcceleratorOptions()
enable_remote_services: bool = False
allow_external_plugins: bool = False
class PaginatedPipelineOptions(PipelineOptions):
@ -358,17 +359,10 @@ class PdfPipelineOptions(PaginatedPipelineOptions):
# If True, text from backend will be used instead of generated text
table_structure_options: TableStructureOptions = TableStructureOptions()
ocr_options: Union[
EasyOcrOptions,
TesseractCliOcrOptions,
TesseractOcrOptions,
OcrMacOptions,
RapidOcrOptions,
] = Field(EasyOcrOptions(), discriminator="kind")
picture_description_options: Annotated[
Union[PictureDescriptionApiOptions, PictureDescriptionVlmOptions],
Field(discriminator="kind"),
] = smolvlm_picture_description
ocr_options: OcrOptions = EasyOcrOptions()
picture_description_options: PictureDescriptionBaseOptions = (
smolvlm_picture_description
)
images_scale: float = 1.0
generate_page_images: bool = False
@ -381,3 +375,5 @@ class PdfPipelineOptions(PaginatedPipelineOptions):
"before conversion and then use the `TableItem.get_image` function."
),
)
generate_parsed_pages: bool = False
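Two changes here work together: kind becomes a ClassVar instead of a pydantic discriminator field, and PdfPipelineOptions.ocr_options is typed as the open base class OcrOptions rather than a closed Union. That combination is what lets third parties ship their own options classes. A sketch of a hypothetical external options class:

# Sketch of a hypothetical third-party options class under the new scheme.
from typing import ClassVar, List, Literal

from docling.datamodel.pipeline_options import OcrOptions

class MyOcrOptions(OcrOptions):
    kind: ClassVar[Literal["myocr"]] = "myocr"  # class-level tag, not a serialized field
    lang: List[str] = ["en"]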

View File

@ -11,7 +11,7 @@ from pydantic import BaseModel, ConfigDict, model_validator, validate_call
from docling.backend.abstract_backend import AbstractDocumentBackend
from docling.backend.asciidoc_backend import AsciiDocBackend
from docling.backend.csv_backend import CsvDocumentBackend
from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend
from docling.backend.docling_parse_v4_backend import DoclingParseV4DocumentBackend
from docling.backend.html_backend import HTMLDocumentBackend
from docling.backend.json.docling_json_backend import DoclingJSONBackend
from docling.backend.md_backend import MarkdownDocumentBackend
@ -109,12 +109,12 @@ class XMLJatsFormatOption(FormatOption):
class ImageFormatOption(FormatOption):
pipeline_cls: Type = StandardPdfPipeline
backend: Type[AbstractDocumentBackend] = DoclingParseV2DocumentBackend
backend: Type[AbstractDocumentBackend] = DoclingParseV4DocumentBackend
class PdfFormatOption(FormatOption):
pipeline_cls: Type = StandardPdfPipeline
backend: Type[AbstractDocumentBackend] = DoclingParseV2DocumentBackend
backend: Type[AbstractDocumentBackend] = DoclingParseV4DocumentBackend
def _get_default_option(format: InputFormat) -> FormatOption:
@ -147,10 +147,10 @@ def _get_default_option(format: InputFormat) -> FormatOption:
pipeline_cls=SimplePipeline, backend=JatsDocumentBackend
),
InputFormat.IMAGE: FormatOption(
pipeline_cls=StandardPdfPipeline, backend=DoclingParseV2DocumentBackend
pipeline_cls=StandardPdfPipeline, backend=DoclingParseV4DocumentBackend
),
InputFormat.PDF: FormatOption(
pipeline_cls=StandardPdfPipeline, backend=DoclingParseV2DocumentBackend
pipeline_cls=StandardPdfPipeline, backend=DoclingParseV4DocumentBackend
),
InputFormat.JSON_DOCLING: FormatOption(
pipeline_cls=SimplePipeline, backend=DoclingJSONBackend
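The default backend for PDF and image inputs moves from docling-parse v2 to v4. Callers that need the previous behavior can still pin a backend explicitly; a sketch using the standard DocumentConverter API:

# Sketch: pinning the previous v2 backend explicitly, if needed.
from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(backend=DoclingParseV2DocumentBackend)}
)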

View File

@ -1,14 +1,22 @@
from abc import ABC, abstractmethod
from typing import Any, Generic, Iterable, Optional
from typing import Any, Generic, Iterable, Optional, Protocol, Type
from docling_core.types.doc import BoundingBox, DocItem, DoclingDocument, NodeItem
from typing_extensions import TypeVar
from docling.datamodel.base_models import ItemAndImageEnrichmentElement, Page
from docling.datamodel.document import ConversionResult
from docling.datamodel.pipeline_options import BaseOptions
from docling.datamodel.settings import settings
class BaseModelWithOptions(Protocol):
@classmethod
def get_options_type(cls) -> Type[BaseOptions]: ...
def __init__(self, *, options: BaseOptions, **kwargs): ...
class BasePageModel(ABC):
@abstractmethod
def __call__(

View File

@ -2,25 +2,33 @@ import copy
import logging
from abc import abstractmethod
from pathlib import Path
from typing import Iterable, List
from typing import Iterable, List, Optional, Type
import numpy as np
from docling_core.types.doc import BoundingBox, CoordOrigin
from docling_core.types.doc.page import BoundingRectangle, PdfTextCell, TextCell
from PIL import Image, ImageDraw
from rtree import index
from scipy.ndimage import binary_dilation, find_objects, label
from docling.datamodel.base_models import Cell, OcrCell, Page
from docling.datamodel.base_models import Page
from docling.datamodel.document import ConversionResult
from docling.datamodel.pipeline_options import OcrOptions
from docling.datamodel.pipeline_options import AcceleratorOptions, OcrOptions
from docling.datamodel.settings import settings
from docling.models.base_model import BasePageModel
from docling.models.base_model import BaseModelWithOptions, BasePageModel
_log = logging.getLogger(__name__)
class BaseOcrModel(BasePageModel):
def __init__(self, enabled: bool, options: OcrOptions):
class BaseOcrModel(BasePageModel, BaseModelWithOptions):
def __init__(
self,
*,
enabled: bool,
artifacts_path: Optional[Path],
options: OcrOptions,
accelerator_options: AcceleratorOptions,
):
self.enabled = enabled
self.options = options
@ -104,11 +112,13 @@ class BaseOcrModel(BasePageModel):
p.dimension = 2
idx = index.Index(properties=p)
for i, cell in enumerate(programmatic_cells):
idx.insert(i, cell.bbox.as_tuple())
idx.insert(i, cell.rect.to_bounding_box().as_tuple())
def is_overlapping_with_existing_cells(ocr_cell):
# Query the R-tree to get overlapping rectangles
possible_matches_index = list(idx.intersection(ocr_cell.bbox.as_tuple()))
possible_matches_index = list(
idx.intersection(ocr_cell.rect.to_bounding_box().as_tuple())
)
return (
len(possible_matches_index) > 0
@ -125,10 +135,7 @@ class BaseOcrModel(BasePageModel):
"""
if self.options.force_full_page_ocr:
# If a full page OCR is forced, use only the OCR cells
cells = [
Cell(id=c_ocr.id, text=c_ocr.text, bbox=c_ocr.bbox)
for c_ocr in ocr_cells
]
cells = ocr_cells
return cells
## Remove OCR cells which overlap with programmatic cells.
@ -156,7 +163,7 @@ class BaseOcrModel(BasePageModel):
# Draw OCR and programmatic cells
for tc in page.cells:
x0, y0, x1, y1 = tc.bbox.as_tuple()
x0, y0, x1, y1 = tc.rect.to_bounding_box().as_tuple()
y0 *= scale_x
y1 *= scale_y
x0 *= scale_x
@ -165,9 +172,8 @@ class BaseOcrModel(BasePageModel):
if y1 <= y0:
y1, y0 = y0, y1
color = "gray"
if isinstance(tc, OcrCell):
color = "magenta"
color = "magenta" if tc.from_ocr else "gray"
draw.rectangle([(x0, y0), (x1, y1)], outline=color)
if show:
@ -187,3 +193,8 @@ class BaseOcrModel(BasePageModel):
self, conv_res: ConversionResult, page_batch: Iterable[Page]
) -> Iterable[Page]:
pass
@classmethod
@abstractmethod
def get_options_type(cls) -> Type[OcrOptions]:
pass
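Every OCR model now shares one keyword-only constructor signature and names its options class via get_options_type; that classmethod is the hook the factory uses to map an options instance to an implementation. A compressed sketch of the contract, reusing the hypothetical MyOcrOptions from earlier (engine logic elided):

# Sketch of the contract a pluggable OCR model must satisfy.
from pathlib import Path
from typing import Iterable, Optional, Type

from docling.datamodel.base_models import Page
from docling.datamodel.document import ConversionResult
from docling.datamodel.pipeline_options import AcceleratorOptions, OcrOptions
from docling.models.base_ocr_model import BaseOcrModel

class MyOcrModel(BaseOcrModel):  # hypothetical engine
    def __init__(self, *, enabled: bool, artifacts_path: Optional[Path],
                 options: OcrOptions, accelerator_options: AcceleratorOptions):
        super().__init__(enabled=enabled, artifacts_path=artifacts_path,
                         options=options, accelerator_options=accelerator_options)

    def __call__(self, conv_res: ConversionResult,
                 page_batch: Iterable[Page]) -> Iterable[Page]:
        yield from page_batch  # placeholder: no OCR applied

    @classmethod
    def get_options_type(cls) -> Type[OcrOptions]:
        return MyOcrOptions  # hypothetical options class sketched earlier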

View File

@ -2,17 +2,19 @@ import logging
import warnings
import zipfile
from pathlib import Path
from typing import Iterable, List, Optional
from typing import Iterable, List, Optional, Type
import numpy
from docling_core.types.doc import BoundingBox, CoordOrigin
from docling_core.types.doc.page import BoundingRectangle, TextCell
from docling.datamodel.base_models import Cell, OcrCell, Page
from docling.datamodel.base_models import Page
from docling.datamodel.document import ConversionResult
from docling.datamodel.pipeline_options import (
AcceleratorDevice,
AcceleratorOptions,
EasyOcrOptions,
OcrOptions,
)
from docling.datamodel.settings import settings
from docling.models.base_ocr_model import BaseOcrModel
@ -33,7 +35,12 @@ class EasyOcrModel(BaseOcrModel):
options: EasyOcrOptions,
accelerator_options: AcceleratorOptions,
):
super().__init__(enabled=enabled, options=options)
super().__init__(
enabled=enabled,
artifacts_path=artifacts_path,
options=options,
accelerator_options=accelerator_options,
)
self.options: EasyOcrOptions
self.scale = 3 # multiplier for 72 dpi == 216 dpi.
@ -148,11 +155,14 @@ class EasyOcrModel(BaseOcrModel):
del im
cells = [
OcrCell(
id=ix,
TextCell(
index=ix,
text=line[1],
orig=line[1],
from_ocr=True,
confidence=line[2],
bbox=BoundingBox.from_tuple(
rect=BoundingRectangle.from_bounding_box(
BoundingBox.from_tuple(
coord=(
(line[0][0][0] / self.scale) + ocr_rect.l,
(line[0][0][1] / self.scale) + ocr_rect.t,
@ -160,6 +170,7 @@ class EasyOcrModel(BaseOcrModel):
(line[0][2][1] / self.scale) + ocr_rect.t,
),
origin=CoordOrigin.TOPLEFT,
)
),
)
for ix, line in enumerate(result)
@ -175,3 +186,7 @@ class EasyOcrModel(BaseOcrModel):
self.draw_ocr_rects_and_cells(conv_res, page, ocr_rects)
yield page
@classmethod
def get_options_type(cls) -> Type[OcrOptions]:
return EasyOcrOptions

View File

@ -0,0 +1,27 @@
import logging
from functools import lru_cache
from docling.models.factories.ocr_factory import OcrFactory
from docling.models.factories.picture_description_factory import (
PictureDescriptionFactory,
)
logger = logging.getLogger(__name__)
@lru_cache()
def get_ocr_factory(allow_external_plugins: bool = False) -> OcrFactory:
factory = OcrFactory()
factory.load_from_plugins(allow_external_plugins=allow_external_plugins)
logger.info("Registered ocr engines: %r", factory.registered_kind)
return factory
@lru_cache()
def get_picture_description_factory(
allow_external_plugins: bool = False,
) -> PictureDescriptionFactory:
factory = PictureDescriptionFactory()
factory.load_from_plugins(allow_external_plugins=allow_external_plugins)
logger.info("Registered picture descriptions: %r", factory.registered_kind)
return factory
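Both accessors are wrapped in lru_cache, so each allow_external_plugins value maps to exactly one factory instance and plugin discovery runs once per process. A quick usage sketch:

# Quick usage sketch of the cached factory accessors.
from docling.models.factories import get_ocr_factory

factory = get_ocr_factory()           # plugins are loaded on first call only
print(factory.registered_kind)        # e.g. ['easyocr', 'ocrmac', 'rapidocr', ...]
assert factory is get_ocr_factory()   # same instance, thanks to lru_cache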

View File

@ -0,0 +1,122 @@
import enum
import logging
from abc import ABCMeta
from typing import Generic, Optional, Type, TypeVar
from pluggy import PluginManager
from pydantic import BaseModel
from docling.datamodel.pipeline_options import BaseOptions
from docling.models.base_model import BaseModelWithOptions
A = TypeVar("A", bound=BaseModelWithOptions)
logger = logging.getLogger(__name__)
class FactoryMeta(BaseModel):
kind: str
plugin_name: str
module: str
class BaseFactory(Generic[A], metaclass=ABCMeta):
default_plugin_name = "docling"
def __init__(self, plugin_attr_name: str, plugin_name=default_plugin_name):
self.plugin_name = plugin_name
self.plugin_attr_name = plugin_attr_name
self._classes: dict[Type[BaseOptions], Type[A]] = {}
self._meta: dict[Type[BaseOptions], FactoryMeta] = {}
@property
def registered_kind(self) -> list[str]:
return list(opt.kind for opt in self._classes.keys())
def get_enum(self) -> enum.Enum:
return enum.Enum(
self.plugin_attr_name + "_enum",
names={kind: kind for kind in self.registered_kind},
type=str,
module=__name__,
)
@property
def classes(self):
return self._classes
@property
def registered_meta(self):
return self._meta
def create_instance(self, options: BaseOptions, **kwargs) -> A:
try:
_cls = self._classes[type(options)]
return _cls(options=options, **kwargs)
except KeyError:
raise RuntimeError(self._err_msg_on_class_not_found(options.kind))
def create_options(self, kind: str, *args, **kwargs) -> BaseOptions:
for opt_cls, _ in self._classes.items():
if opt_cls.kind == kind:
return opt_cls(*args, **kwargs)
raise RuntimeError(self._err_msg_on_class_not_found(kind))
def _err_msg_on_class_not_found(self, kind: str):
msg = []
for opt, cls in self._classes.items():
msg.append(f"\t{opt.kind!r} => {cls!r}")
msg_str = "\n".join(msg)
return f"No class found with the name {kind!r}, known classes are:\n{msg_str}"
def register(self, cls: Type[A], plugin_name: str, plugin_module_name: str):
opt_type = cls.get_options_type()
if opt_type in self._classes:
raise ValueError(
f"{opt_type.kind!r} already registered to class {self._classes[opt_type]!r}"
)
self._classes[opt_type] = cls
self._meta[opt_type] = FactoryMeta(
kind=opt_type.kind, plugin_name=plugin_name, module=plugin_module_name
)
def load_from_plugins(
self, plugin_name: Optional[str] = None, allow_external_plugins: bool = False
):
plugin_name = plugin_name or self.plugin_name
plugin_manager = PluginManager(plugin_name)
plugin_manager.load_setuptools_entrypoints(plugin_name)
for plugin_name, plugin_module in plugin_manager.list_name_plugin():
plugin_module_name = str(plugin_module.__name__) # type: ignore
if not allow_external_plugins and not plugin_module_name.startswith(
"docling."
):
logger.warning(
f"The plugin {plugin_name} will not be loaded because Docling is being executed with allow_external_plugins=false."
)
continue
attr = getattr(plugin_module, self.plugin_attr_name, None)
if callable(attr):
logger.info("Loading plugin %r", plugin_name)
config = attr()
self.process_plugin(config, plugin_name, plugin_module_name)
def process_plugin(self, config, plugin_name: str, plugin_module_name: str):
for item in config[self.plugin_attr_name]:
try:
self.register(item, plugin_name, plugin_module_name)
except ValueError:
logger.warning("%r already registered", item)

View File

@ -0,0 +1,11 @@
import logging
from docling.models.base_ocr_model import BaseOcrModel
from docling.models.factories.base_factory import BaseFactory
logger = logging.getLogger(__name__)
class OcrFactory(BaseFactory[BaseOcrModel]):
def __init__(self, *args, **kwargs):
super().__init__("ocr_engines", *args, **kwargs)

View File

@ -0,0 +1,11 @@
import logging
from docling.models.factories.base_factory import BaseFactory
from docling.models.picture_description_base_model import PictureDescriptionBaseModel
logger = logging.getLogger(__name__)
class PictureDescriptionFactory(BaseFactory[PictureDescriptionBaseModel]):
def __init__(self, *args, **kwargs):
super().__init__("picture_description", *args, **kwargs)

View File

@ -1,12 +1,19 @@
import logging
import sys
import tempfile
from typing import Iterable, Optional, Tuple
from pathlib import Path
from typing import Iterable, Optional, Tuple, Type
from docling_core.types.doc import BoundingBox, CoordOrigin
from docling_core.types.doc.page import BoundingRectangle, TextCell
from docling.datamodel.base_models import OcrCell, Page
from docling.datamodel.base_models import Page
from docling.datamodel.document import ConversionResult
from docling.datamodel.pipeline_options import OcrMacOptions
from docling.datamodel.pipeline_options import (
AcceleratorOptions,
OcrMacOptions,
OcrOptions,
)
from docling.datamodel.settings import settings
from docling.models.base_ocr_model import BaseOcrModel
from docling.utils.profiling import TimeRecorder
@ -15,13 +22,26 @@ _log = logging.getLogger(__name__)
class OcrMacModel(BaseOcrModel):
def __init__(self, enabled: bool, options: OcrMacOptions):
super().__init__(enabled=enabled, options=options)
def __init__(
self,
enabled: bool,
artifacts_path: Optional[Path],
options: OcrMacOptions,
accelerator_options: AcceleratorOptions,
):
super().__init__(
enabled=enabled,
artifacts_path=artifacts_path,
options=options,
accelerator_options=accelerator_options,
)
self.options: OcrMacOptions
self.scale = 3 # multiplier for 72 dpi == 216 dpi.
if self.enabled:
if "darwin" != sys.platform:
raise RuntimeError(f"OcrMac is only supported on Mac.")
install_errmsg = (
"ocrmac is not correctly installed. "
"Please install it via `pip install ocrmac` to use this OCR engine. "
@ -94,13 +114,17 @@ class OcrMacModel(BaseOcrModel):
bottom = y2 / self.scale
cells.append(
OcrCell(
id=ix,
TextCell(
index=ix,
text=text,
orig=text,
from_ocr=True,
confidence=confidence,
bbox=BoundingBox.from_tuple(
rect=BoundingRectangle.from_bounding_box(
BoundingBox.from_tuple(
coord=(left, top, right, bottom),
origin=CoordOrigin.TOPLEFT,
)
),
)
)
@ -116,3 +140,7 @@ class OcrMacModel(BaseOcrModel):
self.draw_ocr_rects_and_cells(conv_res, page, ocr_rects)
yield page
@classmethod
def get_options_type(cls) -> Type[OcrOptions]:
return OcrMacOptions

View File

@ -13,6 +13,7 @@ from docling.utils.profiling import TimeRecorder
class PagePreprocessingOptions(BaseModel):
images_scale: Optional[float]
create_parsed_page: bool
class PagePreprocessingModel(BasePageModel):
@ -55,6 +56,9 @@ class PagePreprocessingModel(BasePageModel):
page.cells = list(page._backend.get_text_cells())
if self.options.create_parsed_page:
page.parsed_page = page._backend.get_segmented_page()
# DEBUG code:
def draw_text_boxes(image, cells, show: bool = False):
draw = ImageDraw.Draw(image)

View File

@ -1,13 +1,18 @@
import base64
import io
import logging
from typing import Iterable, List, Optional
from pathlib import Path
from typing import Iterable, List, Optional, Type, Union
import requests
from PIL import Image
from pydantic import BaseModel, ConfigDict
from docling.datamodel.pipeline_options import PictureDescriptionApiOptions
from docling.datamodel.pipeline_options import (
AcceleratorOptions,
PictureDescriptionApiOptions,
PictureDescriptionBaseOptions,
)
from docling.exceptions import OperationNotAllowed
from docling.models.picture_description_base_model import PictureDescriptionBaseModel
@ -46,13 +51,25 @@ class ApiResponse(BaseModel):
class PictureDescriptionApiModel(PictureDescriptionBaseModel):
# elements_batch_size = 4
@classmethod
def get_options_type(cls) -> Type[PictureDescriptionBaseOptions]:
return PictureDescriptionApiOptions
def __init__(
self,
enabled: bool,
enable_remote_services: bool,
artifacts_path: Optional[Union[Path, str]],
options: PictureDescriptionApiOptions,
accelerator_options: AcceleratorOptions,
):
super().__init__(enabled=enabled, options=options)
super().__init__(
enabled=enabled,
enable_remote_services=enable_remote_services,
artifacts_path=artifacts_path,
options=options,
accelerator_options=accelerator_options,
)
self.options: PictureDescriptionApiOptions
if self.enabled:

View File

@ -1,6 +1,7 @@
import logging
from abc import abstractmethod
from pathlib import Path
from typing import Any, Iterable, List, Optional, Union
from typing import Any, Iterable, List, Optional, Type, Union
from docling_core.types.doc import (
DoclingDocument,
@ -13,20 +14,30 @@ from docling_core.types.doc.document import ( # TODO: move import to docling_co
)
from PIL import Image
from docling.datamodel.pipeline_options import PictureDescriptionBaseOptions
from docling.datamodel.pipeline_options import (
AcceleratorOptions,
PictureDescriptionBaseOptions,
)
from docling.models.base_model import (
BaseItemAndImageEnrichmentModel,
BaseModelWithOptions,
ItemAndImageEnrichmentElement,
)
class PictureDescriptionBaseModel(BaseItemAndImageEnrichmentModel):
class PictureDescriptionBaseModel(
BaseItemAndImageEnrichmentModel, BaseModelWithOptions
):
images_scale: float = 2.0
def __init__(
self,
*,
enabled: bool,
enable_remote_services: bool,
artifacts_path: Optional[Union[Path, str]],
options: PictureDescriptionBaseOptions,
accelerator_options: AcceleratorOptions,
):
self.enabled = enabled
self.options = options
@ -62,3 +73,8 @@ class PictureDescriptionBaseModel(BaseItemAndImageEnrichmentModel):
PictureDescriptionData(text=output, provenance=self.provenance)
)
yield item
@classmethod
@abstractmethod
def get_options_type(cls) -> Type[PictureDescriptionBaseOptions]:
pass

View File

@ -1,10 +1,11 @@
from pathlib import Path
from typing import Iterable, Optional, Union
from typing import Iterable, Optional, Type, Union
from PIL import Image
from docling.datamodel.pipeline_options import (
AcceleratorOptions,
PictureDescriptionBaseOptions,
PictureDescriptionVlmOptions,
)
from docling.models.picture_description_base_model import PictureDescriptionBaseModel
@ -13,14 +14,25 @@ from docling.utils.accelerator_utils import decide_device
class PictureDescriptionVlmModel(PictureDescriptionBaseModel):
@classmethod
def get_options_type(cls) -> Type[PictureDescriptionBaseOptions]:
return PictureDescriptionVlmOptions
def __init__(
self,
enabled: bool,
enable_remote_services: bool,
artifacts_path: Optional[Union[Path, str]],
options: PictureDescriptionVlmOptions,
accelerator_options: AcceleratorOptions,
):
super().__init__(enabled=enabled, options=options)
super().__init__(
enabled=enabled,
enable_remote_services=enable_remote_services,
artifacts_path=artifacts_path,
options=options,
accelerator_options=accelerator_options,
)
self.options: PictureDescriptionVlmOptions
if self.enabled:

View File

View File

@ -0,0 +1,28 @@
from docling.models.easyocr_model import EasyOcrModel
from docling.models.ocr_mac_model import OcrMacModel
from docling.models.picture_description_api_model import PictureDescriptionApiModel
from docling.models.picture_description_vlm_model import PictureDescriptionVlmModel
from docling.models.rapid_ocr_model import RapidOcrModel
from docling.models.tesseract_ocr_cli_model import TesseractOcrCliModel
from docling.models.tesseract_ocr_model import TesseractOcrModel
def ocr_engines():
return {
"ocr_engines": [
EasyOcrModel,
OcrMacModel,
RapidOcrModel,
TesseractOcrModel,
TesseractOcrCliModel,
]
}
def picture_description():
return {
"picture_description": [
PictureDescriptionVlmModel,
PictureDescriptionApiModel,
]
}

View File

@ -1,14 +1,17 @@
import logging
from typing import Iterable
from pathlib import Path
from typing import Iterable, Optional, Type
import numpy
from docling_core.types.doc import BoundingBox, CoordOrigin
from docling_core.types.doc.page import BoundingRectangle, TextCell
from docling.datamodel.base_models import OcrCell, Page
from docling.datamodel.base_models import Page
from docling.datamodel.document import ConversionResult
from docling.datamodel.pipeline_options import (
AcceleratorDevice,
AcceleratorOptions,
OcrOptions,
RapidOcrOptions,
)
from docling.datamodel.settings import settings
@ -23,10 +26,16 @@ class RapidOcrModel(BaseOcrModel):
def __init__(
self,
enabled: bool,
artifacts_path: Optional[Path],
options: RapidOcrOptions,
accelerator_options: AcceleratorOptions,
):
super().__init__(enabled=enabled, options=options)
super().__init__(
enabled=enabled,
artifacts_path=artifacts_path,
options=options,
accelerator_options=accelerator_options,
)
self.options: RapidOcrOptions
self.scale = 3 # multiplier for 72 dpi == 216 dpi.
@ -100,18 +109,26 @@ class RapidOcrModel(BaseOcrModel):
if result is not None:
cells = [
OcrCell(
id=ix,
TextCell(
index=ix,
text=line[1],
orig=line[1],
confidence=line[2],
bbox=BoundingBox.from_tuple(
from_ocr=True,
rect=BoundingRectangle.from_bounding_box(
BoundingBox.from_tuple(
coord=(
(line[0][0][0] / self.scale) + ocr_rect.l,
(line[0][0][1] / self.scale) + ocr_rect.t,
(line[0][2][0] / self.scale) + ocr_rect.l,
(line[0][2][1] / self.scale) + ocr_rect.t,
(line[0][0][0] / self.scale)
+ ocr_rect.l,
(line[0][0][1] / self.scale)
+ ocr_rect.t,
(line[0][2][0] / self.scale)
+ ocr_rect.l,
(line[0][2][1] / self.scale)
+ ocr_rect.t,
),
origin=CoordOrigin.TOPLEFT,
)
),
)
for ix, line in enumerate(result)
@ -126,3 +143,7 @@ class RapidOcrModel(BaseOcrModel):
self.draw_ocr_rects_and_cells(conv_res, page, ocr_rects)
yield page
@classmethod
def get_options_type(cls) -> Type[OcrOptions]:
return RapidOcrOptions

View File

@ -5,6 +5,7 @@ from typing import Iterable, Optional, Union
import numpy
from docling_core.types.doc import BoundingBox, DocItemLabel, TableCell
from docling_core.types.doc.page import BoundingRectangle
from docling_ibm_models.tableformer.data_management.tf_predictor import TFPredictor
from PIL import ImageDraw
@ -129,7 +130,7 @@ class TableStructureModel(BasePageModel):
draw.rectangle([(x0, y0), (x1, y1)], outline="red")
for cell in table_element.cluster.cells:
x0, y0, x1, y1 = cell.bbox.as_tuple()
x0, y0, x1, y1 = cell.rect.to_bounding_box().as_tuple()
x0 *= scale_x
x1 *= scale_x
y0 *= scale_x
@ -223,11 +224,19 @@ class TableStructureModel(BasePageModel):
# Only allow non-empty strings (not just spaces) into the cells of a table
if len(c.text.strip()) > 0:
new_cell = copy.deepcopy(c)
new_cell.bbox = new_cell.bbox.scaled(
new_cell.rect = BoundingRectangle.from_bounding_box(
new_cell.rect.to_bounding_box().scaled(
scale=self.scale
)
)
tokens.append(new_cell.model_dump())
tokens.append(
{
"id": new_cell.index,
"text": new_cell.text,
"bbox": new_cell.rect.to_bounding_box().model_dump(),
}
)
page_input["tokens"] = tokens
tf_output = self.tf_predictor.multi_table_predict(

View File

@ -3,15 +3,21 @@ import io
import logging
import os
import tempfile
from pathlib import Path
from subprocess import DEVNULL, PIPE, Popen
from typing import Iterable, List, Optional, Tuple
from typing import Iterable, List, Optional, Tuple, Type
import pandas as pd
from docling_core.types.doc import BoundingBox, CoordOrigin
from docling_core.types.doc.page import BoundingRectangle, TextCell
from docling.datamodel.base_models import Cell, OcrCell, Page
from docling.datamodel.base_models import Page
from docling.datamodel.document import ConversionResult
from docling.datamodel.pipeline_options import TesseractCliOcrOptions
from docling.datamodel.pipeline_options import (
AcceleratorOptions,
OcrOptions,
TesseractCliOcrOptions,
)
from docling.datamodel.settings import settings
from docling.models.base_ocr_model import BaseOcrModel
from docling.utils.ocr_utils import map_tesseract_script
@ -21,8 +27,19 @@ _log = logging.getLogger(__name__)
class TesseractOcrCliModel(BaseOcrModel):
def __init__(self, enabled: bool, options: TesseractCliOcrOptions):
super().__init__(enabled=enabled, options=options)
def __init__(
self,
enabled: bool,
artifacts_path: Optional[Path],
options: TesseractCliOcrOptions,
accelerator_options: AcceleratorOptions,
):
super().__init__(
enabled=enabled,
artifacts_path=artifacts_path,
options=options,
accelerator_options=accelerator_options,
)
self.options: TesseractCliOcrOptions
self.scale = 3 # multiplier for 72 dpi == 216 dpi.
@ -228,11 +245,14 @@ class TesseractOcrCliModel(BaseOcrModel):
t = b + h
r = l + w
cell = OcrCell(
id=ix,
cell = TextCell(
index=ix,
text=text,
orig=text,
from_ocr=True,
confidence=conf / 100.0,
bbox=BoundingBox.from_tuple(
rect=BoundingRectangle.from_bounding_box(
BoundingBox.from_tuple(
coord=(
(l / self.scale) + ocr_rect.l,
(b / self.scale) + ocr_rect.t,
@ -240,6 +260,7 @@ class TesseractOcrCliModel(BaseOcrModel):
(t / self.scale) + ocr_rect.t,
),
origin=CoordOrigin.TOPLEFT,
)
),
)
all_ocr_cells.append(cell)
@ -252,3 +273,7 @@ class TesseractOcrCliModel(BaseOcrModel):
self.draw_ocr_rects_and_cells(conv_res, page, ocr_rects)
yield page
@classmethod
def get_options_type(cls) -> Type[OcrOptions]:
return TesseractCliOcrOptions

View File

@ -1,11 +1,17 @@
import logging
from typing import Iterable
from pathlib import Path
from typing import Iterable, Optional, Type
from docling_core.types.doc import BoundingBox, CoordOrigin
from docling_core.types.doc.page import BoundingRectangle, TextCell
from docling.datamodel.base_models import Cell, OcrCell, Page
from docling.datamodel.base_models import Page
from docling.datamodel.document import ConversionResult
from docling.datamodel.pipeline_options import TesseractOcrOptions
from docling.datamodel.pipeline_options import (
AcceleratorOptions,
OcrOptions,
TesseractOcrOptions,
)
from docling.datamodel.settings import settings
from docling.models.base_ocr_model import BaseOcrModel
from docling.utils.ocr_utils import map_tesseract_script
@ -15,8 +21,19 @@ _log = logging.getLogger(__name__)
class TesseractOcrModel(BaseOcrModel):
def __init__(self, enabled: bool, options: TesseractOcrOptions):
super().__init__(enabled=enabled, options=options)
def __init__(
self,
enabled: bool,
artifacts_path: Optional[Path],
options: TesseractOcrOptions,
accelerator_options: AcceleratorOptions,
):
super().__init__(
enabled=enabled,
artifacts_path=artifacts_path,
options=options,
accelerator_options=accelerator_options,
)
self.options: TesseractOcrOptions
self.scale = 3 # multiplier for 72 dpi == 216 dpi.
@ -173,14 +190,18 @@ class TesseractOcrModel(BaseOcrModel):
top = (box["y"] + box["h"]) / self.scale
cells.append(
OcrCell(
id=ix,
TextCell(
index=ix,
text=text,
orig=text,
from_ocr=True,
confidence=confidence,
bbox=BoundingBox.from_tuple(
rect=BoundingRectangle.from_bounding_box(
BoundingBox.from_tuple(
coord=(left, top, right, bottom),
origin=CoordOrigin.TOPLEFT,
),
),
)
)
@ -195,3 +216,7 @@ class TesseractOcrModel(BaseOcrModel):
self.draw_ocr_rects_and_cells(conv_res, page, ocr_rects)
yield page
@classmethod
def get_options_type(cls) -> Type[OcrOptions]:
return TesseractOcrOptions

View File

@ -10,16 +10,7 @@ from docling.backend.abstract_backend import AbstractDocumentBackend
from docling.backend.pdf_backend import PdfDocumentBackend
from docling.datamodel.base_models import AssembledUnit, Page
from docling.datamodel.document import ConversionResult
from docling.datamodel.pipeline_options import (
EasyOcrOptions,
OcrMacOptions,
PdfPipelineOptions,
PictureDescriptionApiOptions,
PictureDescriptionVlmOptions,
RapidOcrOptions,
TesseractCliOcrOptions,
TesseractOcrOptions,
)
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.settings import settings
from docling.models.base_ocr_model import BaseOcrModel
from docling.models.code_formula_model import CodeFormulaModel, CodeFormulaModelOptions
@ -27,22 +18,16 @@ from docling.models.document_picture_classifier import (
DocumentPictureClassifier,
DocumentPictureClassifierOptions,
)
from docling.models.easyocr_model import EasyOcrModel
from docling.models.factories import get_ocr_factory, get_picture_description_factory
from docling.models.layout_model import LayoutModel
from docling.models.ocr_mac_model import OcrMacModel
from docling.models.page_assemble_model import PageAssembleModel, PageAssembleOptions
from docling.models.page_preprocessing_model import (
PagePreprocessingModel,
PagePreprocessingOptions,
)
from docling.models.picture_description_api_model import PictureDescriptionApiModel
from docling.models.picture_description_base_model import PictureDescriptionBaseModel
from docling.models.picture_description_vlm_model import PictureDescriptionVlmModel
from docling.models.rapid_ocr_model import RapidOcrModel
from docling.models.readingorder_model import ReadingOrderModel, ReadingOrderOptions
from docling.models.table_structure_model import TableStructureModel
from docling.models.tesseract_ocr_cli_model import TesseractOcrCliModel
from docling.models.tesseract_ocr_model import TesseractOcrModel
from docling.pipeline.base_pipeline import PaginatedPipeline
from docling.utils.model_downloader import download_models
from docling.utils.profiling import ProfilingScope, TimeRecorder
@ -78,16 +63,14 @@ class StandardPdfPipeline(PaginatedPipeline):
self.glm_model = ReadingOrderModel(options=ReadingOrderOptions())
if (ocr_model := self.get_ocr_model(artifacts_path=artifacts_path)) is None:
raise RuntimeError(
f"The specified OCR kind is not supported: {pipeline_options.ocr_options.kind}."
)
ocr_model = self.get_ocr_model(artifacts_path=artifacts_path)
self.build_pipe = [
# Pre-processing
PagePreprocessingModel(
options=PagePreprocessingOptions(
images_scale=pipeline_options.images_scale
images_scale=pipeline_options.images_scale,
create_parsed_page=pipeline_options.generate_parsed_pages,
)
),
# OCR
@ -163,66 +146,30 @@ class StandardPdfPipeline(PaginatedPipeline):
output_dir = download_models(output_dir=local_dir, force=force, progress=False)
return output_dir
def get_ocr_model(
self, artifacts_path: Optional[Path] = None
) -> Optional[BaseOcrModel]:
if isinstance(self.pipeline_options.ocr_options, EasyOcrOptions):
return EasyOcrModel(
def get_ocr_model(self, artifacts_path: Optional[Path] = None) -> BaseOcrModel:
factory = get_ocr_factory(
allow_external_plugins=self.pipeline_options.allow_external_plugins
)
return factory.create_instance(
options=self.pipeline_options.ocr_options,
enabled=self.pipeline_options.do_ocr,
artifacts_path=artifacts_path,
options=self.pipeline_options.ocr_options,
accelerator_options=self.pipeline_options.accelerator_options,
)
elif isinstance(self.pipeline_options.ocr_options, TesseractCliOcrOptions):
return TesseractOcrCliModel(
enabled=self.pipeline_options.do_ocr,
options=self.pipeline_options.ocr_options,
)
elif isinstance(self.pipeline_options.ocr_options, TesseractOcrOptions):
return TesseractOcrModel(
enabled=self.pipeline_options.do_ocr,
options=self.pipeline_options.ocr_options,
)
elif isinstance(self.pipeline_options.ocr_options, RapidOcrOptions):
return RapidOcrModel(
enabled=self.pipeline_options.do_ocr,
options=self.pipeline_options.ocr_options,
accelerator_options=self.pipeline_options.accelerator_options,
)
elif isinstance(self.pipeline_options.ocr_options, OcrMacOptions):
if "darwin" != sys.platform:
raise RuntimeError(
f"The specified OCR type is only supported on Mac: {self.pipeline_options.ocr_options.kind}."
)
return OcrMacModel(
enabled=self.pipeline_options.do_ocr,
options=self.pipeline_options.ocr_options,
)
return None
def get_picture_description_model(
self, artifacts_path: Optional[Path] = None
) -> Optional[PictureDescriptionBaseModel]:
if isinstance(
self.pipeline_options.picture_description_options,
PictureDescriptionApiOptions,
):
return PictureDescriptionApiModel(
factory = get_picture_description_factory(
allow_external_plugins=self.pipeline_options.allow_external_plugins
)
return factory.create_instance(
options=self.pipeline_options.picture_description_options,
enabled=self.pipeline_options.do_picture_description,
enable_remote_services=self.pipeline_options.enable_remote_services,
options=self.pipeline_options.picture_description_options,
)
elif isinstance(
self.pipeline_options.picture_description_options,
PictureDescriptionVlmOptions,
):
return PictureDescriptionVlmModel(
enabled=self.pipeline_options.do_picture_description,
artifacts_path=artifacts_path,
options=self.pipeline_options.picture_description_options,
accelerator_options=self.pipeline_options.accelerator_options,
)
return None
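End to end, the pipeline now resolves both the OCR and picture-description models through the factories; with allow_external_plugins enabled, a plugin engine participates exactly like a built-in one. A sketch (input file and options class are hypothetical):

# Sketch: converting with a plugin-provided OCR engine.
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

opts = PdfPipelineOptions()
opts.allow_external_plugins = True            # required for third-party engines
opts.ocr_options = MyOcrOptions(lang=["en"])  # hypothetical options class from earlier

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=opts)}
)
result = converter.convert("sample.pdf")      # hypothetical input document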
def initialize_page(self, conv_res: ConversionResult, page: Page) -> Page:
with TimeRecorder(conv_res, "page_init"):

View File

@ -1,41 +1,20 @@
import itertools
import logging
import re
import warnings
from io import BytesIO
# from io import BytesIO
from pathlib import Path
from typing import Optional
from typing import List, Optional, Union, cast
from docling_core.types import DoclingDocument
from docling_core.types.doc import (
BoundingBox,
DocItem,
DocItemLabel,
DoclingDocument,
GroupLabel,
ImageRef,
ImageRefMode,
PictureItem,
ProvenanceItem,
Size,
TableCell,
TableData,
TableItem,
)
from docling_core.types.doc.tokens import DocumentToken, TableToken
# from docling_core.types import DoclingDocument
from docling_core.types.doc import BoundingBox, DocItem, ImageRef, PictureItem, TextItem
from docling_core.types.doc.document import DocTagsDocument
from PIL import Image as PILImage
from docling.backend.abstract_backend import AbstractDocumentBackend
from docling.backend.md_backend import MarkdownDocumentBackend
from docling.backend.pdf_backend import PdfDocumentBackend
from docling.datamodel.base_models import InputFormat, Page
from docling.datamodel.document import ConversionResult, InputDocument
from docling.datamodel.pipeline_options import (
PdfPipelineOptions,
ResponseFormat,
VlmPipelineOptions,
)
from docling.datamodel.pipeline_options import ResponseFormat, VlmPipelineOptions
from docling.datamodel.settings import settings
from docling.models.hf_vlm_model import HuggingFaceVlmModel
from docling.pipeline.base_pipeline import PaginatedPipeline
@ -100,6 +79,15 @@ class VlmPipeline(PaginatedPipeline):
return page
def extract_text_from_backend(self, page: Page, bbox: BoundingBox | None) -> str:
# Convert bounding box normalized to 0-100 into page coordinates for cropping
text = ""
if bbox:
if page.size:
if page._backend:
text = page._backend.get_text_in_rect(bbox)
return text
def _assemble_document(self, conv_res: ConversionResult) -> ConversionResult:
with TimeRecorder(conv_res, "doc_assemble", scope=ProfilingScope.DOCUMENT):
@ -107,7 +95,45 @@ class VlmPipeline(PaginatedPipeline):
self.pipeline_options.vlm_options.response_format
== ResponseFormat.DOCTAGS
):
conv_res.document = self._turn_tags_into_doc(conv_res.pages)
doctags_list = []
image_list = []
for page in conv_res.pages:
predicted_doctags = ""
img = PILImage.new("RGB", (1, 1), "rgb(255,255,255)")
if page.predictions.vlm_response:
predicted_doctags = page.predictions.vlm_response.text
if page.image:
img = page.image
image_list.append(img)
doctags_list.append(predicted_doctags)
doctags_list_c = cast(List[Union[Path, str]], doctags_list)
image_list_c = cast(List[Union[Path, PILImage.Image]], image_list)
doctags_doc = DocTagsDocument.from_doctags_and_image_pairs(
doctags_list_c, image_list_c
)
conv_res.document.load_from_doctags(doctags_doc)
# If forced backend text, replace model predicted text with backend one
if page.size:
if self.force_backend_text:
scale = self.pipeline_options.images_scale
for element, _level in conv_res.document.iterate_items():
if (
not isinstance(element, TextItem)
or len(element.prov) == 0
):
continue
crop_bbox = (
element.prov[0]
.bbox.scaled(scale=scale)
.to_top_left_origin(
page_height=page.size.height * scale
)
)
txt = self.extract_text_from_backend(page, crop_bbox)
element.text = txt
element.orig = txt
elif (
self.pipeline_options.vlm_options.response_format
== ResponseFormat.MARKDOWN
@ -165,366 +191,6 @@ class VlmPipeline(PaginatedPipeline):
)
return backend.convert()
def _turn_tags_into_doc(self, pages: list[Page]) -> DoclingDocument:
###############################################
# Tag definitions and color mappings
###############################################
# Maps the recognized tag to a Docling label.
# Code items will be given DocItemLabel.CODE
tag_to_doclabel = {
"title": DocItemLabel.TITLE,
"document_index": DocItemLabel.DOCUMENT_INDEX,
"otsl": DocItemLabel.TABLE,
"section_header_level_1": DocItemLabel.SECTION_HEADER,
"checkbox_selected": DocItemLabel.CHECKBOX_SELECTED,
"checkbox_unselected": DocItemLabel.CHECKBOX_UNSELECTED,
"text": DocItemLabel.TEXT,
"page_header": DocItemLabel.PAGE_HEADER,
"page_footer": DocItemLabel.PAGE_FOOTER,
"formula": DocItemLabel.FORMULA,
"caption": DocItemLabel.CAPTION,
"picture": DocItemLabel.PICTURE,
"list_item": DocItemLabel.LIST_ITEM,
"footnote": DocItemLabel.FOOTNOTE,
"code": DocItemLabel.CODE,
}
# Maps each tag to an associated bounding box color.
tag_to_color = {
"title": "blue",
"document_index": "darkblue",
"otsl": "green",
"section_header_level_1": "purple",
"checkbox_selected": "black",
"checkbox_unselected": "gray",
"text": "red",
"page_header": "orange",
"page_footer": "cyan",
"formula": "pink",
"caption": "magenta",
"picture": "yellow",
"list_item": "brown",
"footnote": "darkred",
"code": "lightblue",
}
def extract_bounding_box(text_chunk: str) -> Optional[BoundingBox]:
"""Extracts <loc_...> bounding box coords from the chunk, normalized by / 500."""
coords = re.findall(r"<loc_(\d+)>", text_chunk)
if len(coords) == 4:
l, t, r, b = map(float, coords)
return BoundingBox(l=l / 500, t=t / 500, r=r / 500, b=b / 500)
return None
def extract_inner_text(text_chunk: str) -> str:
"""Strips all <...> tags inside the chunk to get the raw text content."""
return re.sub(r"<.*?>", "", text_chunk, flags=re.DOTALL).strip()
def extract_text_from_backend(page: Page, bbox: BoundingBox | None) -> str:
# Convert bounding box normalized to 0-100 into page coordinates for cropping
text = ""
if bbox:
if page.size:
bbox.l = bbox.l * page.size.width
bbox.t = bbox.t * page.size.height
bbox.r = bbox.r * page.size.width
bbox.b = bbox.b * page.size.height
if page._backend:
text = page._backend.get_text_in_rect(bbox)
return text
def otsl_parse_texts(texts, tokens):
split_word = TableToken.OTSL_NL.value
split_row_tokens = [
list(y)
for x, y in itertools.groupby(tokens, lambda z: z == split_word)
if not x
]
table_cells = []
r_idx = 0
c_idx = 0
def count_right(tokens, c_idx, r_idx, which_tokens):
span = 0
c_idx_iter = c_idx
while tokens[r_idx][c_idx_iter] in which_tokens:
c_idx_iter += 1
span += 1
if c_idx_iter >= len(tokens[r_idx]):
return span
return span
def count_down(tokens, c_idx, r_idx, which_tokens):
span = 0
r_idx_iter = r_idx
while tokens[r_idx_iter][c_idx] in which_tokens:
r_idx_iter += 1
span += 1
if r_idx_iter >= len(tokens):
return span
return span
for i, text in enumerate(texts):
cell_text = ""
if text in [
TableToken.OTSL_FCEL.value,
TableToken.OTSL_ECEL.value,
TableToken.OTSL_CHED.value,
TableToken.OTSL_RHED.value,
TableToken.OTSL_SROW.value,
]:
row_span = 1
col_span = 1
right_offset = 1
if text != TableToken.OTSL_ECEL.value:
cell_text = texts[i + 1]
right_offset = 2
# Check next element(s) for lcel / ucel / xcel, and set row_span / col_span accordingly
next_right_cell = ""
if i + right_offset < len(texts):
next_right_cell = texts[i + right_offset]
next_bottom_cell = ""
if r_idx + 1 < len(split_row_tokens):
if c_idx < len(split_row_tokens[r_idx + 1]):
next_bottom_cell = split_row_tokens[r_idx + 1][c_idx]
if next_right_cell in [
TableToken.OTSL_LCEL.value,
TableToken.OTSL_XCEL.value,
]:
# we have a horizontal spanning cell or a 2D spanning cell
col_span += count_right(
split_row_tokens,
c_idx + 1,
r_idx,
[TableToken.OTSL_LCEL.value, TableToken.OTSL_XCEL.value],
)
if next_bottom_cell in [
TableToken.OTSL_UCEL.value,
TableToken.OTSL_XCEL.value,
]:
# we have a vertical spanning cell or a 2D spanning cell
row_span += count_down(
split_row_tokens,
c_idx,
r_idx + 1,
[TableToken.OTSL_UCEL.value, TableToken.OTSL_XCEL.value],
)
table_cells.append(
TableCell(
text=cell_text.strip(),
row_span=row_span,
col_span=col_span,
start_row_offset_idx=r_idx,
end_row_offset_idx=r_idx + row_span,
start_col_offset_idx=c_idx,
end_col_offset_idx=c_idx + col_span,
)
)
if text in [
TableToken.OTSL_FCEL.value,
TableToken.OTSL_ECEL.value,
TableToken.OTSL_CHED.value,
TableToken.OTSL_RHED.value,
TableToken.OTSL_SROW.value,
TableToken.OTSL_LCEL.value,
TableToken.OTSL_UCEL.value,
TableToken.OTSL_XCEL.value,
]:
c_idx += 1
if text == TableToken.OTSL_NL.value:
r_idx += 1
c_idx = 0
return table_cells, split_row_tokens
def otsl_extract_tokens_and_text(s: str):
# Pattern to match anything enclosed by < > (including the angle brackets themselves)
pattern = r"(<[^>]+>)"
# Find all tokens (e.g. "<otsl>", "<loc_140>", etc.)
tokens = re.findall(pattern, s)
# Remove any tokens that start with "<loc_"
tokens = [
token
for token in tokens
if not (
token.startswith(rf"<{DocumentToken.LOC.value}")
or token
in [
rf"<{DocumentToken.OTSL.value}>",
rf"</{DocumentToken.OTSL.value}>",
]
)
]
# Split the string by those tokens to get the in-between text
text_parts = re.split(pattern, s)
text_parts = [
token
for token in text_parts
if not (
token.startswith(rf"<{DocumentToken.LOC.value}")
or token
in [
rf"<{DocumentToken.OTSL.value}>",
rf"</{DocumentToken.OTSL.value}>",
]
)
]
# Remove any empty or purely whitespace strings from text_parts
text_parts = [part for part in text_parts if part.strip()]
return tokens, text_parts
def parse_table_content(otsl_content: str) -> TableData:
tokens, mixed_texts = otsl_extract_tokens_and_text(otsl_content)
table_cells, split_row_tokens = otsl_parse_texts(mixed_texts, tokens)
return TableData(
num_rows=len(split_row_tokens),
num_cols=(
max(len(row) for row in split_row_tokens) if split_row_tokens else 0
),
table_cells=table_cells,
)
doc = DoclingDocument(name="Document")
for pg_idx, page in enumerate(pages):
xml_content = ""
predicted_text = ""
if page.predictions.vlm_response:
predicted_text = page.predictions.vlm_response.text
image = page.image
page_no = pg_idx + 1
bounding_boxes = []
if page.size:
pg_width = page.size.width
pg_height = page.size.height
size = Size(width=pg_width, height=pg_height)
parent_page = doc.add_page(page_no=page_no, size=size)
"""
1. Finds all <tag>...</tag> blocks in the entire string (multi-line friendly) in the order they appear.
2. For each chunk, extracts bounding box (if any) and inner text.
3. Adds the item to a DoclingDocument structure with the right label.
4. Tracks bounding boxes + color in a separate list for later visualization.
"""
# Regex for all recognized tags
tag_pattern = (
rf"<(?P<tag>{DocItemLabel.TITLE}|{DocItemLabel.DOCUMENT_INDEX}|"
rf"{DocItemLabel.CHECKBOX_UNSELECTED}|{DocItemLabel.CHECKBOX_SELECTED}|"
rf"{DocItemLabel.TEXT}|{DocItemLabel.PAGE_HEADER}|"
rf"{DocItemLabel.PAGE_FOOTER}|{DocItemLabel.FORMULA}|"
rf"{DocItemLabel.CAPTION}|{DocItemLabel.PICTURE}|"
rf"{DocItemLabel.LIST_ITEM}|{DocItemLabel.FOOTNOTE}|{DocItemLabel.CODE}|"
rf"{DocItemLabel.SECTION_HEADER}_level_1|{DocumentToken.OTSL.value})>.*?</(?P=tag)>"
)
# DocumentToken.OTSL
pattern = re.compile(tag_pattern, re.DOTALL)
# Go through each match in order
for match in pattern.finditer(predicted_text):
full_chunk = match.group(0)
tag_name = match.group("tag")
bbox = extract_bounding_box(full_chunk)
doc_label = tag_to_doclabel.get(tag_name, DocItemLabel.PARAGRAPH)
color = tag_to_color.get(tag_name, "white")
# Store bounding box + color
if bbox:
bounding_boxes.append((bbox, color))
if tag_name == DocumentToken.OTSL.value:
table_data = parse_table_content(full_chunk)
bbox = extract_bounding_box(full_chunk)
if bbox:
prov = ProvenanceItem(
bbox=bbox.resize_by_scale(pg_width, pg_height),
charspan=(0, 0),
page_no=page_no,
)
doc.add_table(data=table_data, prov=prov)
else:
doc.add_table(data=table_data)
elif tag_name == DocItemLabel.PICTURE:
text_caption_content = extract_inner_text(full_chunk)
if image:
if bbox:
im_width, im_height = image.size
crop_box = (
int(bbox.l * im_width),
int(bbox.t * im_height),
int(bbox.r * im_width),
int(bbox.b * im_height),
)
cropped_image = image.crop(crop_box)
pic = doc.add_picture(
parent=None,
image=ImageRef.from_pil(image=cropped_image, dpi=72),
prov=(
ProvenanceItem(
bbox=bbox.resize_by_scale(pg_width, pg_height),
charspan=(0, 0),
page_no=page_no,
)
),
)
# If there is a caption to an image, add it as well
if len(text_caption_content) > 0:
caption_item = doc.add_text(
label=DocItemLabel.CAPTION,
text=text_caption_content,
parent=None,
)
pic.captions.append(caption_item.get_ref())
else:
if bbox:
# In case we don't have access to the binary of an image
doc.add_picture(
parent=None,
prov=ProvenanceItem(
bbox=bbox, charspan=(0, 0), page_no=page_no
),
)
# If there is a caption to an image, add it as well
if len(text_caption_content) > 0:
caption_item = doc.add_text(
label=DocItemLabel.CAPTION,
text=text_caption_content,
parent=None,
)
pic.captions.append(caption_item.get_ref())
else:
# For everything else, treat as text
if self.force_backend_text:
text_content = extract_text_from_backend(page, bbox)
else:
text_content = extract_inner_text(full_chunk)
doc.add_text(
label=doc_label,
text=text_content,
prov=(
ProvenanceItem(
bbox=bbox.resize_by_scale(pg_width, pg_height),
charspan=(0, len(text_content)),
page_no=page_no,
)
if bbox
else None
),
)
return doc
@classmethod
def get_default_options(cls) -> VlmPipelineOptions:
return VlmPipelineOptions()
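For orientation, a minimal usage sketch of this pipeline (assuming docling's public `DocumentConverter` API, with `pipeline_cls` selecting `VlmPipeline`; the input path is a placeholder):

```python
# Minimal sketch, assuming docling's public converter API; "report.pdf" is a placeholder.
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            # get_default_options() above supplies VlmPipelineOptions() when omitted
            pipeline_options=VlmPipelineOptions(),
        )
    }
)

result = converter.convert("report.pdf")
print(result.document.export_to_markdown())
```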


@@ -2,9 +2,9 @@ import logging
from typing import Any, Dict, Iterable, List, Tuple, Union
from docling_core.types.doc import BoundingBox, CoordOrigin
from docling_core.types.doc.page import TextCell
from docling_core.types.legacy_doc.base import BaseCell, BaseText, Ref, Table
from docling.datamodel.base_models import OcrCell
from docling.datamodel.document import ConversionResult, Page
_log = logging.getLogger(__name__)
@@ -86,11 +86,13 @@ def generate_multimodal_pages(
if page.size is None:
return cells
for cell in page.cells:
new_bbox = cell.bbox.to_top_left_origin(
page_height=page.size.height
).normalized(page_size=page.size)
is_ocr = isinstance(cell, OcrCell)
ocr_confidence = cell.confidence if isinstance(cell, OcrCell) else 1.0
new_bbox = (
cell.rect.to_bounding_box()
.to_top_left_origin(page_height=page.size.height)
.normalized(page_size=page.size)
)
is_ocr = cell.from_ocr
ocr_confidence = cell.confidence
cells.append(
{
"text": cell.text,


@@ -5,9 +5,10 @@ from collections import defaultdict
from typing import Dict, List, Set, Tuple
from docling_core.types.doc import DocItemLabel, Size
from docling_core.types.doc.page import TextCell
from rtree import index
from docling.datamodel.base_models import BoundingBox, Cell, Cluster, OcrCell
from docling.datamodel.base_models import BoundingBox, Cluster
_log = logging.getLogger(__name__)
@@ -198,7 +199,7 @@ class LayoutPostprocessor:
DocItemLabel.TITLE: DocItemLabel.SECTION_HEADER,
}
def __init__(self, cells: List[Cell], clusters: List[Cluster], page_size: Size):
def __init__(self, cells: List[TextCell], clusters: List[Cluster], page_size: Size):
"""Initialize processor with cells and clusters."""
"""Initialize processor with cells and spatial indices."""
self.cells = cells
@@ -218,7 +219,7 @@ class LayoutPostprocessor:
[c for c in self.special_clusters if c.label in self.WRAPPER_TYPES]
)
def postprocess(self) -> Tuple[List[Cluster], List[Cell]]:
def postprocess(self) -> Tuple[List[Cluster], List[TextCell]]:
"""Main processing pipeline."""
self.regular_clusters = self._process_regular_clusters()
self.special_clusters = self._process_special_clusters()
@@ -271,15 +272,13 @@
next_id = max((c.id for c in self.all_clusters), default=0) + 1
orphan_clusters = []
for i, cell in enumerate(unassigned):
conf = 1.0
if isinstance(cell, OcrCell):
conf = cell.confidence
orphan_clusters.append(
Cluster(
id=next_id + i,
label=DocItemLabel.TEXT,
bbox=cell.bbox,
bbox=cell.to_bounding_box(),
confidence=conf,
cells=[cell],
)
@@ -557,13 +556,13 @@
return current_best if current_best else clusters[0]
def _deduplicate_cells(self, cells: List[Cell]) -> List[Cell]:
def _deduplicate_cells(self, cells: List[TextCell]) -> List[TextCell]:
"""Ensure each cell appears only once, maintaining order of first appearance."""
seen_ids = set()
unique_cells = []
for cell in cells:
if cell.id not in seen_ids:
seen_ids.add(cell.id)
if cell.index not in seen_ids:
seen_ids.add(cell.index)
unique_cells.append(cell)
return unique_cells
@@ -582,11 +581,13 @@
best_cluster = None
for cluster in clusters:
if cell.bbox.area() <= 0:
if cell.rect.to_bounding_box().area() <= 0:
continue
overlap = cell.bbox.intersection_area_with(cluster.bbox)
overlap_ratio = overlap / cell.bbox.area()
overlap = cell.rect.to_bounding_box().intersection_area_with(
cluster.bbox
)
overlap_ratio = overlap / cell.rect.to_bounding_box().area()
if overlap_ratio > best_overlap:
best_overlap = overlap_ratio
@@ -601,11 +602,13 @@
return clusters
def _find_unassigned_cells(self, clusters: List[Cluster]) -> List[Cell]:
def _find_unassigned_cells(self, clusters: List[Cluster]) -> List[TextCell]:
"""Find cells not assigned to any cluster."""
assigned = {cell.id for cluster in clusters for cell in cluster.cells}
assigned = {cell.index for cluster in clusters for cell in cluster.cells}
return [
cell for cell in self.cells if cell.id not in assigned and cell.text.strip()
cell
for cell in self.cells
if cell.index not in assigned and cell.text.strip()
]
def _adjust_cluster_bboxes(self, clusters: List[Cluster]) -> List[Cluster]:
@@ -615,10 +618,10 @@
continue
cells_bbox = BoundingBox(
l=min(cell.bbox.l for cell in cluster.cells),
t=min(cell.bbox.t for cell in cluster.cells),
r=max(cell.bbox.r for cell in cluster.cells),
b=max(cell.bbox.b for cell in cluster.cells),
l=min(cell.rect.to_bounding_box().l for cell in cluster.cells),
t=min(cell.rect.to_bounding_box().t for cell in cluster.cells),
r=max(cell.rect.to_bounding_box().r for cell in cluster.cells),
b=max(cell.rect.to_bounding_box().b for cell in cluster.cells),
)
if cluster.label == DocItemLabel.TABLE:
@@ -634,9 +637,9 @@
return clusters
def _sort_cells(self, cells: List[Cell]) -> List[Cell]:
def _sort_cells(self, cells: List[TextCell]) -> List[TextCell]:
"""Sort cells in native reading order."""
return sorted(cells, key=lambda c: (c.id))
return sorted(cells, key=lambda c: (c.index))
def _sort_clusters(
self, clusters: List[Cluster], mode: str = "id"
@@ -647,7 +650,7 @@
clusters,
key=lambda cluster: (
(
min(cell.id for cell in cluster.cells)
min(cell.index for cell in cluster.cells)
if cluster.cells
else sys.maxsize
),


@@ -25,7 +25,7 @@ def draw_clusters(
# Draw cells first (underneath)
cell_color = (0, 0, 0, 40) # Transparent black for cells
for tc in c.cells:
cx0, cy0, cx1, cy1 = tc.bbox.as_tuple()
cx0, cy0, cx1, cy1 = tc.rect.to_bounding_box().as_tuple()
cx0 *= scale_x
cx1 *= scale_x
cy0 *= scale_x


@@ -7,6 +7,7 @@ from typing import Iterable
import yaml
from docling_core.types.doc import ImageRefMode
from docling.backend.docling_parse_v4_backend import DoclingParseV4DocumentBackend
from docling.datamodel.base_models import ConversionStatus, InputFormat
from docling.datamodel.document import ConversionResult
from docling.datamodel.pipeline_options import PdfPipelineOptions
@@ -60,6 +61,18 @@ def export_documents(
with (output_dir / f"{doc_filename}.yaml").open("w") as fp:
fp.write(yaml.safe_dump(conv_res.document.export_to_dict()))
# Export Docling document format to doctags:
with (output_dir / f"{doc_filename}.doctags.txt").open("w") as fp:
fp.write(conv_res.document.export_to_document_tokens())
# Export Docling document format to markdown:
with (output_dir / f"{doc_filename}.md").open("w") as fp:
fp.write(conv_res.document.export_to_markdown())
# Export Docling document format to text:
with (output_dir / f"{doc_filename}.txt").open("w") as fp:
fp.write(conv_res.document.export_to_markdown(strict_text=True))
if USE_LEGACY:
# Export Deep Search document JSON format:
with (output_dir / f"{doc_filename}.legacy.json").open(
@@ -131,7 +144,9 @@ def main():
doc_converter = DocumentConverter(
format_options={
InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
InputFormat.PDF: PdfFormatOption(
pipeline_options=pipeline_options, backend=DoclingParseV4DocumentBackend
)
}
)


@@ -13,6 +13,7 @@
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![License MIT](https://img.shields.io/github/license/docling-project/docling)](https://opensource.org/licenses/MIT)
[![PyPI Downloads](https://static.pepy.tech/badge/docling/month)](https://pepy.tech/projects/docling)
[![LF AI & Data](https://img.shields.io/badge/LF%20AI%20%26%20Data-003778?logo=linuxfoundation&logoColor=fff&color=0094ff&labelColor=003778)](https://lfaidata.foundation/projects/)
Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
@@ -25,12 +26,12 @@ Docling simplifies document processing, parsing diverse formats — including ad
* 🔒 Local execution capabilities for sensitive data and air-gapped environments
* 🤖 Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
* 🔍 Extensive OCR support for scanned PDFs and images
* 🥚 Support for Visual Language Models ([SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview))
* 💻 Simple and convenient CLI
### Coming soon
* 📝 Metadata extraction, including title, authors, references & language
* 📝 Inclusion of Visual Language Models ([SmolDocling](https://huggingface.co/blog/smolervlm#smoldocling))
* 📝 Chart understanding (Barchart, Piechart, LinePlot, etc)
* 📝 Complex chemistry understanding (Molecular structures)
@@ -43,9 +44,13 @@ Docling simplifies document processing, parsing diverse formats — including ad
<a href="reference/document_converter/" class="card"><b>Reference</b><br />See more API details</a>
</div>
## IBM ❤️ Open Source AI
## LF AI & Data
Docling has been brought to you by IBM.
Docling is hosted as a project in the [LF AI & Data Foundation](https://lfaidata.foundation/projects/).
### IBM ❤️ Open Source AI
The project was started by the AI for knowledge team at IBM Research Zurich.
[supported_formats]: ./usage/supported_formats.md
[docling_document]: ./concepts/docling_document.md


@@ -0,0 +1,35 @@
You can run Docling in the cloud without installation using the [Docling Actor][apify] on the Apify platform. Simply provide a document URL and get the processed result:
<a href="https://apify.com/vancura/docling?fpr=docling"><img src="https://apify.com/ext/run-on-apify.png" alt="Run Docling Actor on Apify" width="176" height="39" /></a>
```bash
apify call vancura/docling -i '{
"options": {
"to_formats": ["md", "json", "html", "text", "doctags"]
},
"http_sources": [
{"url": "https://vancura.dev/assets/actor-test/facial-hairstyles-and-filtering-facepiece-respirators.pdf"},
{"url": "https://arxiv.org/pdf/2408.09869"}
]
}'
```
The Actor stores results in:
* Processed document in key-value store (`OUTPUT_RESULT`)
* Processing logs (`DOCLING_LOG`)
* Dataset record with result URL and status
Read more about the [Docling Actor](.actor/README.md), including how to use it via the Apify API and CLI.
- 💻 [GitHub][github]
- 📖 [Docs][docs]
- 📦 [Docling Actor][apify]
[github]: https://github.com/docling-project/docling/tree/main/.actor/
[docs]: https://github.com/docling-project/docling/tree/main/.actor/README.md
[apify]: https://apify.com/vancura/docling?fpr=docling
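For the API route, a hedged sketch using Apify's generic v2 `run-sync` endpoint (the endpoint and the `vancura~docling` Actor ID follow Apify's standard conventions and are assumptions here; replace `<APIFY_TOKEN>` with your token):

```python
# Hedged sketch: run the Docling Actor through Apify's v2 REST API.
# Assumes the standard POST /v2/acts/{actor}/run-sync endpoint; the processed
# document itself is stored in the run's key-value store under OUTPUT_RESULT.
import requests

payload = {
    "options": {"to_formats": ["md"]},
    "http_sources": [{"url": "https://arxiv.org/pdf/2408.09869"}],
}
resp = requests.post(
    "https://api.apify.com/v2/acts/vancura~docling/run-sync",
    params={"token": "<APIFY_TOKEN>"},
    json=payload,
    timeout=600,
)
resp.raise_for_status()
print(resp.status_code)
```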


@@ -8,7 +8,7 @@ The following table provides an overview of the default enrichment models availa
| ------- | --------- | ---------------| ----------- |
| Code understanding | `do_code_enrichment` | `CodeItem` | See [docs below](#code-understanding). |
| Formula understanding | `do_formula_enrichment` | `TextItem` with label `FORMULA` | See [docs below](#formula-understanding). |
| Picrure classification | `do_picture_classification` | `PictureItem` | See [docs below](#picture-classification). |
| Picture classification | `do_picture_classification` | `PictureItem` | See [docs below](#picture-classification). |
| Picture description | `do_picture_description` | `PictureItem` | See [docs below](#picture-description). |
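As a sketch, the flags in the table are plain boolean fields on the pipeline options (assuming `PdfPipelineOptions`, wired into the converter the same way as elsewhere in this diff):

```python
# Sketch: enable enrichment models through the flags listed in the table above.
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.do_code_enrichment = True
pipeline_options.do_formula_enrichment = True
pipeline_options.do_picture_classification = True

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
```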


@@ -111,6 +111,7 @@ nav:
- "LlamaIndex": integrations/llamaindex.md
- "txtai": integrations/txtai.md
- ⭐️ Featured:
- "Apify": integrations/apify.md
- "Data Prep Kit": integrations/data_prep_kit.md
- "InstructLab": integrations/instructlab.md
- "NVIDIA": integrations/nvidia.md

poetry.lock (generated)

@@ -2,17 +2,17 @@
[[package]]
name = "accelerate"
version = "1.4.0"
version = "1.5.1"
description = "Accelerate"
optional = true
python-versions = ">=3.9.0"
files = [
{file = "accelerate-1.4.0-py3-none-any.whl", hash = "sha256:f6e1e7dfaf9d799a20a1dc45efbf4b1546163eac133faa5acd0d89177c896e55"},
{file = "accelerate-1.4.0.tar.gz", hash = "sha256:37d413e1b64cb8681ccd2908ae211cf73e13e6e636a2f598a96eccaa538773a5"},
{file = "accelerate-1.5.1-py3-none-any.whl", hash = "sha256:4838cff9ed1bb0ddc9d967530ced62a1d74ea21cdb57688400359ab32682f03e"},
{file = "accelerate-1.5.1.tar.gz", hash = "sha256:5d936faf3a31894c6160f2f2a984a38aecbba760ef919ae298b2ecd57ea9bf87"},
]
[package.dependencies]
huggingface-hub = ">=0.21.0"
huggingface_hub = ">=0.21.0"
numpy = ">=1.17,<3.0.0"
packaging = ">=20.0"
psutil = "*"
@@ -22,24 +22,24 @@ torch = ">=2.0.0"
[package.extras]
deepspeed = ["deepspeed"]
dev = ["bitsandbytes", "black (>=23.1,<24.0)", "datasets", "diffusers", "evaluate", "hf-doc-builder (>=0.3.0)", "parameterized", "pytest (>=7.2.0,<=8.0.0)", "pytest-subtests", "pytest-xdist", "rich", "ruff (>=0.6.4,<0.7.0)", "scikit-learn", "scipy", "timm", "torchdata (>=0.8.0)", "torchpippy (>=0.2.0)", "tqdm", "transformers"]
dev = ["bitsandbytes", "black (>=23.1,<24.0)", "datasets", "diffusers", "evaluate", "hf-doc-builder (>=0.3.0)", "parameterized", "pytest (>=7.2.0,<=8.0.0)", "pytest-order", "pytest-subtests", "pytest-xdist", "rich", "ruff (>=0.6.4,<0.7.0)", "scikit-learn", "scipy", "timm", "torchdata (>=0.8.0)", "torchpippy (>=0.2.0)", "tqdm", "transformers"]
quality = ["black (>=23.1,<24.0)", "hf-doc-builder (>=0.3.0)", "ruff (>=0.6.4,<0.7.0)"]
rich = ["rich"]
sagemaker = ["sagemaker"]
test-dev = ["bitsandbytes", "datasets", "diffusers", "evaluate", "scikit-learn", "scipy", "timm", "torchdata (>=0.8.0)", "torchpippy (>=0.2.0)", "tqdm", "transformers"]
test-prod = ["parameterized", "pytest (>=7.2.0,<=8.0.0)", "pytest-subtests", "pytest-xdist"]
test-prod = ["parameterized", "pytest (>=7.2.0,<=8.0.0)", "pytest-order", "pytest-subtests", "pytest-xdist"]
test-trackers = ["comet-ml", "dvclive", "tensorboard", "wandb"]
testing = ["bitsandbytes", "datasets", "diffusers", "evaluate", "parameterized", "pytest (>=7.2.0,<=8.0.0)", "pytest-subtests", "pytest-xdist", "scikit-learn", "scipy", "timm", "torchdata (>=0.8.0)", "torchpippy (>=0.2.0)", "tqdm", "transformers"]
testing = ["bitsandbytes", "datasets", "diffusers", "evaluate", "parameterized", "pytest (>=7.2.0,<=8.0.0)", "pytest-order", "pytest-subtests", "pytest-xdist", "scikit-learn", "scipy", "timm", "torchdata (>=0.8.0)", "torchpippy (>=0.2.0)", "tqdm", "transformers"]
[[package]]
name = "aiohappyeyeballs"
version = "2.4.8"
version = "2.6.1"
description = "Happy Eyeballs for asyncio"
optional = false
python-versions = ">=3.9"
files = [
{file = "aiohappyeyeballs-2.4.8-py3-none-any.whl", hash = "sha256:6cac4f5dd6e34a9644e69cf9021ef679e4394f54e58a183056d12009e42ea9e3"},
{file = "aiohappyeyeballs-2.4.8.tar.gz", hash = "sha256:19728772cb12263077982d2f55453babd8bec6a052a926cd5c0c42796da8bf62"},
{file = "aiohappyeyeballs-2.6.1-py3-none-any.whl", hash = "sha256:f349ba8f4b75cb25c99c5c2d84e997e485204d2902a9597802b0371f09331fb8"},
{file = "aiohappyeyeballs-2.6.1.tar.gz", hash = "sha256:c3f9d0113123803ccadfdf3f0faa505bc78e6a72d1cc4806cbd719826e943558"},
]
[[package]]
@@ -250,20 +250,20 @@ files = [
[[package]]
name = "attrs"
version = "25.1.0"
version = "25.3.0"
description = "Classes Without Boilerplate"
optional = false
python-versions = ">=3.8"
files = [
{file = "attrs-25.1.0-py3-none-any.whl", hash = "sha256:c75a69e28a550a7e93789579c22aa26b0f5b83b75dc4e08fe092980051e1090a"},
{file = "attrs-25.1.0.tar.gz", hash = "sha256:1c97078a80c814273a76b2a298a932eb681c87415c11dee0a6921de7f1b02c3e"},
{file = "attrs-25.3.0-py3-none-any.whl", hash = "sha256:427318ce031701fea540783410126f03899a97ffc6f61596ad581ac2e40e3bc3"},
{file = "attrs-25.3.0.tar.gz", hash = "sha256:75d7cefc7fb576747b2c81b4442d4d4a1ce0900973527c011d1030fd3bf4af1b"},
]
[package.extras]
benchmark = ["cloudpickle", "hypothesis", "mypy (>=1.11.1)", "pympler", "pytest (>=4.3.0)", "pytest-codspeed", "pytest-mypy-plugins", "pytest-xdist[psutil]"]
cov = ["cloudpickle", "coverage[toml] (>=5.3)", "hypothesis", "mypy (>=1.11.1)", "pympler", "pytest (>=4.3.0)", "pytest-mypy-plugins", "pytest-xdist[psutil]"]
dev = ["cloudpickle", "hypothesis", "mypy (>=1.11.1)", "pre-commit-uv", "pympler", "pytest (>=4.3.0)", "pytest-mypy-plugins", "pytest-xdist[psutil]"]
docs = ["cogapp", "furo", "myst-parser", "sphinx", "sphinx-notfound-page", "sphinxcontrib-towncrier", "towncrier (<24.7)"]
docs = ["cogapp", "furo", "myst-parser", "sphinx", "sphinx-notfound-page", "sphinxcontrib-towncrier", "towncrier"]
tests = ["cloudpickle", "hypothesis", "mypy (>=1.11.1)", "pympler", "pytest (>=4.3.0)", "pytest-mypy-plugins", "pytest-xdist[psutil]"]
tests-mypy = ["mypy (>=1.11.1)", "pytest-mypy-plugins"]
@@ -787,37 +787,37 @@ vision = ["Pillow (>=9.4.0)"]
[[package]]
name = "debugpy"
version = "1.8.12"
version = "1.8.13"
description = "An implementation of the Debug Adapter Protocol for Python"
optional = false
python-versions = ">=3.8"
files = [
{file = "debugpy-1.8.12-cp310-cp310-macosx_14_0_x86_64.whl", hash = "sha256:a2ba7ffe58efeae5b8fad1165357edfe01464f9aef25e814e891ec690e7dd82a"},
{file = "debugpy-1.8.12-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cbbd4149c4fc5e7d508ece083e78c17442ee13b0e69bfa6bd63003e486770f45"},
{file = "debugpy-1.8.12-cp310-cp310-win32.whl", hash = "sha256:b202f591204023b3ce62ff9a47baa555dc00bb092219abf5caf0e3718ac20e7c"},
{file = "debugpy-1.8.12-cp310-cp310-win_amd64.whl", hash = "sha256:9649eced17a98ce816756ce50433b2dd85dfa7bc92ceb60579d68c053f98dff9"},
{file = "debugpy-1.8.12-cp311-cp311-macosx_14_0_universal2.whl", hash = "sha256:36f4829839ef0afdfdd208bb54f4c3d0eea86106d719811681a8627ae2e53dd5"},
{file = "debugpy-1.8.12-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a28ed481d530e3138553be60991d2d61103ce6da254e51547b79549675f539b7"},
{file = "debugpy-1.8.12-cp311-cp311-win32.whl", hash = "sha256:4ad9a94d8f5c9b954e0e3b137cc64ef3f579d0df3c3698fe9c3734ee397e4abb"},
{file = "debugpy-1.8.12-cp311-cp311-win_amd64.whl", hash = "sha256:4703575b78dd697b294f8c65588dc86874ed787b7348c65da70cfc885efdf1e1"},
{file = "debugpy-1.8.12-cp312-cp312-macosx_14_0_universal2.whl", hash = "sha256:7e94b643b19e8feb5215fa508aee531387494bf668b2eca27fa769ea11d9f498"},
{file = "debugpy-1.8.12-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:086b32e233e89a2740c1615c2f775c34ae951508b28b308681dbbb87bba97d06"},
{file = "debugpy-1.8.12-cp312-cp312-win32.whl", hash = "sha256:2ae5df899732a6051b49ea2632a9ea67f929604fd2b036613a9f12bc3163b92d"},
{file = "debugpy-1.8.12-cp312-cp312-win_amd64.whl", hash = "sha256:39dfbb6fa09f12fae32639e3286112fc35ae976114f1f3d37375f3130a820969"},
{file = "debugpy-1.8.12-cp313-cp313-macosx_14_0_universal2.whl", hash = "sha256:696d8ae4dff4cbd06bf6b10d671e088b66669f110c7c4e18a44c43cf75ce966f"},
{file = "debugpy-1.8.12-cp313-cp313-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:898fba72b81a654e74412a67c7e0a81e89723cfe2a3ea6fcd3feaa3395138ca9"},
{file = "debugpy-1.8.12-cp313-cp313-win32.whl", hash = "sha256:22a11c493c70413a01ed03f01c3c3a2fc4478fc6ee186e340487b2edcd6f4180"},
{file = "debugpy-1.8.12-cp313-cp313-win_amd64.whl", hash = "sha256:fdb3c6d342825ea10b90e43d7f20f01535a72b3a1997850c0c3cefa5c27a4a2c"},
{file = "debugpy-1.8.12-cp38-cp38-macosx_14_0_x86_64.whl", hash = "sha256:b0232cd42506d0c94f9328aaf0d1d0785f90f87ae72d9759df7e5051be039738"},
{file = "debugpy-1.8.12-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9af40506a59450f1315168d47a970db1a65aaab5df3833ac389d2899a5d63b3f"},
{file = "debugpy-1.8.12-cp38-cp38-win32.whl", hash = "sha256:5cc45235fefac57f52680902b7d197fb2f3650112379a6fa9aa1b1c1d3ed3f02"},
{file = "debugpy-1.8.12-cp38-cp38-win_amd64.whl", hash = "sha256:557cc55b51ab2f3371e238804ffc8510b6ef087673303890f57a24195d096e61"},
{file = "debugpy-1.8.12-cp39-cp39-macosx_14_0_x86_64.whl", hash = "sha256:b5c6c967d02fee30e157ab5227706f965d5c37679c687b1e7bbc5d9e7128bd41"},
{file = "debugpy-1.8.12-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:88a77f422f31f170c4b7e9ca58eae2a6c8e04da54121900651dfa8e66c29901a"},
{file = "debugpy-1.8.12-cp39-cp39-win32.whl", hash = "sha256:a4042edef80364239f5b7b5764e55fd3ffd40c32cf6753da9bda4ff0ac466018"},
{file = "debugpy-1.8.12-cp39-cp39-win_amd64.whl", hash = "sha256:f30b03b0f27608a0b26c75f0bb8a880c752c0e0b01090551b9d87c7d783e2069"},
{file = "debugpy-1.8.12-py2.py3-none-any.whl", hash = "sha256:274b6a2040349b5c9864e475284bce5bb062e63dce368a394b8cc865ae3b00c6"},
{file = "debugpy-1.8.12.tar.gz", hash = "sha256:646530b04f45c830ceae8e491ca1c9320a2d2f0efea3141487c82130aba70dce"},
{file = "debugpy-1.8.13-cp310-cp310-macosx_14_0_x86_64.whl", hash = "sha256:06859f68e817966723ffe046b896b1bd75c665996a77313370336ee9e1de3e90"},
{file = "debugpy-1.8.13-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cb56c2db69fb8df3168bc857d7b7d2494fed295dfdbde9a45f27b4b152f37520"},
{file = "debugpy-1.8.13-cp310-cp310-win32.whl", hash = "sha256:46abe0b821cad751fc1fb9f860fb2e68d75e2c5d360986d0136cd1db8cad4428"},
{file = "debugpy-1.8.13-cp310-cp310-win_amd64.whl", hash = "sha256:dc7b77f5d32674686a5f06955e4b18c0e41fb5a605f5b33cf225790f114cfeec"},
{file = "debugpy-1.8.13-cp311-cp311-macosx_14_0_universal2.whl", hash = "sha256:eee02b2ed52a563126c97bf04194af48f2fe1f68bb522a312b05935798e922ff"},
{file = "debugpy-1.8.13-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4caca674206e97c85c034c1efab4483f33971d4e02e73081265ecb612af65377"},
{file = "debugpy-1.8.13-cp311-cp311-win32.whl", hash = "sha256:7d9a05efc6973b5aaf076d779cf3a6bbb1199e059a17738a2aa9d27a53bcc888"},
{file = "debugpy-1.8.13-cp311-cp311-win_amd64.whl", hash = "sha256:62f9b4a861c256f37e163ada8cf5a81f4c8d5148fc17ee31fb46813bd658cdcc"},
{file = "debugpy-1.8.13-cp312-cp312-macosx_14_0_universal2.whl", hash = "sha256:2b8de94c5c78aa0d0ed79023eb27c7c56a64c68217d881bee2ffbcb13951d0c1"},
{file = "debugpy-1.8.13-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:887d54276cefbe7290a754424b077e41efa405a3e07122d8897de54709dbe522"},
{file = "debugpy-1.8.13-cp312-cp312-win32.whl", hash = "sha256:3872ce5453b17837ef47fb9f3edc25085ff998ce63543f45ba7af41e7f7d370f"},
{file = "debugpy-1.8.13-cp312-cp312-win_amd64.whl", hash = "sha256:63ca7670563c320503fea26ac688988d9d6b9c6a12abc8a8cf2e7dd8e5f6b6ea"},
{file = "debugpy-1.8.13-cp313-cp313-macosx_14_0_universal2.whl", hash = "sha256:31abc9618be4edad0b3e3a85277bc9ab51a2d9f708ead0d99ffb5bb750e18503"},
{file = "debugpy-1.8.13-cp313-cp313-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a0bd87557f97bced5513a74088af0b84982b6ccb2e254b9312e29e8a5c4270eb"},
{file = "debugpy-1.8.13-cp313-cp313-win32.whl", hash = "sha256:5268ae7fdca75f526d04465931cb0bd24577477ff50e8bb03dab90983f4ebd02"},
{file = "debugpy-1.8.13-cp313-cp313-win_amd64.whl", hash = "sha256:79ce4ed40966c4c1631d0131606b055a5a2f8e430e3f7bf8fd3744b09943e8e8"},
{file = "debugpy-1.8.13-cp38-cp38-macosx_14_0_x86_64.whl", hash = "sha256:acf39a6e98630959763f9669feddee540745dfc45ad28dbc9bd1f9cd60639391"},
{file = "debugpy-1.8.13-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:924464d87e7d905eb0d79fb70846558910e906d9ee309b60c4fe597a2e802590"},
{file = "debugpy-1.8.13-cp38-cp38-win32.whl", hash = "sha256:3dae443739c6b604802da9f3e09b0f45ddf1cf23c99161f3a1a8039f61a8bb89"},
{file = "debugpy-1.8.13-cp38-cp38-win_amd64.whl", hash = "sha256:ed93c3155fc1f888ab2b43626182174e457fc31b7781cd1845629303790b8ad1"},
{file = "debugpy-1.8.13-cp39-cp39-macosx_14_0_x86_64.whl", hash = "sha256:6fab771639332bd8ceb769aacf454a30d14d7a964f2012bf9c4e04c60f16e85b"},
{file = "debugpy-1.8.13-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:32b6857f8263a969ce2ca098f228e5cc0604d277447ec05911a8c46cf3e7e307"},
{file = "debugpy-1.8.13-cp39-cp39-win32.whl", hash = "sha256:f14d2c4efa1809da125ca62df41050d9c7cd9cb9e380a2685d1e453c4d450ccb"},
{file = "debugpy-1.8.13-cp39-cp39-win_amd64.whl", hash = "sha256:ea869fe405880327497e6945c09365922c79d2a1eed4c3ae04d77ac7ae34b2b5"},
{file = "debugpy-1.8.13-py2.py3-none-any.whl", hash = "sha256:d4ba115cdd0e3a70942bd562adba9ec8c651fe69ddde2298a1be296fc331906f"},
{file = "debugpy-1.8.13.tar.gz", hash = "sha256:837e7bef95bdefba426ae38b9a94821ebdc5bea55627879cd48165c90b9e50ce"},
]
[[package]]
@@ -870,13 +870,13 @@ files = [
[[package]]
name = "docling-core"
version = "2.22.0"
version = "2.23.3"
description = "A python library to define and validate data types in Docling."
optional = false
python-versions = "<4.0,>=3.9"
files = [
{file = "docling_core-2.22.0-py3-none-any.whl", hash = "sha256:d74d351024d016f46a09f171fb9d2d78809b132e18e25176af517ac4203c858c"},
{file = "docling_core-2.22.0.tar.gz", hash = "sha256:5e4bf15884560a5dc66482206f875d152701bb809f0ed52bbbe86133e0d559e2"},
{file = "docling_core-2.23.3-py3-none-any.whl", hash = "sha256:a2166ffc41f8fdf6fdb99b33da6c7146eccf6382712ea92e95772604fb5af5e5"},
{file = "docling_core-2.23.3.tar.gz", hash = "sha256:a64ce41e0881c06962a2b3ec80e0665f84de0809dedf1bf84f3a14b75dd665c4"},
]
[package.dependencies]
@@ -929,43 +929,43 @@ transformers = [
[[package]]
name = "docling-parse"
version = "3.4.0"
version = "4.0.0"
description = "Simple package to extract text with coordinates from programmatic PDFs"
optional = false
python-versions = "<4.0,>=3.9"
files = [
{file = "docling_parse-3.4.0-cp310-cp310-macosx_13_0_x86_64.whl", hash = "sha256:96e95e63ab722dfe5340fcb04d0e07bd1c0a0ba2f62e93c91ac26dda0a312a44"},
{file = "docling_parse-3.4.0-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:f9e14a7a0b92526d4dfd3f390f3d7e075f59d14d6b8a0a564fbc26299e56cd47"},
{file = "docling_parse-3.4.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:fdef1d51291e841e5b6a32689a39a9f35986389f863b415eaa1790b29d021101"},
{file = "docling_parse-3.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:68652610d6c34adc684dbaa77b5d596b25d004912a78e85ec4ae57910bf7086f"},
{file = "docling_parse-3.4.0-cp310-cp310-win_amd64.whl", hash = "sha256:daad07fe93f306d8e2378acb24ef2fa68535ccdb960a1b99d6b36ab8c299fef1"},
{file = "docling_parse-3.4.0-cp311-cp311-macosx_13_0_x86_64.whl", hash = "sha256:6f30c5fd3c04bd3d1a7d06baeae2e5c3adbebc284071a9a52b0150bcd4917a3d"},
{file = "docling_parse-3.4.0-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:2c3664e4c8980dc44e0d026b1b01fbc94f0dac9adf7be835071d4a761977c36d"},
{file = "docling_parse-3.4.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3febf7515453d18df03c275356db2bb5b0618ba9fc033aba05d58318a9846b1a"},
{file = "docling_parse-3.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:75aeb038bb7f6400ecde99cf6c4ef35867c528ac21676071a822ed72d0653149"},
{file = "docling_parse-3.4.0-cp311-cp311-win_amd64.whl", hash = "sha256:8d20e3584022542448c21ed0ac868b2457ae35211cea63ed20142e375549e633"},
{file = "docling_parse-3.4.0-cp312-cp312-macosx_13_0_x86_64.whl", hash = "sha256:ddfe2bd730ed08363f25954a0480da021e6e6bdb175276643cc2913a6bbd98e2"},
{file = "docling_parse-3.4.0-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:faf8ba9eaab8c17ea72516be5d440f754fcca27f37488dcf126a0f3ac3a63058"},
{file = "docling_parse-3.4.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9eb5e7e50b3057690d0d4fa651363cafd7735bb952378dd8a4ca6c7d359507db"},
{file = "docling_parse-3.4.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:452334b387e2c699f69acf37a4ea4ae7097d062a2dd1980c573b73051c031158"},
{file = "docling_parse-3.4.0-cp312-cp312-win_amd64.whl", hash = "sha256:1ba00147ccb0a1dc10cdf58645e67f4ee895c6920bc583bc6f25d27cd562bfed"},
{file = "docling_parse-3.4.0-cp313-cp313-macosx_13_0_x86_64.whl", hash = "sha256:2b22a33a2d2f3616a7ac0f4b2f2ba6099f8a5dc6fa328be0f17c9c506455d7c1"},
{file = "docling_parse-3.4.0-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:0dd2440a94d555f98b702e88bfe7cc5a585d9191f4ea93884b02e286e7af3a06"},
{file = "docling_parse-3.4.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5f5828744a0e33136e09e8c61ca0b2c0ead8f76595f2e0955beaac16adce51f5"},
{file = "docling_parse-3.4.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:26fff6e36809d17ff855532f985df3738ada8d86a9fc746049ea6e6524d5e0a2"},
{file = "docling_parse-3.4.0-cp313-cp313-win_amd64.whl", hash = "sha256:13fc442f64171280db98dc4507274ffa0a65bac94eecbcc60c3cbf41f433b556"},
{file = "docling_parse-3.4.0-cp39-cp39-macosx_13_0_x86_64.whl", hash = "sha256:16d570ab655ea5a25d9cd1e27bc4d6905372784907d679cde4cef2fb22df61c7"},
{file = "docling_parse-3.4.0-cp39-cp39-macosx_14_0_arm64.whl", hash = "sha256:05bd405635be2379ef6cb0c7c39dc08edf3ba93788eb0fca7426b2218538bce1"},
{file = "docling_parse-3.4.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f6c92f0353bbae7ca9b39553cc4d03f5fefdab33ecd26809ab710cc752fac03c"},
{file = "docling_parse-3.4.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8e883326ec4121891c48d365d064e5ae30c5b90a2dac44ed61ac02e7da41345d"},
{file = "docling_parse-3.4.0-cp39-cp39-win_amd64.whl", hash = "sha256:b2a0fe1e1d88c3814553137daa597ee34dc310f50fe415e1f8a1c6e611d95e42"},
{file = "docling_parse-3.4.0-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:930f5a5d78404de573c0ba302d313b6647f1e86714766e5a1cdc09af014ca111"},
{file = "docling_parse-3.4.0-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:328fd72f274b939d454e3ff20a73074d99664cb4a51e6ccdaf195a6626691b95"},
{file = "docling_parse-3.4.0.tar.gz", hash = "sha256:36cdd17bcc4a833b5c9af9ae3dc461ed18a975c1b084ccfd19a9d9cde4f66e14"},
{file = "docling_parse-4.0.0-cp310-cp310-macosx_13_0_x86_64.whl", hash = "sha256:6de7fa8ec4919f604c9a02a3fa8ca0e13a3a8e3c0652adc41848616b737925d9"},
{file = "docling_parse-4.0.0-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:82704280ab086a84a30d9ec9def6cd96b733aefc6973546b2101d09eed7a958e"},
{file = "docling_parse-4.0.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f51ec645978d75e7cf232fa7c571ebf164a5bdf418588c663f9b3c062df6ba72"},
{file = "docling_parse-4.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5d5da855f35303f9229198891da550e3c1e1f4025e52ab8c0303d345669ff46f"},
{file = "docling_parse-4.0.0-cp310-cp310-win_amd64.whl", hash = "sha256:ba36cb329aadb306cc25901305d49fe6d2ed9e93e9dc993b4baf13fcc90a98e1"},
{file = "docling_parse-4.0.0-cp311-cp311-macosx_13_0_x86_64.whl", hash = "sha256:9b7afbf09945b4d9e3ddb9c24a13d7b9f987cf32d5c9d68532ceb63fb26697df"},
{file = "docling_parse-4.0.0-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:6daaec89c5045e968785a225b9b5a42b36dfe6b5a4437995e2d34e1595e2c162"},
{file = "docling_parse-4.0.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:93e638ef2ad36e9e4a8ef881073696467e6699bf206e5a416de4abaaf531b0e1"},
{file = "docling_parse-4.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:87246eb0d259202a7f093336f17235cb1fffb67e82b41dbc0e88f9c05b08014e"},
{file = "docling_parse-4.0.0-cp311-cp311-win_amd64.whl", hash = "sha256:0ae44b913b010994c3e36869e5fc9dad252a7dc7434225790928075c8b5a7f6c"},
{file = "docling_parse-4.0.0-cp312-cp312-macosx_13_0_x86_64.whl", hash = "sha256:ed6d8ac29c1014ed7a126d782b6bc963c9a9c09f41224fa90f9a8b45bf3191f9"},
{file = "docling_parse-4.0.0-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:4a2dd46cee8e54f3aa511dbf552ef5f9f422944c54de73888ee55b2c4a6e10b9"},
{file = "docling_parse-4.0.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:722fbd63f7f28e8a49fa2cd92d1571290f6c5295b86c7406b7c20a6c6e8b3975"},
{file = "docling_parse-4.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cc155767b51a23f5bfd5abaabaf8c4a57777aa0277c813e13b9f6c43532964bd"},
{file = "docling_parse-4.0.0-cp312-cp312-win_amd64.whl", hash = "sha256:e45ab31fffe4ae571bd2ecc9e0a9d5665a1486463396924160add84828d2a7e7"},
{file = "docling_parse-4.0.0-cp313-cp313-macosx_13_0_x86_64.whl", hash = "sha256:d93fd3cec032e5b7f6385f7a021e228c52eb381f28fc037224708aeaad487d8b"},
{file = "docling_parse-4.0.0-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:d9f64847cd7e9a7a34a3d5a14f0827022ed3b7f50f39d5126ef003c55d574ba3"},
{file = "docling_parse-4.0.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a6ac283f08680dfde568b5629ab94830cab32795d74086553e755460b6879901"},
{file = "docling_parse-4.0.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:97eca28220dc5075099e01f2cb7a3e9005b9951dee0ca0eb743e298be7284279"},
{file = "docling_parse-4.0.0-cp313-cp313-win_amd64.whl", hash = "sha256:6019288cfe25a97993c2aab453386fc3e366d7761637e682b25915ba2c856cc4"},
{file = "docling_parse-4.0.0-cp39-cp39-macosx_13_0_x86_64.whl", hash = "sha256:168c861233fc2a1e4b7d934aa6f7e1b3f568434fd478f18f0b3bcc09880d504c"},
{file = "docling_parse-4.0.0-cp39-cp39-macosx_14_0_arm64.whl", hash = "sha256:b1cc0b7a214bc9e4e05c65572c4a17c19d0f4f0795fe1fa77a0ad499ab7e4e79"},
{file = "docling_parse-4.0.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:cec16060eba37db3fa2ff0b34d283cf33384caecc73b0d8dbf012e3b3941c21d"},
{file = "docling_parse-4.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6a058c2330d7759d943ae50db9e4ecab60201a54116052f94e6e7a3886886b65"},
{file = "docling_parse-4.0.0-cp39-cp39-win_amd64.whl", hash = "sha256:05af04972fef73f2e10cc46c8f541aaf6713fdcad254502a0012884109c1d468"},
{file = "docling_parse-4.0.0-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:30c0c1b33c0a0aeb6897537f7d8fa09ed5a26f05685b18a2d27c73a789343679"},
{file = "docling_parse-4.0.0-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:2dff48f5fef106539137a4e63ee58b5be0e7a81ac1aedd61a4453c268b8f76d1"},
{file = "docling_parse-4.0.0.tar.gz", hash = "sha256:5be0ba4e0098524f116743e6b709f29fe273e441e61923c3a262e054643c5ee6"},
]
[package.dependencies]
docling-core = ">=2.14.0,<3.0.0"
docling-core = ">=2.23.0,<3.0.0"
pillow = ">=10.0.0,<12.0.0"
pydantic = ">=2.0.0,<3.0.0"
pywin32 = {version = ">=305", markers = "sys_platform == \"win32\""}
@@ -1086,13 +1086,13 @@ devel = ["colorama", "json-spec", "jsonschema", "pylint", "pytest", "pytest-benc
[[package]]
name = "filelock"
version = "3.17.0"
version = "3.18.0"
description = "A platform independent file lock."
optional = false
python-versions = ">=3.9"
files = [
{file = "filelock-3.17.0-py3-none-any.whl", hash = "sha256:533dc2f7ba78dc2f0f531fc6c4940addf7b70a481e269a5a3b93be94ffbe8338"},
{file = "filelock-3.17.0.tar.gz", hash = "sha256:ee4e77401ef576ebb38cd7f13b9b28893194acc20a8e68e18730ba9c0e54660e"},
{file = "filelock-3.18.0-py3-none-any.whl", hash = "sha256:c401f4f8377c4464e6db25fff06205fd89bdd83b65eb0488ed1b160f780e21de"},
{file = "filelock-3.18.0.tar.gz", hash = "sha256:adbc88eabb99d2fec8c9c1b229b171f18afa655400173ddc653d5d01501fb9f2"},
]
[package.extras]
@@ -1500,13 +1500,13 @@ zstd = ["zstandard (>=0.18.0)"]
[[package]]
name = "huggingface-hub"
version = "0.29.1"
version = "0.29.3"
description = "Client library to download and publish models, datasets and other repos on the huggingface.co hub"
optional = false
python-versions = ">=3.8.0"
files = [
{file = "huggingface_hub-0.29.1-py3-none-any.whl", hash = "sha256:352f69caf16566c7b6de84b54a822f6238e17ddd8ae3da4f8f2272aea5b198d5"},
{file = "huggingface_hub-0.29.1.tar.gz", hash = "sha256:9524eae42077b8ff4fc459ceb7a514eca1c1232b775276b009709fe2a084f250"},
{file = "huggingface_hub-0.29.3-py3-none-any.whl", hash = "sha256:0b25710932ac649c08cdbefa6c6ccb8e88eef82927cacdb048efb726429453aa"},
{file = "huggingface_hub-0.29.3.tar.gz", hash = "sha256:64519a25716e0ba382ba2d3fb3ca082e7c7eb4a2fc634d200e8380006e0760e5"},
]
[package.dependencies]
@@ -1548,13 +1548,13 @@ pyreadline3 = {version = "*", markers = "sys_platform == \"win32\" and python_ve
[[package]]
name = "identify"
version = "2.6.8"
version = "2.6.9"
description = "File identification library for Python"
optional = false
python-versions = ">=3.9"
files = [
{file = "identify-2.6.8-py2.py3-none-any.whl", hash = "sha256:83657f0f766a3c8d0eaea16d4ef42494b39b34629a4b3192a9d020d349b3e255"},
{file = "identify-2.6.8.tar.gz", hash = "sha256:61491417ea2c0c5c670484fd8abbb34de34cdae1e5f39a73ee65e48e4bb663fc"},
{file = "identify-2.6.9-py2.py3-none-any.whl", hash = "sha256:c98b4322da415a8e5a70ff6e51fbc2d2932c015532d77e9f8537b4ba7813b150"},
{file = "identify-2.6.9.tar.gz", hash = "sha256:d40dfe3142a1421d8518e3d3985ef5ac42890683e32306ad614a29490abeb6bf"},
]
[package.extras]
@@ -1851,13 +1851,13 @@ trio = ["trio"]
[[package]]
name = "jinja2"
version = "3.1.5"
version = "3.1.6"
description = "A very fast and expressive template engine."
optional = false
python-versions = ">=3.7"
files = [
{file = "jinja2-3.1.5-py3-none-any.whl", hash = "sha256:aba0f4dc9ed8013c424088f68a5c226f7d6097ed89b246d7749c2ec4175c6adb"},
{file = "jinja2-3.1.5.tar.gz", hash = "sha256:8fefff8dc3034e27bb80d67c671eb8a9bc424c0ef4c0826edbff304cceff43bb"},
{file = "jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67"},
{file = "jinja2-3.1.6.tar.gz", hash = "sha256:0137fb05990d35f1275a587e9aee6d56da821fc83491a0fb838183be43f66d6d"},
]
[package.dependencies]
@@ -2666,13 +2666,13 @@ min-versions = ["babel (==2.9.0)", "click (==7.0)", "colorama (==0.4)", "ghp-imp
[[package]]
name = "mkdocs-autorefs"
version = "1.4.0"
version = "1.4.1"
description = "Automatically link across pages in MkDocs."
optional = false
python-versions = ">=3.9"
files = [
{file = "mkdocs_autorefs-1.4.0-py3-none-any.whl", hash = "sha256:bad19f69655878d20194acd0162e29a89c3f7e6365ffe54e72aa3fd1072f240d"},
{file = "mkdocs_autorefs-1.4.0.tar.gz", hash = "sha256:a9c0aa9c90edbce302c09d050a3c4cb7c76f8b7b2c98f84a7a05f53d00392156"},
{file = "mkdocs_autorefs-1.4.1-py3-none-any.whl", hash = "sha256:9793c5ac06a6ebbe52ec0f8439256e66187badf4b5334b5fde0b128ec134df4f"},
{file = "mkdocs_autorefs-1.4.1.tar.gz", hash = "sha256:4b5b6235a4becb2b10425c2fa191737e415b37aa3418919db33e5d774c9db079"},
]
[package.dependencies]
@@ -2733,13 +2733,13 @@ pygments = ">2.12.0"
[[package]]
name = "mkdocs-material"
version = "9.6.7"
version = "9.6.8"
description = "Documentation that simply works"
optional = false
python-versions = ">=3.8"
files = [
{file = "mkdocs_material-9.6.7-py3-none-any.whl", hash = "sha256:8a159e45e80fcaadd9fbeef62cbf928569b93df954d4dc5ba76d46820caf7b47"},
{file = "mkdocs_material-9.6.7.tar.gz", hash = "sha256:3e2c1fceb9410056c2d91f334a00cdea3215c28750e00c691c1e46b2a33309b4"},
{file = "mkdocs_material-9.6.8-py3-none-any.whl", hash = "sha256:0a51532dd8aa80b232546c073fe3ef60dfaef1b1b12196ac7191ee01702d1cf8"},
{file = "mkdocs_material-9.6.8.tar.gz", hash = "sha256:8de31bb7566379802532b248bd56d9c4bc834afc4625884bf5769f9412c6a354"},
]
[package.dependencies]
@@ -3703,14 +3703,14 @@ files = [
[[package]]
name = "nvidia-nvjitlink-cu12"
version = "12.8.61"
version = "12.8.93"
description = "Nvidia JIT LTO Library"
optional = false
python-versions = ">=3"
files = [
{file = "nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl", hash = "sha256:45fd79f2ae20bd67e8bc411055939049873bfd8fac70ff13bd4865e0b9bdab17"},
{file = "nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:9b80ecab31085dda3ce3b41d043be0ec739216c3fc633b8abe212d5a30026df0"},
{file = "nvidia_nvjitlink_cu12-12.8.61-py3-none-win_amd64.whl", hash = "sha256:1166a964d25fdc0eae497574d38824305195a5283324a21ccb0ce0c802cbf41c"},
{file = "nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl", hash = "sha256:81ff63371a7ebd6e6451970684f916be2eab07321b73c9d244dc2b4da7f73b88"},
{file = "nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:adccd7161ace7261e01bb91e44e88da350895c270d23f744f0820c818b7229e7"},
{file = "nvidia_nvjitlink_cu12-12.8.93-py3-none-win_amd64.whl", hash = "sha256:bd93fbeeee850917903583587f4fc3a4eafa022e34572251368238ab5e6bd67f"},
]
[[package]]
@@ -3796,32 +3796,29 @@ sympy = "*"
[[package]]
name = "onnxruntime"
version = "1.20.1"
version = "1.21.0"
description = "ONNX Runtime is a runtime accelerator for Machine Learning models"
optional = true
python-versions = "*"
python-versions = ">=3.10"
files = [
{file = "onnxruntime-1.20.1-cp310-cp310-macosx_13_0_universal2.whl", hash = "sha256:e50ba5ff7fed4f7d9253a6baf801ca2883cc08491f9d32d78a80da57256a5439"},
{file = "onnxruntime-1.20.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7b2908b50101a19e99c4d4e97ebb9905561daf61829403061c1adc1b588bc0de"},
{file = "onnxruntime-1.20.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d82daaec24045a2e87598b8ac2b417b1cce623244e80e663882e9fe1aae86410"},
{file = "onnxruntime-1.20.1-cp310-cp310-win32.whl", hash = "sha256:4c4b251a725a3b8cf2aab284f7d940c26094ecd9d442f07dd81ab5470e99b83f"},
{file = "onnxruntime-1.20.1-cp310-cp310-win_amd64.whl", hash = "sha256:d3b616bb53a77a9463707bb313637223380fc327f5064c9a782e8ec69c22e6a2"},
{file = "onnxruntime-1.20.1-cp311-cp311-macosx_13_0_universal2.whl", hash = "sha256:06bfbf02ca9ab5f28946e0f912a562a5f005301d0c419283dc57b3ed7969bb7b"},
{file = "onnxruntime-1.20.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f6243e34d74423bdd1edf0ae9596dd61023b260f546ee17d701723915f06a9f7"},
{file = "onnxruntime-1.20.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5eec64c0269dcdb8d9a9a53dc4d64f87b9e0c19801d9321246a53b7eb5a7d1bc"},
{file = "onnxruntime-1.20.1-cp311-cp311-win32.whl", hash = "sha256:a19bc6e8c70e2485a1725b3d517a2319603acc14c1f1a017dda0afe6d4665b41"},
{file = "onnxruntime-1.20.1-cp311-cp311-win_amd64.whl", hash = "sha256:8508887eb1c5f9537a4071768723ec7c30c28eb2518a00d0adcd32c89dea3221"},
{file = "onnxruntime-1.20.1-cp312-cp312-macosx_13_0_universal2.whl", hash = "sha256:22b0655e2bf4f2161d52706e31f517a0e54939dc393e92577df51808a7edc8c9"},
{file = "onnxruntime-1.20.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f1f56e898815963d6dc4ee1c35fc6c36506466eff6d16f3cb9848cea4e8c8172"},
{file = "onnxruntime-1.20.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bb71a814f66517a65628c9e4a2bb530a6edd2cd5d87ffa0af0f6f773a027d99e"},
{file = "onnxruntime-1.20.1-cp312-cp312-win32.whl", hash = "sha256:bd386cc9ee5f686ee8a75ba74037750aca55183085bf1941da8efcfe12d5b120"},
{file = "onnxruntime-1.20.1-cp312-cp312-win_amd64.whl", hash = "sha256:19c2d843eb074f385e8bbb753a40df780511061a63f9def1b216bf53860223fb"},
{file = "onnxruntime-1.20.1-cp313-cp313-macosx_13_0_universal2.whl", hash = "sha256:cc01437a32d0042b606f462245c8bbae269e5442797f6213e36ce61d5abdd8cc"},
{file = "onnxruntime-1.20.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fb44b08e017a648924dbe91b82d89b0c105b1adcfe31e90d1dc06b8677ad37be"},
{file = "onnxruntime-1.20.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bda6aebdf7917c1d811f21d41633df00c58aff2bef2f598f69289c1f1dabc4b3"},
{file = "onnxruntime-1.20.1-cp313-cp313-win_amd64.whl", hash = "sha256:d30367df7e70f1d9fc5a6a68106f5961686d39b54d3221f760085524e8d38e16"},
{file = "onnxruntime-1.20.1-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c9158465745423b2b5d97ed25aa7740c7d38d2993ee2e5c3bfacb0c4145c49d8"},
{file = "onnxruntime-1.20.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0df6f2df83d61f46e842dbcde610ede27218947c33e994545a22333491e72a3b"},
{file = "onnxruntime-1.21.0-cp310-cp310-macosx_13_0_universal2.whl", hash = "sha256:95513c9302bc8dd013d84148dcf3168e782a80cdbf1654eddc948a23147ccd3d"},
{file = "onnxruntime-1.21.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:635d4ab13ae0f150dd4c6ff8206fd58f1c6600636ecc796f6f0c42e4c918585b"},
{file = "onnxruntime-1.21.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7d06bfa0dd5512bd164f25a2bf594b2e7c9eabda6fc064b684924f3e81bdab1b"},
{file = "onnxruntime-1.21.0-cp310-cp310-win_amd64.whl", hash = "sha256:b0fc22d219791e0284ee1d9c26724b8ee3fbdea28128ef25d9507ad3b9621f23"},
{file = "onnxruntime-1.21.0-cp311-cp311-macosx_13_0_universal2.whl", hash = "sha256:8e16f8a79df03919810852fb46ffcc916dc87a9e9c6540a58f20c914c575678c"},
{file = "onnxruntime-1.21.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7f9156cf6f8ee133d07a751e6518cf6f84ed37fbf8243156bd4a2c4ee6e073c8"},
{file = "onnxruntime-1.21.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8a5d09815a9e209fa0cb20c2985b34ab4daeba7aea94d0f96b8751eb10403201"},
{file = "onnxruntime-1.21.0-cp311-cp311-win_amd64.whl", hash = "sha256:1d970dff1e2fa4d9c53f2787b3b7d0005596866e6a31997b41169017d1362dd0"},
{file = "onnxruntime-1.21.0-cp312-cp312-macosx_13_0_universal2.whl", hash = "sha256:893d67c68ca9e7a58202fa8d96061ed86a5815b0925b5a97aef27b8ba246a20b"},
{file = "onnxruntime-1.21.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:37b7445c920a96271a8dfa16855e258dc5599235b41c7bbde0d262d55bcc105f"},
{file = "onnxruntime-1.21.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9a04aafb802c1e5573ba4552f8babcb5021b041eb4cfa802c9b7644ca3510eca"},
{file = "onnxruntime-1.21.0-cp312-cp312-win_amd64.whl", hash = "sha256:7f801318476cd7003d636a5b392f7a37c08b6c8d2f829773f3c3887029e03f32"},
{file = "onnxruntime-1.21.0-cp313-cp313-macosx_13_0_universal2.whl", hash = "sha256:85718cbde1c2912d3a03e3b3dc181b1480258a229c32378408cace7c450f7f23"},
{file = "onnxruntime-1.21.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:94dff3a61538f3b7b0ea9a06bc99e1410e90509c76e3a746f039e417802a12ae"},
{file = "onnxruntime-1.21.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c1e704b0eda5f2bbbe84182437315eaec89a450b08854b5a7762c85d04a28a0a"},
{file = "onnxruntime-1.21.0-cp313-cp313-win_amd64.whl", hash = "sha256:19b630c6a8956ef97fb7c94948b17691167aa1aaf07b5f214fa66c3e4136c108"},
{file = "onnxruntime-1.21.0-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3995c4a2d81719623c58697b9510f8de9fa42a1da6b4474052797b0d712324fe"},
{file = "onnxruntime-1.21.0-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:36b18b8f39c0f84e783902112a0dd3c102466897f96d73bb83f6a6bff283a423"},
]
[package.dependencies]
@@ -4438,22 +4435,20 @@ files = [
[[package]]
name = "protobuf"
version = "5.29.3"
version = "6.30.1"
description = ""
optional = false
python-versions = ">=3.8"
python-versions = ">=3.9"
files = [
{file = "protobuf-5.29.3-cp310-abi3-win32.whl", hash = "sha256:3ea51771449e1035f26069c4c7fd51fba990d07bc55ba80701c78f886bf9c888"},
{file = "protobuf-5.29.3-cp310-abi3-win_amd64.whl", hash = "sha256:a4fa6f80816a9a0678429e84973f2f98cbc218cca434abe8db2ad0bffc98503a"},
{file = "protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:a8434404bbf139aa9e1300dbf989667a83d42ddda9153d8ab76e0d5dcaca484e"},
{file = "protobuf-5.29.3-cp38-abi3-manylinux2014_aarch64.whl", hash = "sha256:daaf63f70f25e8689c072cfad4334ca0ac1d1e05a92fc15c54eb9cf23c3efd84"},
{file = "protobuf-5.29.3-cp38-abi3-manylinux2014_x86_64.whl", hash = "sha256:c027e08a08be10b67c06bf2370b99c811c466398c357e615ca88c91c07f0910f"},
{file = "protobuf-5.29.3-cp38-cp38-win32.whl", hash = "sha256:84a57163a0ccef3f96e4b6a20516cedcf5bb3a95a657131c5c3ac62200d23252"},
{file = "protobuf-5.29.3-cp38-cp38-win_amd64.whl", hash = "sha256:b89c115d877892a512f79a8114564fb435943b59067615894c3b13cd3e1fa107"},
{file = "protobuf-5.29.3-cp39-cp39-win32.whl", hash = "sha256:0eb32bfa5219fc8d4111803e9a690658aa2e6366384fd0851064b963b6d1f2a7"},
{file = "protobuf-5.29.3-cp39-cp39-win_amd64.whl", hash = "sha256:6ce8cc3389a20693bfde6c6562e03474c40851b44975c9b2bf6df7d8c4f864da"},
{file = "protobuf-5.29.3-py3-none-any.whl", hash = "sha256:0a18ed4a24198528f2333802eb075e59dea9d679ab7a6c5efb017a59004d849f"},
{file = "protobuf-5.29.3.tar.gz", hash = "sha256:5da0f41edaf117bde316404bad1a486cb4ededf8e4a54891296f648e8e076620"},
{file = "protobuf-6.30.1-cp310-abi3-win32.whl", hash = "sha256:ba0706f948d0195f5cac504da156d88174e03218d9364ab40d903788c1903d7e"},
{file = "protobuf-6.30.1-cp310-abi3-win_amd64.whl", hash = "sha256:ed484f9ddd47f0f1bf0648806cccdb4fe2fb6b19820f9b79a5adf5dcfd1b8c5f"},
{file = "protobuf-6.30.1-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:aa4f7dfaed0d840b03d08d14bfdb41348feaee06a828a8c455698234135b4075"},
{file = "protobuf-6.30.1-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:47cd320b7db63e8c9ac35f5596ea1c1e61491d8a8eb6d8b45edc44760b53a4f6"},
{file = "protobuf-6.30.1-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:e3083660225fa94748ac2e407f09a899e6a28bf9c0e70c75def8d15706bf85fc"},
{file = "protobuf-6.30.1-cp39-cp39-win32.whl", hash = "sha256:554d7e61cce2aa4c63ca27328f757a9f3867bce8ec213bf09096a8d16bcdcb6a"},
{file = "protobuf-6.30.1-cp39-cp39-win_amd64.whl", hash = "sha256:b510f55ce60f84dc7febc619b47215b900466e3555ab8cb1ba42deb4496d6cc0"},
{file = "protobuf-6.30.1-py3-none-any.whl", hash = "sha256:3c25e51e1359f1f5fa3b298faa6016e650d148f214db2e47671131b9063c53be"},
{file = "protobuf-6.30.1.tar.gz", hash = "sha256:535fb4e44d0236893d5cf1263a0f706f1160b689a7ab962e9da8a9ce4050b780"},
]
[[package]]
@@ -4875,13 +4870,13 @@ extra = ["pygments (>=2.19.1)"]
[[package]]
name = "pymilvus"
version = "2.5.4"
version = "2.5.5"
description = "Python Sdk for Milvus"
optional = false
python-versions = ">=3.8"
files = [
{file = "pymilvus-2.5.4-py3-none-any.whl", hash = "sha256:3f7ddaeae0c8f63554b8e316b73f265d022e05a457d47c366ce47293434a3aea"},
{file = "pymilvus-2.5.4.tar.gz", hash = "sha256:611732428ff669d57ded3d1f823bdeb10febf233d0251cce8498b287e5a10ce8"},
{file = "pymilvus-2.5.5-py3-none-any.whl", hash = "sha256:b91794fbaf72c6d7ed2419b8d4e67369263bdc16f1722f02c97927cfdf3e69da"},
{file = "pymilvus-2.5.5.tar.gz", hash = "sha256:8985f018961853022e03639a9ff323d5c22d0b659e66e288f4d08de11789e1d4"},
]
[package.dependencies]
@@ -4896,7 +4891,7 @@ ujson = ">=2.0.0"
[package.extras]
bulk-writer = ["azure-storage-blob", "minio (>=7.0.0)", "pyarrow (>=12.0.0)", "requests"]
dev = ["black", "grpcio (==1.62.2)", "grpcio-testing (==1.62.2)", "grpcio-tools (==1.62.2)", "pytest (>=5.3.4)", "pytest-cov (>=2.8.1)", "pytest-timeout (>=1.3.4)", "ruff (>0.4.0)"]
model = ["milvus-model (>=0.1.0)"]
model = ["pymilvus.model (>=0.3.0)"]
[[package]]
name = "pyobjc-core"
@@ -5332,29 +5327,27 @@ files = [
[[package]]
name = "pywin32"
version = "308"
version = "309"
description = "Python for Window Extensions"
optional = false
python-versions = "*"
files = [
{file = "pywin32-308-cp310-cp310-win32.whl", hash = "sha256:796ff4426437896550d2981b9c2ac0ffd75238ad9ea2d3bfa67a1abd546d262e"},
{file = "pywin32-308-cp310-cp310-win_amd64.whl", hash = "sha256:4fc888c59b3c0bef905ce7eb7e2106a07712015ea1c8234b703a088d46110e8e"},
{file = "pywin32-308-cp310-cp310-win_arm64.whl", hash = "sha256:a5ab5381813b40f264fa3495b98af850098f814a25a63589a8e9eb12560f450c"},
{file = "pywin32-308-cp311-cp311-win32.whl", hash = "sha256:5d8c8015b24a7d6855b1550d8e660d8daa09983c80e5daf89a273e5c6fb5095a"},
{file = "pywin32-308-cp311-cp311-win_amd64.whl", hash = "sha256:575621b90f0dc2695fec346b2d6302faebd4f0f45c05ea29404cefe35d89442b"},
{file = "pywin32-308-cp311-cp311-win_arm64.whl", hash = "sha256:100a5442b7332070983c4cd03f2e906a5648a5104b8a7f50175f7906efd16bb6"},
{file = "pywin32-308-cp312-cp312-win32.whl", hash = "sha256:587f3e19696f4bf96fde9d8a57cec74a57021ad5f204c9e627e15c33ff568897"},
{file = "pywin32-308-cp312-cp312-win_amd64.whl", hash = "sha256:00b3e11ef09ede56c6a43c71f2d31857cf7c54b0ab6e78ac659497abd2834f47"},
{file = "pywin32-308-cp312-cp312-win_arm64.whl", hash = "sha256:9b4de86c8d909aed15b7011182c8cab38c8850de36e6afb1f0db22b8959e3091"},
{file = "pywin32-308-cp313-cp313-win32.whl", hash = "sha256:1c44539a37a5b7b21d02ab34e6a4d314e0788f1690d65b48e9b0b89f31abbbed"},
{file = "pywin32-308-cp313-cp313-win_amd64.whl", hash = "sha256:fd380990e792eaf6827fcb7e187b2b4b1cede0585e3d0c9e84201ec27b9905e4"},
{file = "pywin32-308-cp313-cp313-win_arm64.whl", hash = "sha256:ef313c46d4c18dfb82a2431e3051ac8f112ccee1a34f29c263c583c568db63cd"},
{file = "pywin32-308-cp37-cp37m-win32.whl", hash = "sha256:1f696ab352a2ddd63bd07430080dd598e6369152ea13a25ebcdd2f503a38f1ff"},
{file = "pywin32-308-cp37-cp37m-win_amd64.whl", hash = "sha256:13dcb914ed4347019fbec6697a01a0aec61019c1046c2b905410d197856326a6"},
{file = "pywin32-308-cp38-cp38-win32.whl", hash = "sha256:5794e764ebcabf4ff08c555b31bd348c9025929371763b2183172ff4708152f0"},
{file = "pywin32-308-cp38-cp38-win_amd64.whl", hash = "sha256:3b92622e29d651c6b783e368ba7d6722b1634b8e70bd376fd7610fe1992e19de"},
{file = "pywin32-308-cp39-cp39-win32.whl", hash = "sha256:7873ca4dc60ab3287919881a7d4f88baee4a6e639aa6962de25a98ba6b193341"},
{file = "pywin32-308-cp39-cp39-win_amd64.whl", hash = "sha256:71b3322d949b4cc20776436a9c9ba0eeedcbc9c650daa536df63f0ff111bb920"},
{file = "pywin32-309-cp310-cp310-win32.whl", hash = "sha256:5b78d98550ca093a6fe7ab6d71733fbc886e2af9d4876d935e7f6e1cd6577ac9"},
{file = "pywin32-309-cp310-cp310-win_amd64.whl", hash = "sha256:728d08046f3d65b90d4c77f71b6fbb551699e2005cc31bbffd1febd6a08aa698"},
{file = "pywin32-309-cp310-cp310-win_arm64.whl", hash = "sha256:c667bcc0a1e6acaca8984eb3e2b6e42696fc035015f99ff8bc6c3db4c09a466a"},
{file = "pywin32-309-cp311-cp311-win32.whl", hash = "sha256:d5df6faa32b868baf9ade7c9b25337fa5eced28eb1ab89082c8dae9c48e4cd51"},
{file = "pywin32-309-cp311-cp311-win_amd64.whl", hash = "sha256:e7ec2cef6df0926f8a89fd64959eba591a1eeaf0258082065f7bdbe2121228db"},
{file = "pywin32-309-cp311-cp311-win_arm64.whl", hash = "sha256:54ee296f6d11db1627216e9b4d4c3231856ed2d9f194c82f26c6cb5650163f4c"},
{file = "pywin32-309-cp312-cp312-win32.whl", hash = "sha256:de9acacced5fa82f557298b1fed5fef7bd49beee04190f68e1e4783fbdc19926"},
{file = "pywin32-309-cp312-cp312-win_amd64.whl", hash = "sha256:6ff9eebb77ffc3d59812c68db33c0a7817e1337e3537859499bd27586330fc9e"},
{file = "pywin32-309-cp312-cp312-win_arm64.whl", hash = "sha256:619f3e0a327b5418d833f44dc87859523635cf339f86071cc65a13c07be3110f"},
{file = "pywin32-309-cp313-cp313-win32.whl", hash = "sha256:008bffd4afd6de8ca46c6486085414cc898263a21a63c7f860d54c9d02b45c8d"},
{file = "pywin32-309-cp313-cp313-win_amd64.whl", hash = "sha256:bd0724f58492db4cbfbeb1fcd606495205aa119370c0ddc4f70e5771a3ab768d"},
{file = "pywin32-309-cp313-cp313-win_arm64.whl", hash = "sha256:8fd9669cfd41863b688a1bc9b1d4d2d76fd4ba2128be50a70b0ea66b8d37953b"},
{file = "pywin32-309-cp38-cp38-win32.whl", hash = "sha256:617b837dc5d9dfa7e156dbfa7d3906c009a2881849a80a9ae7519f3dd8c6cb86"},
{file = "pywin32-309-cp38-cp38-win_amd64.whl", hash = "sha256:0be3071f555480fbfd86a816a1a773880ee655bf186aa2931860dbb44e8424f8"},
{file = "pywin32-309-cp39-cp39-win32.whl", hash = "sha256:72ae9ae3a7a6473223589a1621f9001fe802d59ed227fd6a8503c9af67c1d5f4"},
{file = "pywin32-309-cp39-cp39-win_amd64.whl", hash = "sha256:88bc06d6a9feac70783de64089324568ecbc65866e2ab318eab35da3811fd7ef"},
]
[[package]]
@@ -5446,120 +5439,104 @@ pyyaml = "*"
[[package]]
name = "pyzmq"
version = "26.2.1"
version = "26.3.0"
description = "Python bindings for 0MQ"
optional = false
python-versions = ">=3.7"
python-versions = ">=3.8"
files = [
{file = "pyzmq-26.2.1-cp310-cp310-macosx_10_15_universal2.whl", hash = "sha256:f39d1227e8256d19899d953e6e19ed2ccb689102e6d85e024da5acf410f301eb"},
{file = "pyzmq-26.2.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:a23948554c692df95daed595fdd3b76b420a4939d7a8a28d6d7dea9711878641"},
{file = "pyzmq-26.2.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:95f5728b367a042df146cec4340d75359ec6237beebf4a8f5cf74657c65b9257"},
{file = "pyzmq-26.2.1-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:95f7b01b3f275504011cf4cf21c6b885c8d627ce0867a7e83af1382ebab7b3ff"},
{file = "pyzmq-26.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:80a00370a2ef2159c310e662c7c0f2d030f437f35f478bb8b2f70abd07e26b24"},
{file = "pyzmq-26.2.1-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:8531ed35dfd1dd2af95f5d02afd6545e8650eedbf8c3d244a554cf47d8924459"},
{file = "pyzmq-26.2.1-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:cdb69710e462a38e6039cf17259d328f86383a06c20482cc154327968712273c"},
{file = "pyzmq-26.2.1-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:e7eeaef81530d0b74ad0d29eec9997f1c9230c2f27242b8d17e0ee67662c8f6e"},
{file = "pyzmq-26.2.1-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:361edfa350e3be1f987e592e834594422338d7174364763b7d3de5b0995b16f3"},
{file = "pyzmq-26.2.1-cp310-cp310-win32.whl", hash = "sha256:637536c07d2fb6a354988b2dd1d00d02eb5dd443f4bbee021ba30881af1c28aa"},
{file = "pyzmq-26.2.1-cp310-cp310-win_amd64.whl", hash = "sha256:45fad32448fd214fbe60030aa92f97e64a7140b624290834cc9b27b3a11f9473"},
{file = "pyzmq-26.2.1-cp310-cp310-win_arm64.whl", hash = "sha256:d9da0289d8201c8a29fd158aaa0dfe2f2e14a181fd45e2dc1fbf969a62c1d594"},
{file = "pyzmq-26.2.1-cp311-cp311-macosx_10_15_universal2.whl", hash = "sha256:c059883840e634a21c5b31d9b9a0e2b48f991b94d60a811092bc37992715146a"},
{file = "pyzmq-26.2.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:ed038a921df836d2f538e509a59cb638df3e70ca0fcd70d0bf389dfcdf784d2a"},
{file = "pyzmq-26.2.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9027a7fcf690f1a3635dc9e55e38a0d6602dbbc0548935d08d46d2e7ec91f454"},
{file = "pyzmq-26.2.1-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6d75fcb00a1537f8b0c0bb05322bc7e35966148ffc3e0362f0369e44a4a1de99"},
{file = "pyzmq-26.2.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f0019cc804ac667fb8c8eaecdb66e6d4a68acf2e155d5c7d6381a5645bd93ae4"},
{file = "pyzmq-26.2.1-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:f19dae58b616ac56b96f2e2290f2d18730a898a171f447f491cc059b073ca1fa"},
{file = "pyzmq-26.2.1-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:f5eeeb82feec1fc5cbafa5ee9022e87ffdb3a8c48afa035b356fcd20fc7f533f"},
{file = "pyzmq-26.2.1-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:000760e374d6f9d1a3478a42ed0c98604de68c9e94507e5452951e598ebecfba"},
{file = "pyzmq-26.2.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:817fcd3344d2a0b28622722b98500ae9c8bfee0f825b8450932ff19c0b15bebd"},
{file = "pyzmq-26.2.1-cp311-cp311-win32.whl", hash = "sha256:88812b3b257f80444a986b3596e5ea5c4d4ed4276d2b85c153a6fbc5ca457ae7"},
{file = "pyzmq-26.2.1-cp311-cp311-win_amd64.whl", hash = "sha256:ef29630fde6022471d287c15c0a2484aba188adbfb978702624ba7a54ddfa6c1"},
{file = "pyzmq-26.2.1-cp311-cp311-win_arm64.whl", hash = "sha256:f32718ee37c07932cc336096dc7403525301fd626349b6eff8470fe0f996d8d7"},
{file = "pyzmq-26.2.1-cp312-cp312-macosx_10_15_universal2.whl", hash = "sha256:a6549ecb0041dafa55b5932dcbb6c68293e0bd5980b5b99f5ebb05f9a3b8a8f3"},
{file = "pyzmq-26.2.1-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:0250c94561f388db51fd0213cdccbd0b9ef50fd3c57ce1ac937bf3034d92d72e"},
{file = "pyzmq-26.2.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:36ee4297d9e4b34b5dc1dd7ab5d5ea2cbba8511517ef44104d2915a917a56dc8"},
{file = "pyzmq-26.2.1-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c2a9cb17fd83b7a3a3009901aca828feaf20aa2451a8a487b035455a86549c09"},
{file = "pyzmq-26.2.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:786dd8a81b969c2081b31b17b326d3a499ddd1856e06d6d79ad41011a25148da"},
{file = "pyzmq-26.2.1-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:2d88ba221a07fc2c5581565f1d0fe8038c15711ae79b80d9462e080a1ac30435"},
{file = "pyzmq-26.2.1-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:1c84c1297ff9f1cd2440da4d57237cb74be21fdfe7d01a10810acba04e79371a"},
{file = "pyzmq-26.2.1-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:46d4ebafc27081a7f73a0f151d0c38d4291656aa134344ec1f3d0199ebfbb6d4"},
{file = "pyzmq-26.2.1-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:91e2bfb8e9a29f709d51b208dd5f441dc98eb412c8fe75c24ea464734ccdb48e"},
{file = "pyzmq-26.2.1-cp312-cp312-win32.whl", hash = "sha256:4a98898fdce380c51cc3e38ebc9aa33ae1e078193f4dc641c047f88b8c690c9a"},
{file = "pyzmq-26.2.1-cp312-cp312-win_amd64.whl", hash = "sha256:a0741edbd0adfe5f30bba6c5223b78c131b5aa4a00a223d631e5ef36e26e6d13"},
{file = "pyzmq-26.2.1-cp312-cp312-win_arm64.whl", hash = "sha256:e5e33b1491555843ba98d5209439500556ef55b6ab635f3a01148545498355e5"},
{file = "pyzmq-26.2.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:099b56ef464bc355b14381f13355542e452619abb4c1e57a534b15a106bf8e23"},
{file = "pyzmq-26.2.1-cp313-cp313-macosx_10_15_universal2.whl", hash = "sha256:651726f37fcbce9f8dd2a6dab0f024807929780621890a4dc0c75432636871be"},
{file = "pyzmq-26.2.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:57dd4d91b38fa4348e237a9388b4423b24ce9c1695bbd4ba5a3eada491e09399"},
{file = "pyzmq-26.2.1-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d51a7bfe01a48e1064131f3416a5439872c533d756396be2b39e3977b41430f9"},
{file = "pyzmq-26.2.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c7154d228502e18f30f150b7ce94f0789d6b689f75261b623f0fdc1eec642aab"},
{file = "pyzmq-26.2.1-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:f1f31661a80cc46aba381bed475a9135b213ba23ca7ff6797251af31510920ce"},
{file = "pyzmq-26.2.1-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:290c96f479504439b6129a94cefd67a174b68ace8a8e3f551b2239a64cfa131a"},
{file = "pyzmq-26.2.1-cp313-cp313-musllinux_1_1_i686.whl", hash = "sha256:f2c307fbe86e18ab3c885b7e01de942145f539165c3360e2af0f094dd440acd9"},
{file = "pyzmq-26.2.1-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:b314268e716487bfb86fcd6f84ebbe3e5bec5fac75fdf42bc7d90fdb33f618ad"},
{file = "pyzmq-26.2.1-cp313-cp313-win32.whl", hash = "sha256:edb550616f567cd5603b53bb52a5f842c0171b78852e6fc7e392b02c2a1504bb"},
{file = "pyzmq-26.2.1-cp313-cp313-win_amd64.whl", hash = "sha256:100a826a029c8ef3d77a1d4c97cbd6e867057b5806a7276f2bac1179f893d3bf"},
{file = "pyzmq-26.2.1-cp313-cp313-win_arm64.whl", hash = "sha256:6991ee6c43e0480deb1b45d0c7c2bac124a6540cba7db4c36345e8e092da47ce"},
{file = "pyzmq-26.2.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:25e720dba5b3a3bb2ad0ad5d33440babd1b03438a7a5220511d0c8fa677e102e"},
{file = "pyzmq-26.2.1-cp313-cp313t-macosx_10_15_universal2.whl", hash = "sha256:9ec6abfb701437142ce9544bd6a236addaf803a32628d2260eb3dbd9a60e2891"},
{file = "pyzmq-26.2.1-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2e1eb9d2bfdf5b4e21165b553a81b2c3bd5be06eeddcc4e08e9692156d21f1f6"},
{file = "pyzmq-26.2.1-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:90dc731d8e3e91bcd456aa7407d2eba7ac6f7860e89f3766baabb521f2c1de4a"},
{file = "pyzmq-26.2.1-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0b6a93d684278ad865fc0b9e89fe33f6ea72d36da0e842143891278ff7fd89c3"},
{file = "pyzmq-26.2.1-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:c1bb37849e2294d519117dd99b613c5177934e5c04a5bb05dd573fa42026567e"},
{file = "pyzmq-26.2.1-cp313-cp313t-musllinux_1_1_aarch64.whl", hash = "sha256:632a09c6d8af17b678d84df442e9c3ad8e4949c109e48a72f805b22506c4afa7"},
{file = "pyzmq-26.2.1-cp313-cp313t-musllinux_1_1_i686.whl", hash = "sha256:fc409c18884eaf9ddde516d53af4f2db64a8bc7d81b1a0c274b8aa4e929958e8"},
{file = "pyzmq-26.2.1-cp313-cp313t-musllinux_1_1_x86_64.whl", hash = "sha256:17f88622b848805d3f6427ce1ad5a2aa3cf61f12a97e684dab2979802024d460"},
{file = "pyzmq-26.2.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:3ef584f13820d2629326fe20cc04069c21c5557d84c26e277cfa6235e523b10f"},
{file = "pyzmq-26.2.1-cp37-cp37m-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:160194d1034902937359c26ccfa4e276abffc94937e73add99d9471e9f555dd6"},
{file = "pyzmq-26.2.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:574b285150afdbf0a0424dddf7ef9a0d183988eb8d22feacb7160f7515e032cb"},
{file = "pyzmq-26.2.1-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:44dba28c34ce527cf687156c81f82bf1e51f047838d5964f6840fd87dfecf9fe"},
{file = "pyzmq-26.2.1-cp37-cp37m-musllinux_1_1_aarch64.whl", hash = "sha256:9fbdb90b85c7624c304f72ec7854659a3bd901e1c0ffb2363163779181edeb68"},
{file = "pyzmq-26.2.1-cp37-cp37m-musllinux_1_1_i686.whl", hash = "sha256:a7ad34a2921e8f76716dc7205c9bf46a53817e22b9eec2e8a3e08ee4f4a72468"},
{file = "pyzmq-26.2.1-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:866c12b7c90dd3a86983df7855c6f12f9407c8684db6aa3890fc8027462bda82"},
{file = "pyzmq-26.2.1-cp37-cp37m-win32.whl", hash = "sha256:eeb37f65350d5c5870517f02f8bbb2ac0fbec7b416c0f4875219fef305a89a45"},
{file = "pyzmq-26.2.1-cp37-cp37m-win_amd64.whl", hash = "sha256:4eb3197f694dfb0ee6af29ef14a35f30ae94ff67c02076eef8125e2d98963cd0"},
{file = "pyzmq-26.2.1-cp38-cp38-macosx_10_15_universal2.whl", hash = "sha256:36d4e7307db7c847fe37413f333027d31c11d5e6b3bacbb5022661ac635942ba"},
{file = "pyzmq-26.2.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:1c6ae0e95d0a4b0cfe30f648a18e764352d5415279bdf34424decb33e79935b8"},
{file = "pyzmq-26.2.1-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:5b4fc44f5360784cc02392f14235049665caaf7c0fe0b04d313e763d3338e463"},
{file = "pyzmq-26.2.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:51431f6b2750eb9b9d2b2952d3cc9b15d0215e1b8f37b7a3239744d9b487325d"},
{file = "pyzmq-26.2.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:bdbc78ae2065042de48a65f1421b8af6b76a0386bb487b41955818c3c1ce7bed"},
{file = "pyzmq-26.2.1-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:d14f50d61a89b0925e4d97a0beba6053eb98c426c5815d949a43544f05a0c7ec"},
{file = "pyzmq-26.2.1-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:004837cb958988c75d8042f5dac19a881f3d9b3b75b2f574055e22573745f841"},
{file = "pyzmq-26.2.1-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:0b2007f28ce1b8acebdf4812c1aab997a22e57d6a73b5f318b708ef9bcabbe95"},
{file = "pyzmq-26.2.1-cp38-cp38-win32.whl", hash = "sha256:269c14904da971cb5f013100d1aaedb27c0a246728c341d5d61ddd03f463f2f3"},
{file = "pyzmq-26.2.1-cp38-cp38-win_amd64.whl", hash = "sha256:31fff709fef3b991cfe7189d2cfe0c413a1d0e82800a182cfa0c2e3668cd450f"},
{file = "pyzmq-26.2.1-cp39-cp39-macosx_10_15_universal2.whl", hash = "sha256:a4bffcadfd40660f26d1b3315a6029fd4f8f5bf31a74160b151f5c577b2dc81b"},
{file = "pyzmq-26.2.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:e76ad4729c2f1cf74b6eb1bdd05f6aba6175999340bd51e6caee49a435a13bf5"},
{file = "pyzmq-26.2.1-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:8b0f5bab40a16e708e78a0c6ee2425d27e1a5d8135c7a203b4e977cee37eb4aa"},
{file = "pyzmq-26.2.1-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:e8e47050412f0ad3a9b2287779758073cbf10e460d9f345002d4779e43bb0136"},
{file = "pyzmq-26.2.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7f18ce33f422d119b13c1363ed4cce245b342b2c5cbbb76753eabf6aa6f69c7d"},
{file = "pyzmq-26.2.1-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:ceb0d78b7ef106708a7e2c2914afe68efffc0051dc6a731b0dbacd8b4aee6d68"},
{file = "pyzmq-26.2.1-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:7ebdd96bd637fd426d60e86a29ec14b8c1ab64b8d972f6a020baf08a30d1cf46"},
{file = "pyzmq-26.2.1-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:03719e424150c6395b9513f53a5faadcc1ce4b92abdf68987f55900462ac7eec"},
{file = "pyzmq-26.2.1-cp39-cp39-win32.whl", hash = "sha256:ef5479fac31df4b304e96400fc67ff08231873ee3537544aa08c30f9d22fce38"},
{file = "pyzmq-26.2.1-cp39-cp39-win_amd64.whl", hash = "sha256:f92a002462154c176dac63a8f1f6582ab56eb394ef4914d65a9417f5d9fde218"},
{file = "pyzmq-26.2.1-cp39-cp39-win_arm64.whl", hash = "sha256:1fd4b3efc6f62199886440d5e27dd3ccbcb98dfddf330e7396f1ff421bfbb3c2"},
{file = "pyzmq-26.2.1-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:380816d298aed32b1a97b4973a4865ef3be402a2e760204509b52b6de79d755d"},
{file = "pyzmq-26.2.1-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:97cbb368fd0debdbeb6ba5966aa28e9a1ae3396c7386d15569a6ca4be4572b99"},
{file = "pyzmq-26.2.1-pp310-pypy310_pp73-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:abf7b5942c6b0dafcc2823ddd9154f419147e24f8df5b41ca8ea40a6db90615c"},
{file = "pyzmq-26.2.1-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3fe6e28a8856aea808715f7a4fc11f682b9d29cac5d6262dd8fe4f98edc12d53"},
{file = "pyzmq-26.2.1-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:bd8fdee945b877aa3bffc6a5a8816deb048dab0544f9df3731ecd0e54d8c84c9"},
{file = "pyzmq-26.2.1-pp37-pypy37_pp73-macosx_10_9_x86_64.whl", hash = "sha256:ee7152f32c88e0e1b5b17beb9f0e2b14454235795ef68c0c120b6d3d23d12833"},
{file = "pyzmq-26.2.1-pp37-pypy37_pp73-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:baa1da72aecf6a490b51fba7a51f1ce298a1e0e86d0daef8265c8f8f9848eb77"},
{file = "pyzmq-26.2.1-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:49135bb327fca159262d8fd14aa1f4a919fe071b04ed08db4c7c37d2f0647162"},
{file = "pyzmq-26.2.1-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8bacc1a10c150d58e8a9ee2b2037a70f8d903107e0f0b6e079bf494f2d09c091"},
{file = "pyzmq-26.2.1-pp37-pypy37_pp73-win_amd64.whl", hash = "sha256:09dac387ce62d69bec3f06d51610ca1d660e7849eb45f68e38e7f5cf1f49cbcb"},
{file = "pyzmq-26.2.1-pp38-pypy38_pp73-macosx_10_9_x86_64.whl", hash = "sha256:70b3a46ecd9296e725ccafc17d732bfc3cdab850b54bd913f843a0a54dfb2c04"},
{file = "pyzmq-26.2.1-pp38-pypy38_pp73-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:59660e15c797a3b7a571c39f8e0b62a1f385f98ae277dfe95ca7eaf05b5a0f12"},
{file = "pyzmq-26.2.1-pp38-pypy38_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:0f50db737d688e96ad2a083ad2b453e22865e7e19c7f17d17df416e91ddf67eb"},
{file = "pyzmq-26.2.1-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a003200b6cd64e89b5725ff7e284a93ab24fd54bbac8b4fa46b1ed57be693c27"},
{file = "pyzmq-26.2.1-pp38-pypy38_pp73-win_amd64.whl", hash = "sha256:f9ba5def063243793dec6603ad1392f735255cbc7202a3a484c14f99ec290705"},
{file = "pyzmq-26.2.1-pp39-pypy39_pp73-macosx_10_15_x86_64.whl", hash = "sha256:1238c2448c58b9c8d6565579393148414a42488a5f916b3f322742e561f6ae0d"},
{file = "pyzmq-26.2.1-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8eddb3784aed95d07065bcf94d07e8c04024fdb6b2386f08c197dfe6b3528fda"},
{file = "pyzmq-26.2.1-pp39-pypy39_pp73-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f0f19c2097fffb1d5b07893d75c9ee693e9cbc809235cf3f2267f0ef6b015f24"},
{file = "pyzmq-26.2.1-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0995fd3530f2e89d6b69a2202e340bbada3191014352af978fa795cb7a446331"},
{file = "pyzmq-26.2.1-pp39-pypy39_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:7c6160fe513654e65665332740f63de29ce0d165e053c0c14a161fa60dd0da01"},
{file = "pyzmq-26.2.1-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:8ec8e3aea6146b761d6c57fcf8f81fcb19f187afecc19bf1701a48db9617a217"},
{file = "pyzmq-26.2.1.tar.gz", hash = "sha256:17d72a74e5e9ff3829deb72897a175333d3ef5b5413948cae3cf7ebf0b02ecca"},
{file = "pyzmq-26.3.0-cp310-cp310-macosx_10_15_universal2.whl", hash = "sha256:1586944f4736515af5c6d3a5b150c7e8ca2a2d6e46b23057320584d6f2438f4a"},
{file = "pyzmq-26.3.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:aa7efc695d1fc9f72d91bf9b6c6fe2d7e1b4193836ec530a98faf7d7a7577a58"},
{file = "pyzmq-26.3.0-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:bd84441e4021cec6e4dd040550386cd9c9ea1d9418ea1a8002dbb7b576026b2b"},
{file = "pyzmq-26.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9176856f36c34a8aa5c0b35ddf52a5d5cd8abeece57c2cd904cfddae3fd9acd3"},
{file = "pyzmq-26.3.0-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:49334faa749d55b77f084389a80654bf2e68ab5191c0235066f0140c1b670d64"},
{file = "pyzmq-26.3.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:fd30fc80fe96efb06bea21667c5793bbd65c0dc793187feb39b8f96990680b00"},
{file = "pyzmq-26.3.0-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:b2eddfbbfb473a62c3a251bb737a6d58d91907f6e1d95791431ebe556f47d916"},
{file = "pyzmq-26.3.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:70b3acb9ad729a53d4e751dace35404a024f188aad406013454216aba5485b4e"},
{file = "pyzmq-26.3.0-cp310-cp310-win32.whl", hash = "sha256:c1bd75d692cd7c6d862a98013bfdf06702783b75cffbf5dae06d718fecefe8f2"},
{file = "pyzmq-26.3.0-cp310-cp310-win_amd64.whl", hash = "sha256:d7165bcda0dbf203e5ad04d79955d223d84b2263df4db92f525ba370b03a12ab"},
{file = "pyzmq-26.3.0-cp310-cp310-win_arm64.whl", hash = "sha256:e34a63f71d2ecffb3c643909ad2d488251afeb5ef3635602b3448e609611a7ed"},
{file = "pyzmq-26.3.0-cp311-cp311-macosx_10_15_universal2.whl", hash = "sha256:2833602d9d42c94b9d0d2a44d2b382d3d3a4485be018ba19dddc401a464c617a"},
{file = "pyzmq-26.3.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d8270d104ec7caa0bdac246d31d48d94472033ceab5ba142881704350b28159c"},
{file = "pyzmq-26.3.0-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c208a977843d18d3bd185f323e4eaa912eb4869cb230947dc6edd8a27a4e558a"},
{file = "pyzmq-26.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:eddc2be28a379c218e0d92e4a432805dcb0ca5870156a90b54c03cd9799f9f8a"},
{file = "pyzmq-26.3.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:c0b519fa2159c42272f8a244354a0e110d65175647e5185b04008ec00df9f079"},
{file = "pyzmq-26.3.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:1595533de3a80bf8363372c20bafa963ec4bf9f2b8f539b1d9a5017f430b84c9"},
{file = "pyzmq-26.3.0-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:bbef99eb8d18ba9a40f00e8836b8040cdcf0f2fa649684cf7a66339599919d21"},
{file = "pyzmq-26.3.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:979486d444ca3c469cd1c7f6a619ce48ff08b3b595d451937db543754bfacb65"},
{file = "pyzmq-26.3.0-cp311-cp311-win32.whl", hash = "sha256:4b127cfe10b4c56e4285b69fd4b38ea1d368099ea4273d8fb349163fce3cd598"},
{file = "pyzmq-26.3.0-cp311-cp311-win_amd64.whl", hash = "sha256:cf736cc1298ef15280d9fcf7a25c09b05af016656856dc6fe5626fd8912658dd"},
{file = "pyzmq-26.3.0-cp311-cp311-win_arm64.whl", hash = "sha256:2dc46ec09f5d36f606ac8393303149e69d17121beee13c8dac25e2a2078e31c4"},
{file = "pyzmq-26.3.0-cp312-cp312-macosx_10_15_universal2.whl", hash = "sha256:c80653332c6136da7f4d4e143975e74ac0fa14f851f716d90583bc19e8945cea"},
{file = "pyzmq-26.3.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6e317ee1d4528a03506cb1c282cd9db73660a35b3564096de37de7350e7d87a7"},
{file = "pyzmq-26.3.0-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:943a22ebb3daacb45f76a9bcca9a7b74e7d94608c0c0505da30af900b998ca8d"},
{file = "pyzmq-26.3.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3fc9e71490d989144981ea21ef4fdfaa7b6aa84aff9632d91c736441ce2f6b00"},
{file = "pyzmq-26.3.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:e281a8071a06888575a4eb523c4deeefdcd2f5fe4a2d47e02ac8bf3a5b49f695"},
{file = "pyzmq-26.3.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:be77efd735bb1064605be8dec6e721141c1421ef0b115ef54e493a64e50e9a52"},
{file = "pyzmq-26.3.0-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:7a4ac2ffa34f1212dd586af90f4ba894e424f0cabb3a49cdcff944925640f6ac"},
{file = "pyzmq-26.3.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:ba698c7c252af83b6bba9775035263f0df5f807f0404019916d4b71af8161f66"},
{file = "pyzmq-26.3.0-cp312-cp312-win32.whl", hash = "sha256:214038aaa88e801e54c2ef0cfdb2e6df27eb05f67b477380a452b595c5ecfa37"},
{file = "pyzmq-26.3.0-cp312-cp312-win_amd64.whl", hash = "sha256:bad7fe0372e505442482ca3ccbc0d6f38dae81b1650f57a0aa6bbee18e7df495"},
{file = "pyzmq-26.3.0-cp312-cp312-win_arm64.whl", hash = "sha256:b7b578d604e79e99aa39495becea013fd043fa9f36e4b490efa951f3d847a24d"},
{file = "pyzmq-26.3.0-cp313-cp313-macosx_10_15_universal2.whl", hash = "sha256:fa85953df84beb7b8b73cb3ec3f5d92b62687a09a8e71525c6734e020edf56fd"},
{file = "pyzmq-26.3.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:209d09f0ab6ddbcebe64630d1e6ca940687e736f443c265ae15bc4bfad833597"},
{file = "pyzmq-26.3.0-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d35cc1086f1d4f907df85c6cceb2245cb39a04f69c3f375993363216134d76d4"},
{file = "pyzmq-26.3.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b380e9087078ba91e45fb18cdd0c25275ffaa045cf63c947be0ddae6186bc9d9"},
{file = "pyzmq-26.3.0-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:6d64e74143587efe7c9522bb74d1448128fdf9897cc9b6d8b9927490922fd558"},
{file = "pyzmq-26.3.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:efba4f53ac7752eea6d8ca38a4ddac579e6e742fba78d1e99c12c95cd2acfc64"},
{file = "pyzmq-26.3.0-cp313-cp313-musllinux_1_1_i686.whl", hash = "sha256:9b0137a1c40da3b7989839f9b78a44de642cdd1ce20dcef341de174c8d04aa53"},
{file = "pyzmq-26.3.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:a995404bd3982c089e57b428c74edd5bfc3b0616b3dbcd6a8e270f1ee2110f36"},
{file = "pyzmq-26.3.0-cp313-cp313-win32.whl", hash = "sha256:240b1634b9e530ef6a277d95cbca1a6922f44dfddc5f0a3cd6c722a8de867f14"},
{file = "pyzmq-26.3.0-cp313-cp313-win_amd64.whl", hash = "sha256:fe67291775ea4c2883764ba467eb389c29c308c56b86c1e19e49c9e1ed0cbeca"},
{file = "pyzmq-26.3.0-cp313-cp313-win_arm64.whl", hash = "sha256:73ca9ae9a9011b714cf7650450cd9c8b61a135180b708904f1f0a05004543dce"},
{file = "pyzmq-26.3.0-cp313-cp313t-macosx_10_15_universal2.whl", hash = "sha256:fea7efbd7e49af9d7e5ed6c506dfc7de3d1a628790bd3a35fd0e3c904dc7d464"},
{file = "pyzmq-26.3.0-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c4430c7cba23bb0e2ee203eee7851c1654167d956fc6d4b3a87909ccaf3c5825"},
{file = "pyzmq-26.3.0-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:016d89bee8c7d566fad75516b4e53ec7c81018c062d4c51cd061badf9539be52"},
{file = "pyzmq-26.3.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:04bfe59852d76d56736bfd10ac1d49d421ab8ed11030b4a0332900691507f557"},
{file = "pyzmq-26.3.0-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:1fe05bd0d633a0f672bb28cb8b4743358d196792e1caf04973b7898a0d70b046"},
{file = "pyzmq-26.3.0-cp313-cp313t-musllinux_1_1_aarch64.whl", hash = "sha256:2aa1a9f236d5b835fb8642f27de95f9edcfd276c4bc1b6ffc84f27c6fb2e2981"},
{file = "pyzmq-26.3.0-cp313-cp313t-musllinux_1_1_i686.whl", hash = "sha256:21399b31753bf321043ea60c360ed5052cc7be20739785b1dff1820f819e35b3"},
{file = "pyzmq-26.3.0-cp313-cp313t-musllinux_1_1_x86_64.whl", hash = "sha256:d015efcd96aca8882057e7e6f06224f79eecd22cad193d3e6a0a91ec67590d1f"},
{file = "pyzmq-26.3.0-cp38-cp38-macosx_10_15_universal2.whl", hash = "sha256:18183cc3851b995fdc7e5f03d03b8a4e1b12b0f79dff1ec1da75069af6357a05"},
{file = "pyzmq-26.3.0-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:da87e977f92d930a3683e10ba2b38bcc59adfc25896827e0b9d78b208b7757a6"},
{file = "pyzmq-26.3.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:cf6db401f4957afbf372a4730c6d5b2a234393af723983cbf4bcd13d54c71e1a"},
{file = "pyzmq-26.3.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:03caa2ffd64252122139d50ec92987f89616b9b92c9ba72920b40e92709d5e26"},
{file = "pyzmq-26.3.0-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:fbf206e5329e20937fa19bd41cf3af06d5967f8f7e86b59d783b26b40ced755c"},
{file = "pyzmq-26.3.0-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:6fb539a6382a048308b409d8c66d79bf636eda1b24f70c78f2a1fd16e92b037b"},
{file = "pyzmq-26.3.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:7897b8c8bbbb2bd8cad887bffcb07aede71ef1e45383bd4d6ac049bf0af312a4"},
{file = "pyzmq-26.3.0-cp38-cp38-win32.whl", hash = "sha256:91dead2daca698ae52ce70ee2adbb94ddd9b5f96877565fd40aa4efd18ecc6a3"},
{file = "pyzmq-26.3.0-cp38-cp38-win_amd64.whl", hash = "sha256:8c088e009a6d6b9f563336adb906e3a8d3fd64db129acc8d8fd0e9fe22b2dac8"},
{file = "pyzmq-26.3.0-cp39-cp39-macosx_10_15_universal2.whl", hash = "sha256:2eaed0d911fb3280981d5495978152fab6afd9fe217fd16f411523665089cef1"},
{file = "pyzmq-26.3.0-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:7998b60ef1c105846fb3bfca494769fde3bba6160902e7cd27a8df8257890ee9"},
{file = "pyzmq-26.3.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:96c0006a8d1d00e46cb44c8e8d7316d4a232f3d8f2ed43179d4578dbcb0829b6"},
{file = "pyzmq-26.3.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5e17cc198dc50a25a0f245e6b1e56f692df2acec3ccae82d1f60c34bfb72bbec"},
{file = "pyzmq-26.3.0-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:92a30840f4f2a31f7049d0a7de5fc69dd03b19bd5d8e7fed8d0bde49ce49b589"},
{file = "pyzmq-26.3.0-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:f52eba83272a26b444f4b8fc79f2e2c83f91d706d693836c9f7ccb16e6713c31"},
{file = "pyzmq-26.3.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:952085a09ff32115794629ba47f8940896d7842afdef1283332109d38222479d"},
{file = "pyzmq-26.3.0-cp39-cp39-win32.whl", hash = "sha256:0240289e33e3fbae44a5db73e54e955399179332a6b1d47c764a4983ec1524c3"},
{file = "pyzmq-26.3.0-cp39-cp39-win_amd64.whl", hash = "sha256:b2db7c82f08b8ce44c0b9d1153ce63907491972a7581e8b6adea71817f119df8"},
{file = "pyzmq-26.3.0-cp39-cp39-win_arm64.whl", hash = "sha256:2d3459b6311463c96abcb97808ee0a1abb0d932833edb6aa81c30d622fd4a12d"},
{file = "pyzmq-26.3.0-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:ad03f4252d9041b0635c37528dfa3f44b39f46024ae28c8567f7423676ee409b"},
{file = "pyzmq-26.3.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0f3dfb68cf7bf4cfdf34283a75848e077c5defa4907506327282afe92780084d"},
{file = "pyzmq-26.3.0-pp310-pypy310_pp73-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:356ec0e39c5a9cda872b65aca1fd8a5d296ffdadf8e2442b70ff32e73ef597b1"},
{file = "pyzmq-26.3.0-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:749d671b0eec8e738bbf0b361168369d8c682b94fcd458c20741dc4d69ef5278"},
{file = "pyzmq-26.3.0-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:f950f17ae608e0786298340163cac25a4c5543ef25362dd5ddb6dcb10b547be9"},
{file = "pyzmq-26.3.0-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:b4fc9903a73c25be9d5fe45c87faababcf3879445efa16140146b08fccfac017"},
{file = "pyzmq-26.3.0-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c15b69af22030960ac63567e98ad8221cddf5d720d9cf03d85021dfd452324ef"},
{file = "pyzmq-26.3.0-pp311-pypy311_pp73-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:2cf9ab0dff4dbaa2e893eb608373c97eb908e53b7d9793ad00ccbd082c0ee12f"},
{file = "pyzmq-26.3.0-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3ec332675f6a138db57aad93ae6387953763f85419bdbd18e914cb279ee1c451"},
{file = "pyzmq-26.3.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:eb96568a22fe070590942cd4780950e2172e00fb033a8b76e47692583b1bd97c"},
{file = "pyzmq-26.3.0-pp38-pypy38_pp73-macosx_10_15_x86_64.whl", hash = "sha256:009a38241c76184cb004c869e82a99f0aee32eda412c1eb44df5820324a01d25"},
{file = "pyzmq-26.3.0-pp38-pypy38_pp73-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:4c22a12713707467abedc6d75529dd365180c4c2a1511268972c6e1d472bd63e"},
{file = "pyzmq-26.3.0-pp38-pypy38_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:1614fcd116275d24f2346ffca4047a741c546ad9d561cbf7813f11226ca4ed2c"},
{file = "pyzmq-26.3.0-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4e2cafe7e9c7fed690e8ecf65af119f9c482923b5075a78f6f7629c63e1b4b1d"},
{file = "pyzmq-26.3.0-pp38-pypy38_pp73-win_amd64.whl", hash = "sha256:14e0b81753424bd374075df6cc30b87f2c99e5f022501d97eff66544ca578941"},
{file = "pyzmq-26.3.0-pp39-pypy39_pp73-macosx_10_15_x86_64.whl", hash = "sha256:21c6ddb98557a77cfe3366af0c5600fb222a1b2de5f90d9cd052b324e0c295e8"},
{file = "pyzmq-26.3.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1fc81d5d60c9d40e692de14b8d884d43cf67562402b931681f0ccb3ce6b19875"},
{file = "pyzmq-26.3.0-pp39-pypy39_pp73-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:52b064fafef772d0f5dbf52d4c39f092be7bc62d9a602fe6e82082e001326de3"},
{file = "pyzmq-26.3.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b72206eb041f780451c61e1e89dbc3705f3d66aaaa14ee320d4f55864b13358a"},
{file = "pyzmq-26.3.0-pp39-pypy39_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:8ab78dc21c7b1e13053086bcf0b4246440b43b5409904b73bfd1156654ece8a1"},
{file = "pyzmq-26.3.0-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:0b42403ad7d1194dca9574cd3c56691c345f4601fa2d0a33434f35142baec7ac"},
{file = "pyzmq-26.3.0.tar.gz", hash = "sha256:f1cd68b8236faab78138a8fc703f7ca0ad431b17a3fcac696358600d4e6243b3"},
]
[package.dependencies]
@ -5906,21 +5883,21 @@ files = [
[[package]]
name = "rtree"
version = "1.3.0"
version = "1.4.0"
description = "R-Tree spatial index for Python GIS"
optional = false
python-versions = ">=3.8"
python-versions = ">=3.9"
files = [
{file = "Rtree-1.3.0-py3-none-macosx_10_9_x86_64.whl", hash = "sha256:80879d9db282a2273ca3a0d896c84583940e9777477727a277624ebfd424c517"},
{file = "Rtree-1.3.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:4328e9e421797c347e6eb08efbbade962fe3664ebd60c1dffe82c40911b1e125"},
{file = "Rtree-1.3.0-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:037130d3ce1fc029de81941ec416ba5546f66228380ba19bb41f2ea1294e8423"},
{file = "Rtree-1.3.0-py3-none-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:864a05d0c3b7ce6c5e34378b7ab630057603b79179368bc50624258bdf2ff631"},
{file = "Rtree-1.3.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:ec2ed6d1635753dab966e68f592a9c4896f3f4ec6ad2b09b776d592eacd883a9"},
{file = "Rtree-1.3.0-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:b4485fb3e5c5e85b94a95f0a930a3848e040d2699cfb012940ba5b0130f1e09a"},
{file = "Rtree-1.3.0-py3-none-musllinux_1_2_i686.whl", hash = "sha256:7e2e9211f4fb404c06a08fd2cbebb03234214f73c51913bb371c3d9954e99cc9"},
{file = "Rtree-1.3.0-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:c021f4772b25cc24915da8073e553ded6fa8d0b317caa4202255ed26b2344c1c"},
{file = "Rtree-1.3.0-py3-none-win_amd64.whl", hash = "sha256:97f835801d24c10bbf02381abe5e327345c8296ec711dde7658792376abafc66"},
{file = "rtree-1.3.0.tar.gz", hash = "sha256:b36e9dd2dc60ffe3d02e367242d2c26f7281b00e1aaf0c39590442edaaadd916"},
{file = "rtree-1.4.0-py3-none-macosx_10_9_x86_64.whl", hash = "sha256:4d1bebc418101480aabf41767e772dd2155d3b27b1376cccbd93e4509485e091"},
{file = "rtree-1.4.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:997f8c38d5dffa3949ea8adb4c8b291ea5cd4ef5ee69455d642dd171baf9991d"},
{file = "rtree-1.4.0-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:0133d9c54ab3ffe874ba6d411dbe0254765c5e68d92da5b91362c370f16fd997"},
{file = "rtree-1.4.0-py3-none-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:d3b7bf1fe6463139377995ebe22a01a7005d134707f43672a3c09305e12f5f43"},
{file = "rtree-1.4.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:27e4a6d617d63dcb82fcd4c2856134b8a3741bd1af3b1a0d98e886054f394da5"},
{file = "rtree-1.4.0-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5258e826064eab82439760201e9421ce6d4340789d6d080c1b49367ddd03f61f"},
{file = "rtree-1.4.0-py3-none-musllinux_1_2_i686.whl", hash = "sha256:20d5b3f9cf8bbbcc9fec42ab837c603c5dd86103ef29134300c8da2495c1248b"},
{file = "rtree-1.4.0-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:a67bee1233370a4c72c0969a96d2a1df1ba404ddd9f146849c53ab420eab361b"},
{file = "rtree-1.4.0-py3-none-win_amd64.whl", hash = "sha256:ba83efc7b7563905b1bfdfc14490c4bfb59e92e5e6156bdeb6ec5df5117252f4"},
{file = "rtree-1.4.0.tar.gz", hash = "sha256:9d97c7c5dcf25f6c0599c76d9933368c6a8d7238f2c1d00e76f1a69369ca82a0"},
]
[[package]]
@ -6241,13 +6218,13 @@ train = ["accelerate (>=0.20.3)", "datasets"]
[[package]]
name = "setuptools"
version = "75.8.2"
version = "76.0.0"
description = "Easily download, build, install, upgrade, and uninstall Python packages"
optional = false
python-versions = ">=3.9"
files = [
{file = "setuptools-75.8.2-py3-none-any.whl", hash = "sha256:558e47c15f1811c1fa7adbd0096669bf76c1d3f433f58324df69f3f5ecac4e8f"},
{file = "setuptools-75.8.2.tar.gz", hash = "sha256:4880473a969e5f23f2a2be3646b2dfd84af9028716d398e46192f84bc36900d2"},
{file = "setuptools-76.0.0-py3-none-any.whl", hash = "sha256:199466a166ff664970d0ee145839f5582cb9bca7a0a3a2e795b6a9cb2308e9c6"},
{file = "setuptools-76.0.0.tar.gz", hash = "sha256:43b4ee60e10b0d0ee98ad11918e114c70701bc6051662a9a675a0496c1a158f4"},
]
[package.extras]
@ -6491,13 +6468,13 @@ files = [
[[package]]
name = "threadpoolctl"
version = "3.5.0"
version = "3.6.0"
description = "threadpoolctl"
optional = false
python-versions = ">=3.8"
python-versions = ">=3.9"
files = [
{file = "threadpoolctl-3.5.0-py3-none-any.whl", hash = "sha256:56c1e26c150397e58c4926da8eeee87533b1e32bef131bd4bf6a2f45f3185467"},
{file = "threadpoolctl-3.5.0.tar.gz", hash = "sha256:082433502dd922bf738de0d8bcc4fdcbf0979ff44c42bd40f5af8a282f6fa107"},
{file = "threadpoolctl-3.6.0-py3-none-any.whl", hash = "sha256:43a0b8fd5a2928500110039e43a5eed8480b918967083ea48dc3ab9f13c4a7fb"},
{file = "threadpoolctl-3.6.0.tar.gz", hash = "sha256:8ab8b4aa3491d812b623328249fab5302a68d2d71745c8a4c719a2fcaba9f44e"},
]
[[package]]
@ -6670,26 +6647,26 @@ testing = ["black (==22.3)", "datasets", "numpy", "pytest", "requests", "ruff"]
[[package]]
name = "tokenizers"
version = "0.21.0"
version = "0.21.1"
description = ""
optional = false
python-versions = ">=3.7"
python-versions = ">=3.9"
files = [
{file = "tokenizers-0.21.0-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:3c4c93eae637e7d2aaae3d376f06085164e1660f89304c0ab2b1d08a406636b2"},
{file = "tokenizers-0.21.0-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:f53ea537c925422a2e0e92a24cce96f6bc5046bbef24a1652a5edc8ba975f62e"},
{file = "tokenizers-0.21.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6b177fb54c4702ef611de0c069d9169f0004233890e0c4c5bd5508ae05abf193"},
{file = "tokenizers-0.21.0-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:6b43779a269f4629bebb114e19c3fca0223296ae9fea8bb9a7a6c6fb0657ff8e"},
{file = "tokenizers-0.21.0-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:9aeb255802be90acfd363626753fda0064a8df06031012fe7d52fd9a905eb00e"},
{file = "tokenizers-0.21.0-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:d8b09dbeb7a8d73ee204a70f94fc06ea0f17dcf0844f16102b9f414f0b7463ba"},
{file = "tokenizers-0.21.0-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:400832c0904f77ce87c40f1a8a27493071282f785724ae62144324f171377273"},
{file = "tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e84ca973b3a96894d1707e189c14a774b701596d579ffc7e69debfc036a61a04"},
{file = "tokenizers-0.21.0-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:eb7202d231b273c34ec67767378cd04c767e967fda12d4a9e36208a34e2f137e"},
{file = "tokenizers-0.21.0-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:089d56db6782a73a27fd8abf3ba21779f5b85d4a9f35e3b493c7bbcbbf0d539b"},
{file = "tokenizers-0.21.0-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:c87ca3dc48b9b1222d984b6b7490355a6fdb411a2d810f6f05977258400ddb74"},
{file = "tokenizers-0.21.0-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:4145505a973116f91bc3ac45988a92e618a6f83eb458f49ea0790df94ee243ff"},
{file = "tokenizers-0.21.0-cp39-abi3-win32.whl", hash = "sha256:eb1702c2f27d25d9dd5b389cc1f2f51813e99f8ca30d9e25348db6585a97e24a"},
{file = "tokenizers-0.21.0-cp39-abi3-win_amd64.whl", hash = "sha256:87841da5a25a3a5f70c102de371db120f41873b854ba65e52bccd57df5a3780c"},
{file = "tokenizers-0.21.0.tar.gz", hash = "sha256:ee0894bf311b75b0c03079f33859ae4b2334d675d4e93f5a4132e1eae2834fe4"},
{file = "tokenizers-0.21.1-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:e78e413e9e668ad790a29456e677d9d3aa50a9ad311a40905d6861ba7692cf41"},
{file = "tokenizers-0.21.1-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:cd51cd0a91ecc801633829fcd1fda9cf8682ed3477c6243b9a095539de4aecf3"},
{file = "tokenizers-0.21.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:28da6b72d4fb14ee200a1bd386ff74ade8992d7f725f2bde2c495a9a98cf4d9f"},
{file = "tokenizers-0.21.1-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:34d8cfde551c9916cb92014e040806122295a6800914bab5865deb85623931cf"},
{file = "tokenizers-0.21.1-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:aaa852d23e125b73d283c98f007e06d4595732104b65402f46e8ef24b588d9f8"},
{file = "tokenizers-0.21.1-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:a21a15d5c8e603331b8a59548bbe113564136dc0f5ad8306dd5033459a226da0"},
{file = "tokenizers-0.21.1-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:2fdbd4c067c60a0ac7eca14b6bd18a5bebace54eb757c706b47ea93204f7a37c"},
{file = "tokenizers-0.21.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2dd9a0061e403546f7377df940e866c3e678d7d4e9643d0461ea442b4f89e61a"},
{file = "tokenizers-0.21.1-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:db9484aeb2e200c43b915a1a0150ea885e35f357a5a8fabf7373af333dcc8dbf"},
{file = "tokenizers-0.21.1-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:ed248ab5279e601a30a4d67bdb897ecbe955a50f1e7bb62bd99f07dd11c2f5b6"},
{file = "tokenizers-0.21.1-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:9ac78b12e541d4ce67b4dfd970e44c060a2147b9b2a21f509566d556a509c67d"},
{file = "tokenizers-0.21.1-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:e5a69c1a4496b81a5ee5d2c1f3f7fbdf95e90a0196101b0ee89ed9956b8a168f"},
{file = "tokenizers-0.21.1-cp39-abi3-win32.whl", hash = "sha256:1039a3a5734944e09de1d48761ade94e00d0fa760c0e0551151d4dd851ba63e3"},
{file = "tokenizers-0.21.1-cp39-abi3-win_amd64.whl", hash = "sha256:0f0dcbcc9f6e13e675a66d7a5f2f225a736745ce484c1a4e07476a89ccdad382"},
{file = "tokenizers-0.21.1.tar.gz", hash = "sha256:a1bb04dc5b448985f86ecd4b05407f5a8d97cb2c0532199b2a302a604a0165ab"},
]
[package.dependencies]
@ -7223,13 +7200,13 @@ typing-extensions = ">=3.7.4.3"
[[package]]
name = "types-openpyxl"
version = "3.1.5.20241225"
version = "3.1.5.20250306"
description = "Typing stubs for openpyxl"
optional = false
python-versions = ">=3.8"
python-versions = ">=3.9"
files = [
{file = "types_openpyxl-3.1.5.20241225-py3-none-any.whl", hash = "sha256:903d92f58f42135b0614d609868c619aee12e1c7b65ccf8472dfd2706bcc6f47"},
{file = "types_openpyxl-3.1.5.20241225.tar.gz", hash = "sha256:3c076f4c6f114e1859b6857ffd486e96c938c0434451c60dc54c2bcb62750d78"},
{file = "types_openpyxl-3.1.5.20250306-py3-none-any.whl", hash = "sha256:f7733dac1dcb07c89ff5ffde8452ee8d272be638defed855f4c48b2990ce5aa7"},
{file = "types_openpyxl-3.1.5.20250306.tar.gz", hash = "sha256:aa7ad2425e8020ff46a31633becfe1f3c64114498d964c536199f654b464e6bc"},
]
[[package]]
@ -7245,13 +7222,13 @@ files = [
[[package]]
name = "types-requests"
version = "2.32.0.20250301"
version = "2.32.0.20250306"
description = "Typing stubs for requests"
optional = false
python-versions = ">=3.9"
files = [
{file = "types_requests-2.32.0.20250301-py3-none-any.whl", hash = "sha256:0003e0124e2cbefefb88222ff822b48616af40c74df83350f599a650c8de483b"},
{file = "types_requests-2.32.0.20250301.tar.gz", hash = "sha256:3d909dc4eaab159c0d964ebe8bfa326a7afb4578d8706408d417e17d61b0c500"},
{file = "types_requests-2.32.0.20250306-py3-none-any.whl", hash = "sha256:25f2cbb5c8710b2022f8bbee7b2b66f319ef14aeea2f35d80f18c9dbf3b60a0b"},
{file = "types_requests-2.32.0.20250306.tar.gz", hash = "sha256:0962352694ec5b2f95fda877ee60a159abdf84a0fc6fdace599f20acb41a03d1"},
]
[package.dependencies]
@ -7399,13 +7376,13 @@ zstd = ["zstandard (>=0.18.0)"]
[[package]]
name = "virtualenv"
version = "20.29.2"
version = "20.29.3"
description = "Virtual Python Environment builder"
optional = false
python-versions = ">=3.8"
files = [
{file = "virtualenv-20.29.2-py3-none-any.whl", hash = "sha256:febddfc3d1ea571bdb1dc0f98d7b45d24def7428214d4fb73cc486c9568cce6a"},
{file = "virtualenv-20.29.2.tar.gz", hash = "sha256:fdaabebf6d03b5ba83ae0a02cfe96f48a716f4fae556461d180825866f75b728"},
{file = "virtualenv-20.29.3-py3-none-any.whl", hash = "sha256:3e3d00f5807e83b234dfb6122bf37cfadf4be216c53a49ac059d02414f819170"},
{file = "virtualenv-20.29.3.tar.gz", hash = "sha256:95e39403fcf3940ac45bc717597dba16110b74506131845d9b687d5e73d947ac"},
]
[package.dependencies]
@ -7861,4 +7838,4 @@ vlm = ["accelerate", "transformers", "transformers"]
[metadata]
lock-version = "2.0"
python-versions = "^3.9"
content-hash = "c37ae7d39cb2af7031248c2f0308c91160facafd948e982899245e5d8369bbbb"
content-hash = "16324c95a8aae1a710c4151e509c59e9a97d8bb97d4c726861ab3215fbea0a9d"


@ -1,6 +1,6 @@
[tool.poetry]
name = "docling"
version = "2.26.0" # DO NOT EDIT, updated automatically
version = "2.27.0" # DO NOT EDIT, updated automatically
description = "SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications."
authors = [
"Christoph Auer <cau@zurich.ibm.com>",
@ -46,9 +46,9 @@ packages = [{ include = "docling" }]
######################
python = "^3.9"
pydantic = "^2.0.0"
docling-core = {extras = ["chunking"], version = "^2.22.0"}
docling-core = {extras = ["chunking"], version = "^2.23.1"}
docling-ibm-models = "^3.4.0"
docling-parse = "^3.3.0"
docling-parse = "^4.0.0"
filetype = "^1.2.0"
pypdfium2 = "^4.30.0"
pydantic-settings = "^2.3.0"
@ -88,6 +88,7 @@ accelerate = [
]
pillow = ">=10.0.0,<12.0.0"
tqdm = "^4.65.0"
pluggy = "^1.0.0"
pylatexenc = "^2.10"
[tool.poetry.group.dev.dependencies]
@ -156,6 +157,9 @@ rapidocr = ["rapidocr-onnxruntime", "onnxruntime"]
docling = "docling.cli.main:app"
docling-tools = "docling.cli.tools:app"
[tool.poetry.plugins."docling"]
"docling_defaults" = "docling.models.plugins.defaults"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
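
For context on the new `pluggy` dependency and the `[tool.poetry.plugins."docling"]` table above: Poetry plugin tables map to standard Python entry points, so a registered plugin such as `docling_defaults` is discoverable at runtime via `importlib.metadata`. A minimal sketch of that discovery pattern follows — the pattern itself is standard, but how docling consumes the group internally is an assumption here:

```python
from importlib.metadata import entry_points

# Discover modules registered under the "docling" entry-point group, e.g.
# "docling_defaults" -> docling.models.plugins.defaults (declared above).
# entry_points(group=...) requires Python 3.10+; on 3.9, fall back to
# entry_points().get("docling", []).
for ep in entry_points(group="docling"):
    plugin = ep.load()  # imports the module the entry point references
    print(f"loaded plugin {ep.name!r} from module {plugin.__name__}")
```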

File diff suppressed because one or more lines are too long


@ -140,13 +140,13 @@ tention encoding is then multiplied to the encoded image to produce a feature fo
The output features for each table cell are then fed into the feed-forward network (FFN). The FFN consists of a Multi-Layer Perceptron (3 layers with ReLU activation function) that predicts the normalized coordinates for the bounding box of each table cell. Finally, the predicted bounding boxes are classified based on whether they are empty or not using a linear layer.
Loss Functions. We formulate a multi-task loss Eq. 2 to train our network. The Cross-Entropy loss (denoted as l$_{s}$ ) is used to train the Structure Decoder which predicts the structure tokens. As for the Cell BBox Decoder it is trained with a combination of losses denoted as l$_{box}$ . l$_{box}$ consists of the generally used l$_{1}$ loss for object detection and the IoU loss ( l$_{iou}$ ) to be scale invariant as explained in [25]. In comparison to DETR, we do not use the Hungarian algorithm [15] to match the predicted bounding boxes with the ground-truth boxes, as we have already achieved a one-toone match through two steps: 1) Our token input sequence is naturally ordered, therefore the hidden states of the table data cells are also in order when they are provided as input to the Cell BBox Decoder , and 2) Our bounding boxes generation mechanism (see Sec. 3) ensures a one-to-one mapping between the cell content and its bounding box for all post-processed datasets.
Loss Functions. We formulate a multi-task loss Eq. 2 to train our network. The Cross-Entropy loss (denoted as l$\_{s}$ ) is used to train the Structure Decoder which predicts the structure tokens. As for the Cell BBox Decoder it is trained with a combination of losses denoted as l$\_{box}$ . l$\_{box}$ consists of the generally used l$\_{1}$ loss for object detection and the IoU loss ( l$\_{iou}$ ) to be scale invariant as explained in [25]. In comparison to DETR, we do not use the Hungarian algorithm [15] to match the predicted bounding boxes with the ground-truth boxes, as we have already achieved a one-toone match through two steps: 1) Our token input sequence is naturally ordered, therefore the hidden states of the table data cells are also in order when they are provided as input to the Cell BBox Decoder , and 2) Our bounding boxes generation mechanism (see Sec. 3) ensures a one-to-one mapping between the cell content and its bounding box for all post-processed datasets.
The loss used to train the TableFormer can be defined as following:
<!-- formula-not-decoded -->
where λ ∈ [0, 1], and λ$_{iou}$, λ$_{l}$$_{1}$ ∈$_{R}$ are hyper-parameters.
where λ ∈ [0, 1], and λ$\_{iou}$, λ$\_{l}$$\_{1}$ ∈$\_{R}$ are hyper-parameters.
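
The formula referenced as Eq. 2 was left undecoded in this fixture; a plausible reconstruction from the description above (the exact weighting scheme is an assumption, not fixture content) is:

$$l = \lambda\, l_{s} + (1 - \lambda)\, l_{box}, \qquad l_{box} = \lambda_{iou}\, l_{iou} + \lambda_{l_{1}}\, l_{1}, \qquad \lambda \in [0, 1],\ \lambda_{iou}, \lambda_{l_{1}} \in \mathbb{R}$$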
## 5. Experimental Results
@ -176,7 +176,7 @@ The Tree-Edit-Distance-Based Similarity (TEDS) metric was introduced in [37]. It
<!-- formula-not-decoded -->
where T$_{a}$ and T$_{b}$ represent tables in tree structure HTML format. EditDist denotes the tree-edit distance, and | T | represents the number of nodes in T .
where T$\_{a}$ and T$\_{b}$ represent tables in tree structure HTML format. EditDist denotes the tree-edit distance, and | T | represents the number of nodes in T .
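
The TEDS formula itself was likewise not decoded here; from the definitions in the preceding sentence, the standard form is presumably:

$$\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max(|T_a|, |T_b|)}$$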
## 5.4. Quantitative Analysis
@ -372,7 +372,7 @@ Here is a step-by-step description of the prediction postprocessing:
<!-- formula-not-decoded -->
where c is one of { left, centroid, right } and x$_{c}$ is the xcoordinate for the corresponding point.
where c is one of { left, centroid, right } and x$\_{c}$ is the xcoordinate for the corresponding point.
- 5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-
- 6. Snap all cells with bad IOU to their corresponding median x -coordinates and cell sizes.

File diff suppressed because one or more lines are too long


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-comma-in-cell",
"origin": {
"mimetype": "text/csv",


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-comma",
"origin": {
"mimetype": "text/csv",


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-inconsistent-header",
"origin": {
"mimetype": "text/csv",


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-pipe",
"origin": {
"mimetype": "text/csv",


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-semicolon",
"origin": {
"mimetype": "text/csv",


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-tab",
"origin": {
"mimetype": "text/csv",


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-too-few-columns",
"origin": {
"mimetype": "text/csv",


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-too-many-columns",
"origin": {
"mimetype": "text/csv",


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.2.0",
"version": "1.3.0",
"name": "equations",
"origin": {
"mimetype": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "example_01",
"origin": {
"mimetype": "text/html",


@ -2,7 +2,7 @@
This is the first paragraph of the introduction.
## Background
### Background
Some background information here.


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "example_02",
"origin": {
"mimetype": "text/html",


@ -2,7 +2,7 @@
This is the first paragraph of the introduction.
## Background
### Background
Some background information here.


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "example_03",
"origin": {
"mimetype": "text/html",


@ -1,10 +1,10 @@
# Example Document
## Introduction
### Introduction
This is the first paragraph of the introduction.
## Background
### Background
Some background information here.
@ -16,9 +16,9 @@ Some background information here.
1. First item in ordered list
1. Nested ordered item 1
2. Nested ordered item 2
3. Second item in ordered list
2. Second item in ordered list
## Data Table
### Data Table
| Header 1 | Header 2 | Header 3 |
|--------------|--------------|--------------|


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "example_04",
"origin": {
"mimetype": "text/html",


@ -1,7 +1,7 @@
# Data Table with Rowspan and Colspan
| Header 1 | Header 2 &amp; 3 (colspan) | Header 2 &amp; 3 (colspan) |
| Header 1 | Header 2 & 3 (colspan) | Header 2 & 3 (colspan) |
|----------------------------|----------------------------|----------------------------|
| Row 1 &amp; 2, Col 1 (rowspan) | Row 1, Col 2 | Row 1, Col 3 |
| Row 1 &amp; 2, Col 1 (rowspan) | Row 2, Col 2 &amp; 3 (colspan) | Row 2, Col 2 &amp; 3 (colspan) |
| Row 1 & 2, Col 1 (rowspan) | Row 1, Col 2 | Row 1, Col 3 |
| Row 1 & 2, Col 1 (rowspan) | Row 2, Col 2 & 3 (colspan) | Row 2, Col 2 & 3 (colspan) |
| Row 3, Col 1 | Row 3, Col 2 | Row 3, Col 3 |


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "example_05",
"origin": {
"mimetype": "text/html",


@ -1,7 +1,7 @@
# Omitted html and body tags
| Header 1 | Header 2 &amp; 3 (colspan) | Header 2 &amp; 3 (colspan) |
| Header 1 | Header 2 & 3 (colspan) | Header 2 & 3 (colspan) |
|----------------------------|----------------------------|----------------------------|
| Row 1 &amp; 2, Col 1 (rowspan) | Row 1, Col 2 | Row 1, Col 3 |
| Row 1 &amp; 2, Col 1 (rowspan) | Row 2, Col 2 &amp; 3 (colspan) | Row 2, Col 2 &amp; 3 (colspan) |
| Row 1 & 2, Col 1 (rowspan) | Row 1, Col 2 | Row 1, Col 3 |
| Row 1 & 2, Col 1 (rowspan) | Row 2, Col 2 & 3 (colspan) | Row 2, Col 2 & 3 (colspan) |
| Row 3, Col 1 | Row 3, Col 2 | Row 3, Col 3 |


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "example_06",
"origin": {
"mimetype": "text/html",


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.2.0",
"version": "1.3.0",
"name": "example_07",
"origin": {
"mimetype": "text/html",


@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "ipa20180000016.xml",
"origin": {
"mimetype": "application/xml",


@ -1,20 +1,20 @@
# LIGHT EMITTING DEVICE AND PLANT CULTIVATION METHOD
## ABSTRACT
### ABSTRACT
Provided is a light emitting device that includes a light emitting element having a light emission peak wavelength ranging from 380 nm to 490 nm, and a fluorescent material excited by light from the light emitting element and emitting light having at a light emission peak wavelength ranging from 580 nm or more to less than 680 nm. The light emitting device emits light having a ratio R/B of a photon flux density R to a photon flux density B ranging from 2.0 to 4.0 and a ratio R/FR of the photon flux density R to a photon flux density FR ranging from 0.7 to 13.0, the photon flux density R being in a wavelength range of 620 nm or more and less than 700 nm, the photon flux density B being in a wavelength range of 380 nm or more and 490 nm or less, and the photon flux density FR being in a wavelength range of 700 nm or more and 780 nm or less.
## CROSS-REFERENCE TO RELATED APPLICATION
### CROSS-REFERENCE TO RELATED APPLICATION
The application claims benefit of Japanese Patent Application No. 2016-128835 filed on Jun. 29, 2016, the entire disclosure of which is hereby incorporated by reference in its entirety.
## BACKGROUND
### BACKGROUND
## Technical Field
### Technical Field
The present disclosure relates to a light emitting device and a plant cultivation method.
## Description of Related Art
### Description of Related Art
With environmental changes due to climate change and other artificial disruptions, plant factories are expected to increase production efficiency of vegetables and be capable of adjusting production in order to make it possible to stably supply vegetables. Plant factories that are capable of artificial management can stably supply clean and safe vegetables to markets, and therefore are expected to be the next-generation industries.
@ -26,7 +26,7 @@ In plant factories, the light source used in place of sunlight affect a growth p
For example, Japanese Unexamined Patent Publication No. 2009-125007 discloses a plant growth method. In this method, the plants is irradiated with light emitted from a first LED light emitting element and/or a second LED light emitting element at predetermined timings using a lighting apparatus including the first LED light emitting element emitting light having a wavelength region of 625 to 690 nm and the second LED light emitting element emitting light having a wavelength region of 420 to 490 nm in order to emit lights having sufficient intensities and different wavelengths from each other.
## SUMMARY
### SUMMARY
However, even though plants are merely irradiated with lights having different wavelengths as in the plant growth method disclosed in Japanese Unexamined Patent Publication No. 2009-125007, the effect of promoting plant growth is not sufficient. Further improvement is required in promotion of plant growth.
@ -40,7 +40,7 @@ A second embodiment of the present disclosure is a plant cultivation method incl
According to embodiments of the present disclosure, a light emitting device capable of promoting growth of plants and a plant cultivation method can be provided.
## BRIEF DESCRIPTION OF THE DRAWINGS
### BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic cross sectional view of a light emitting device according to an embodiment of the present disclosure.
@ -50,11 +50,11 @@ FIG. 3 is a graph showing fresh weight (edible part) at the harvest time of each
FIG. 4 is a graph showing nitrate nitrogen content in each plant grown by irradiating the plant with light from exemplary light emitting devices according to embodiments of the present disclosure and a comparative light emitting device.
## DETAILED DESCRIPTION
### DETAILED DESCRIPTION
A light emitting device and a plant cultivation method according to the present invention will be described below based on an embodiment. However, the embodiment described below only exemplifies the technical concept of the present invention, and the present invention is not limited to the light emitting device and plant cultivation method described below. In the present specification, the relationship between color names and chromaticity coordinates and the relationship between wavelength ranges of light and color names of monochromatic light follow JIS Z8110.
### Light Emitting Device
#### Light Emitting Device
An embodiment of the present disclosure is a light emitting device including a light emitting element having a light emission peak wavelength in a range of 380 nm or more and 490 nm or less (hereinafter sometimes referred to as a “region of from near ultraviolet to blue color”), and a first fluorescent material that is excited by light from the light emitting element and emits light having at least one light emission peak wavelength in a range of 580 nm or more and less than 680 nm. The light emitting device emits light having a ratio R/B of a photon flux density R to a photon flux density B within a range of 2.0 or more and 4.0 or less, and a ratio R/FR of the photon flux density R to a photon flux density FR within a range of 0.7 or more and 13.0 or less, where the photon flux density R is the number of light quanta (μmol·m⁻²·s⁻¹) incident per unit time and unit area in a wavelength range of 620 nm or more and less than 700 nm, the photon flux density B is the number of light quanta (μmol·m⁻²·s⁻¹) incident per unit time and unit area in a wavelength range of 380 nm or more and 490 nm or less, and the photon flux density FR is the number of light quanta (μmol·m⁻²·s⁻¹) incident per unit time and unit area in a wavelength range of 700 nm or more and 780 nm or less.
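To make the two ratios concrete, the following is a minimal editorial sketch (not part of the disclosure) of how R, B, and FR could be computed from a measured spectral photon flux density. The wavelength grid and the two-Gaussian placeholder spectrum (a blue LED peak plus a broad red phosphor band) are assumptions for illustration only.

```python
# Editorial sketch: band-integrating a spectral photon flux density
# (assumed units: umol·m^-2·s^-1·nm^-1) to obtain R, B, FR and the ratios.
import numpy as np

def band_flux(wavelength_nm, photon_flux, low, high, include_high=False):
    """Integrate spectral photon flux density over one wavelength band."""
    mask = (wavelength_nm >= low) & (
        (wavelength_nm <= high) if include_high else (wavelength_nm < high)
    )
    x, f = wavelength_nm[mask], photon_flux[mask]
    # trapezoidal rule written out, to avoid NumPy version differences
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

wavelength_nm = np.arange(380.0, 781.0, 1.0)          # hypothetical 1 nm grid
photon_flux = (np.exp(-(((wavelength_nm - 450.0) / 12.0) ** 2))    # blue LED
               + np.exp(-(((wavelength_nm - 660.0) / 45.0) ** 2))) # red phosphor

B = band_flux(wavelength_nm, photon_flux, 380, 490, include_high=True)
R = band_flux(wavelength_nm, photon_flux, 620, 700)               # < 700 nm
FR = band_flux(wavelength_nm, photon_flux, 700, 780, include_high=True)

print(f"R/B  = {R / B:.2f}   (disclosure range: 2.0-4.0)")
print(f"R/FR = {R / FR:.2f}  (disclosure range: 0.7-13.0)")
```

With this placeholder spectrum the printed ratios happen to fall inside the claimed ranges; with real spectrometer data, the same band integrals apply unchanged.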
@@ -84,7 +84,7 @@ For the above reasons, nitrogen is one of nutrients necessary for growth of plan
It is preferred that the light emitting device 100 further include the second fluorescent material 72, which is excited by light from the light emitting element 10 and emits light having at least one light emission peak wavelength in a range of 680 nm or more and 800 nm or less, and that the R/FR ratio be within a range of 0.7 or more and 5.0 or less. The R/FR ratio is more preferably within a range of 0.7 or more and 2.0 or less.
### Light Emitting Element
#### Light Emitting Element
The light emitting element 10 is used as an excitation light source, and is a light emitting element emitting light having a light emission peak wavelength in a range of 380 nm or more and 490 nm or less. As a result, a stable light emitting device having high efficiency, high linearity of output with respect to input, and high resistance to mechanical impact can be obtained.
@@ -92,7 +92,7 @@ The range of the light emission peak wavelength of the light emitting element 10
The half-value width of the emission spectrum of the light emitting element 10 can be, for example, 30 nm or less.
### Fluorescent Member
#### Fluorescent Member
The fluorescent member 50 used in the light emitting device 100 preferably includes the first fluorescent material 71 and a sealing material, and more preferably further includes the second fluorescent material 72. A thermoplastic resin and a thermosetting resin can be used as the sealing material. The fluorescent member 50 may contain other components such as a filler, a light stabilizer and a colorant, in addition to the fluorescent material and the sealing material. Examples of the filler include silica, barium titanate, titanium oxide and aluminum oxide.
@@ -100,7 +100,7 @@ The content of other components other than the fluorescent material 70 and the s
The total content of the fluorescent material 70 in the fluorescent member 50 can be, for example, 5 parts by mass or more and 300 parts by mass or less, per 100 parts by mass of the sealing material. The total content is preferably 10 parts by mass or more and 250 parts by mass or less, more preferably 15 parts by mass or more and 230 parts by mass or less, and still more preferably 15 parts by mass or more and 200 parts by mass or less. When the total content of the fluorescent material 70 in the fluorescent member 50 is within the above range, the light emitted from the light emitting element 10 can be efficiently subjected to wavelength conversion in the fluorescent material 70.
### First Fluorescent Material
#### First Fluorescent Material
The first fluorescent material 71 is a fluorescent material that is excited by light from the light emitting element 10 and emits light having at least one light emission peak wavelength in a range of 580 nm or more and less than 680 nm. Examples of the first fluorescent material 71 include an Mn⁴⁺-activated fluorogermanate fluorescent material, an Eu²⁺-activated nitride fluorescent material, an Eu²⁺-activated alkaline earth sulfide fluorescent material, and an Mn⁴⁺-activated halide fluorescent material. One of those fluorescent materials may be used alone, or two or more may be used in combination. The first fluorescent material preferably contains an Eu²⁺-activated nitride fluorescent material and an Mn⁴⁺-activated fluorogermanate fluorescent material.
@@ -138,7 +138,7 @@ The first fluorescent material 71 preferably contains at least two fluorescent m
In the case where the first fluorescent material 71 contains at least two fluorescent materials and the two fluorescent materials are an MGF fluorescent material and a CASN fluorescent material, the compounding ratio thereof (MGF fluorescent material:CASN fluorescent material) is preferably in a range of 50:50 to 99:1, more preferably 60:40 to 97:3, and still more preferably 70:30 to 96:4, in mass ratio. When the mass ratio of the MGF fluorescent material to the CASN fluorescent material is within the aforementioned range, the light emitted from the light emitting element 10 can be efficiently subjected to wavelength conversion in the first fluorescent material 71. In addition, the R/B ratio can be adjusted to within a range of 2.0 or more and 4.0 or less, and the R/FR ratio can easily be adjusted to within a range of 0.7 or more and 13.0 or less.
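For illustration only (not from the disclosure), the weigh-out masses implied by a chosen phosphor loading (parts per 100 parts of sealing material, as discussed above) and an MGF:CASN compounding ratio can be computed directly; the function name and the example numbers below are hypothetical.

```python
# Editorial sketch: phosphor weigh-out from a total loading and a mass ratio.
def phosphor_masses(resin_g, parts_per_100, mgf_to_casn=(95, 5)):
    """Return (MGF mass, CASN mass) in grams for a given resin batch."""
    total = resin_g * parts_per_100 / 100.0   # total phosphor mass, g
    mgf, casn = mgf_to_casn
    frac_mgf = mgf / (mgf + casn)
    return total * frac_mgf, total * (1.0 - frac_mgf)

# e.g. 10 g of resin at 100 parts loading with MGF:CASN = 95:5 (Examples below)
mgf_g, casn_g = phosphor_masses(10.0, 100, (95, 5))
print(f"MGF: {mgf_g:.2f} g, CASN: {casn_g:.2f} g")   # MGF: 9.50 g, CASN: 0.50 g
```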
### Second Fluorescent Material
#### Second Fluorescent Material
The second fluorescent material 72 is a fluorescent material that is excited by the light from the light emitting element 10 and emits light having at least one light emission peak wavelength in a range of 680 nm or more and 800 nm or less.
@@ -160,25 +160,25 @@ In the second fluorescent material 72, the value of the parameter y is preferabl
The parameter x is the activation amount of Ce, and its value is in a range of more than 0.0002 and less than 0.50 (0.0002 < x < 0.50); the parameter y is the activation amount of Cr. When the value of the parameter y is in a range of more than 0.0001 and less than 0.05 (0.0001 < y < 0.05), the activation amounts of Ce and Cr, the light emission centers contained in the crystal structure of the fluorescent material, are within optimum ranges: the decrease of light emission intensity caused by too few light emission centers is suppressed, the decrease caused by concentration quenching due to an excessive activation amount is also suppressed, and the light emission intensity can be enhanced.
### Production Method of Second Fluorescent Material
#### Production Method of Second Fluorescent Material
The second fluorescent material 72 can be produced, for example, by the following method.
A compound containing at least one rare earth element Ln selected from the group consisting of rare earth elements excluding Ce, a compound containing at least one element M selected from the group consisting of Al, Ga, and In, a compound containing Ce, and a compound containing Cr are mixed to obtain a raw material mixture. The mixing is performed such that, when the total molar composition ratio of M is taken as 5 as the standard and the total molar composition ratio of Ln, Ce, and Cr is 3, the molar ratio of Ce is the product of 3 and the value of a parameter x and the molar ratio of Cr is the product of 3 and the value of a parameter y, where the value of the parameter x is in a range of more than 0.0002 and less than 0.50 and the value of the parameter y is in a range of more than 0.0001 and less than 0.05. The raw material mixture is heat-treated and then subjected to classification and the like, thereby obtaining the second fluorescent material.
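Written compactly (an editorial restatement of the ratios above, with the garnet oxygen stoichiometry taken from the example composition given later), the target composition is

$$(\mathrm{Ln}_{1-x-y}\,\mathrm{Ce}_{x}\,\mathrm{Cr}_{y})_{3}\,M_{5}\,\mathrm{O}_{12},\qquad 0.0002 < x < 0.50,\quad 0.0001 < y < 0.05,$$

so that, with the molar ratio of M fixed at 5 and the Ln + Ce + Cr total at 3, the molar amount of Ce is $3x$ and that of Cr is $3y$.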
### Compound Containing Rare Earth Element Ln
#### Compound Containing Rare Earth Element Ln
Examples of the compound containing the rare earth element Ln include oxides, hydroxides, nitrides, oxynitrides, fluorides, and chlorides that contain at least one rare earth element Ln selected from the group consisting of rare earth elements excluding Ce. Those compounds may be hydrates. A metal simple substance or an alloy containing a rare earth element may be used in place of at least a part of the compound containing a rare earth element. The compound containing a rare earth element is preferably a compound containing at least one rare earth element Ln selected from the group consisting of Y, Gd, Lu, La, Tb, and Pr. The compounds containing a rare earth element may be used alone or in a combination of two or more.
The compound containing a rare earth element is preferably an oxide, because an oxide, unlike other materials, does not contain elements outside the target composition. Specific examples of the oxide include Y₂O₃, Gd₂O₃, Lu₂O₃, La₂O₃, Tb₄O₇, and Pr₆O₁₁.
### Compound Containing M
#### Compound Containing M
Examples of the compound containing at least one element M selected from the group consisting of Al, Ga, and In include oxides, hydroxides, nitrides, oxynitrides, fluorides, and chlorides that contain Al, Ga, or In. Those compounds may be hydrates. Furthermore, an Al, Ga, or In metal simple substance or an Al, Ga, or In alloy may be used in place of at least a part of the compound. The compounds containing Al, Ga, or In may be used alone or in a combination of two or more. The compound containing at least one element selected from the group consisting of Al, Ga, and In is preferably an oxide, because an oxide, unlike other materials, does not contain elements outside the target composition, and a fluorescent material having the target composition is therefore easy to obtain. When a compound containing elements outside the target composition is used, residual impurity elements are sometimes present in the obtained fluorescent material. Such a residual impurity element becomes a killer factor in light emission, possibly causing a remarkable decrease of light emission intensity.
Examples of the compound containing Al, Ga, or In specifically include Al₂O₃, Ga₂O₃, and In₂O₃.
### Compound Containing Ce and Compound Containing Cr
#### Compound Containing Ce and Compound Containing Cr
Examples of the compound containing Ce and the compound containing Cr include oxides, hydroxides, nitrides, fluorides, and chlorides that contain cerium (Ce) or chromium (Cr). Those compounds may be hydrates. A Ce or Cr metal simple substance or a Ce or Cr alloy may be used in place of a part of the compound. The compounds containing Ce or Cr may each be used alone or in a combination of two or more. The compound containing Ce and the compound containing Cr are each preferably an oxide, because an oxide, unlike other materials, does not contain elements outside the target composition, and a fluorescent material having the target composition is therefore easy to obtain. When a compound containing elements outside the target composition is used, residual impurity elements are sometimes present in the obtained fluorescent material. Such a residual impurity element becomes a killer factor in light emission, possibly causing a remarkable decrease of light emission intensity.
@@ -204,7 +204,7 @@ The atmosphere for heat-treating the raw material mixture is an inert atmosphere
The fluorescent material obtained may be subjected to post-treatment steps such as a solid-liquid separation by a method such as cleaning or filtration, drying by a method such as vacuum drying, and classification by dry sieving. After those post-treatment steps, a fluorescent material having a desired average particle diameter is obtained.
### Other Fluorescent Materials
#### Other Fluorescent Materials
The light emitting device 100 may contain other kinds of fluorescent materials, in addition to the first fluorescent material 71.
@@ -250,21 +250,21 @@ GdAlO₃:Cr (x)
The light emitting device 100 can be utilized as a light emitting device for plant cultivation that can activate photosynthesis of plants and promote growth of plants so as to have favorable form and weight.
### Plant Cultivation Method
#### Plant Cultivation Method
The plant cultivation method of one embodiment of the present disclosure is a method for cultivating plants including irradiating the plants with light emitted from the light emitting device 100. In the plant cultivation method, plants can be irradiated with light from the light emitting device 100 in plant factories that are completely isolated from the external environment and allow artificial control. The kind of plants is not particularly limited. However, the light emitting device 100 of one embodiment of the present disclosure can activate photosynthesis and promote growth such that stems, leaves, roots, and fruits have favorable form and weight, and it is therefore preferably applied to cultivation of vegetables and flowers that contain much chlorophyll and perform photosynthesis. Examples of the vegetables include lettuces such as garden lettuce, curl lettuce, lamb's lettuce, Romaine lettuce, endive, Lollo Rosso, rucola lettuce, and frill lettuce; Asteraceae vegetables such as “shungiku” (Chrysanthemum coronarium); morning glory vegetables such as spinach; Rosaceae vegetables such as strawberry; and flowers such as chrysanthemum, gerbera, rose, and tulip.
## EXAMPLES
### EXAMPLES
The present invention is described more specifically below by way of Examples and Comparative Examples.
## Examples 1 to 5
### Examples 1 to 5
### First Fluorescent Material
#### First Fluorescent Material
Two fluorescent materials, a fluorogermanate fluorescent material activated by Mn⁴⁺ having a light emission peak at 660 nm and a silicon nitride-containing fluorescent material activated by Eu²⁺ having a light emission peak at 660 nm, were used as the first fluorescent material 71. In the first fluorescent material 71, the mass ratio of the MGF fluorescent material to the CASN fluorescent material (MGF:CASN) was 95:5.
### Second Fluorescent Material
#### Second Fluorescent Material
A fluorescent material obtained by the following production method was used as the second fluorescent material 72.
@@ -272,7 +272,7 @@ Fluorescent material that is obtained by the following production method was use
The raw material mixture obtained was placed in an alumina crucible, and a lid was put on the crucible. The raw material mixture was heat-treated at 1,500° C. for 10 hours in a reducing atmosphere of H₂: 3 vol % and N₂: 97 vol %, thereby obtaining a calcined product. The calcined product was passed through a dry sieve to obtain the second fluorescent material. The obtained second fluorescent material was subjected to composition analysis by inductively coupled plasma atomic emission spectrometry (ICP-AES, analyzer manufactured by PerkinElmer). The composition of the obtained second fluorescent material was (Y₀.₉₇₇Ce₀.₀₀₉Cr₀.₀₁₄)₃Al₅O₁₂ (hereinafter referred to as “YAG: Ce, Cr”).
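As an editorial consistency check (not part of the original text), the analyzed composition corresponds to the parameters

$$x = 0.009,\qquad y = 0.014,$$

both within the required ranges $0.0002 < x < 0.50$ and $0.0001 < y < 0.05$, and the constituents sum correctly: $0.977 + 0.009 + 0.014 = 1.000$, i.e. an Ln + Ce + Cr total of 3 per formula unit, as the mixing procedure above requires.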
### Light Emitting Device
#### Light Emitting Device
A nitride semiconductor light emitting element having a light emission peak wavelength of 450 nm was used as the light emitting element 10 in the light emitting device 100.
@@ -280,17 +280,17 @@ Silicone resin was used as a sealing material constituting the fluorescent membe
The resin composition was poured on the light emitting element 10 of a depressed portion of the molded article 40 to fill the depressed portion, and heated at 150° C. for 4 hours to cure the resin composition, thereby forming the fluorescent member 50. Thus, the light emitting device 100 as shown in FIG. 1 was produced in each of Examples 1 to 5.
## Comparative Example 1
### Comparative Example 1
A light emitting device X including a semiconductor light emitting element having a light emission peak wavelength of 450 nm and a light emitting device Y including a semiconductor light emitting element having a light emission peak wavelength of 660 nm were used, and the R/B ratio was adjusted to 2.5.
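As an editorial sketch of how such a setting could be realized (the disclosure does not give the procedure), one can treat the narrow-band 450 nm device X as contributing only to band B and the 660 nm device Y as contributing only to band R; the per-unit photon outputs below are hypothetical.

```python
# Editorial sketch: choosing the relative drive of a two-LED rig for R/B = 2.5.
def drive_for_target_rb(target_rb, b_per_unit_x, r_per_unit_y, d_x=1.0):
    """Drive level of device Y, relative to device X, yielding the target R/B."""
    # R/B = (d_y * r_per_unit_y) / (d_x * b_per_unit_x), solved for d_y
    return target_rb * d_x * b_per_unit_x / r_per_unit_y

d_y = drive_for_target_rb(2.5, b_per_unit_x=1.0, r_per_unit_y=1.0)
print(f"drive device Y at {d_y:.2f}x the photon output of device X")
```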
### Evaluation
#### Evaluation
### Photon Flux Density
#### Photon Flux Density
Photon flux densities of the lights emitted from the light emitting device 100 used in Examples 1 to 5 and from the light emitting devices X and Y used in Comparative Example 1 were measured using a photon measuring device (LI-250A, manufactured by LI-COR). The photon flux density B, the photon flux density R, and the photon flux density FR of the lights emitted from the light emitting devices used in each of the Examples and the Comparative Example, together with the R/B ratio and the R/FR ratio, are shown in Table 1. FIG. 2 shows spectra of the relationship between wavelength and relative photon flux density for the light emitting devices used in each Example and the Comparative Example.
### Plant Cultivation Test
#### Plant Cultivation Test
The plant cultivation test was conducted in two stages: a “growth period under an RGB light source” (hereinafter referred to as the first growth period) and a “growth period under a light source for plant growth” (hereinafter referred to as the second growth period), the latter using a light emitting device according to an embodiment of the present disclosure as the light source.
@@ -306,21 +306,21 @@ The cultivation test was specifically conducted by the following method.
Romaine lettuce (green romaine, produced by Nakahara Seed Co., Ltd.) was used as the cultivation plant.
### First Growth Period
#### First Growth Period
Urethane sponges (salad urethane, manufactured by M Hydroponic Research Co., Ltd.) in which Romaine lettuce had been seeded were placed side by side on a plastic tray and irradiated with light from an RGB-LED light source (manufactured by Shibasaki Inc.) to cultivate the plants. The plants were cultivated for 16 days under the conditions of room temperature: 22 to 23° C., humidity: 50 to 60%, photon flux density from the light emitting device: 100 μmol·m⁻²·s⁻¹, and photoperiod: 16 hours/day. Only water was given until germination; after germination (about 4 days later), a solution obtained by mixing Otsuka House #1 (manufactured by Otsuka Chemical Co., Ltd.) and Otsuka House #2 (manufactured by Otsuka Chemical Co., Ltd.) in a mass ratio of 3:2 and dissolving the mixture in water was used as a nutrient solution (Otsuka Formulation A). The conductivity of the nutrient solution was 1.5 mS·cm⁻¹.
### Second Growth Period
#### Second Growth Period
After the first growth period, the plants were irradiated with light from the light emitting devices of Examples 1 to 5 and Comparative Example 1 and were grown hydroponically.
The plants were cultivated for 19 days under the conditions of room temperature: 22 to 24° C., humidity: 60 to 70%, CO₂ concentration: 600 to 700 ppm, photon flux density from the light emitting device: 125 μmol·m⁻²·s⁻¹, and photoperiod: 16 hours/day. Otsuka Formulation A was used as the nutrient solution; its conductivity was 1.5 mS·cm⁻¹. The values of the R/B and R/FR ratios of the light used for plant irradiation from each light emitting device in the second growth period are shown in Table 1.
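As an editorial aside (not stated in the original text), the daily light integral (DLI) implied by these settings follows directly from the photon flux density and the photoperiod:

$$\mathrm{DLI} = 125\ \mu\mathrm{mol}\,\mathrm{m}^{-2}\,\mathrm{s}^{-1}\times 16\ \mathrm{h}\times 3600\ \mathrm{s\,h}^{-1} = 7.2\ \mathrm{mol}\,\mathrm{m}^{-2}\,\mathrm{day}^{-1}$$

for the second growth period, and likewise $100\ \mu\mathrm{mol}\,\mathrm{m}^{-2}\,\mathrm{s}^{-1}\times 16\ \mathrm{h} \approx 5.76\ \mathrm{mol}\,\mathrm{m}^{-2}\,\mathrm{day}^{-1}$ for the first growth period.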
### Measurement of Fresh Weight (Edible Part)
#### Measurement of Fresh Weight (Edible Part)
The plants after cultivation were harvested, and the wet weights of the terrestrial part and the root were measured. The wet weight of the terrestrial part of each of 6 cultivated plants grown hydroponically under irradiation with light from the light emitting devices of Examples 1 to 5 and Comparative Example 1 was measured as the fresh weight (edible part) (g). The results obtained are shown in Table 1 and FIG. 3.
### Measurement of Nitrate Nitrogen Content
#### Measurement of Nitrate Nitrogen Content
The edible part (about 20 g) of each cultivated plant, from which about 5 cm of the base had been removed, was frozen with liquid nitrogen and crushed with a juice mixer (laboratory mixer LM-PLUS, manufactured by Osaka Chemical Co., Ltd.) for 1 minute. The resulting liquid was filtered with Miracloth (manufactured by Millipore), and the filtrate was centrifuged at 4° C. and 15,000 rpm for 5 minutes. The nitrate nitrogen content (mg/100 g) of each cultivated plant was measured in the supernatant using a portable reflection photometer system (product name: RQflex system, manufactured by Merck) and test paper (product name: Reflectoquant (registered trademark), manufactured by Kanto Chemical Co., Inc.). The results are shown in Table 1 and FIG. 4.
@@ -355,7 +355,7 @@ In addition, in the foregoing Detailed Description, various features may be grou
The above disclosed subject matter shall be considered illustrative, and not restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other embodiments which fall within the true spirit and scope of the present disclosure. Thus, to the maximum extent allowed by law, the scope of the present disclosure may be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing detailed description.
## CLAIMS
### CLAIMS
1. A light emitting device comprising: a light emitting element having a light emission peak wavelength in a range of 380 nm or more and 490 nm or less; and a fluorescent material that is excited by light from the light emitting element and emits light having at least one light emission peak wavelength in a range of 580 nm or more and less than 680 nm, wherein the light emitting device emits light having a ratio R/B of a photon flux density R to a photon flux density B within a range of 2.0 or more and 4.0 or less, and a ratio R/FR of the photon flux density R to a photon flux density FR within a range of 0.7 or more and 13.0 or less, wherein the photon flux density R is in a wavelength range of 620 nm or more and less than 700 nm, the photon flux density B is in a wavelength range of 380 nm or more and 490 nm or less, and the photon flux density FR is in a wavelength range of 700 nm or more and 780 nm or less.
