Mirror of https://github.com/DS4SD/docling.git (synced 2025-07-30 14:04:27 +00:00)

Commit 34c3a395fe: Merge branch 'docling-project:main' into main
.actor/.dockerignore (new file, 11 lines)
@@ -0,0 +1,11 @@
**/__pycache__
**/*.pyc
**/*.pyo
**/*.pyd
.git
.gitignore
.env
.venv
*.log
.pytest_cache
.coverage
.actor/CHANGELOG.md (new file, 69 lines)
@@ -0,0 +1,69 @@
# Changelog

All notable changes to the Docling Actor will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.1.0] - 2025-03-09

### Changed

- Switched from full Docling CLI to docling-serve API
- Using the official quay.io/ds4sd/docling-serve-cpu Docker image
- Reduced Docker image size (from ~6GB to ~4GB)
- Implemented multi-stage Docker build to handle dependencies
- Improved Docker build process to ensure compatibility with docling-serve-cpu image
- Added new Python processor script for reliable API communication and content extraction
- Enhanced response handling with better content extraction logic
- Fixed ES modules compatibility issue with Apify CLI
- Added explicit tmpfs volume for temporary files
- Fixed environment variables format in actor.json
- Created optimized dependency installation approach
- Improved API compatibility with docling-serve
- Updated endpoint from custom `/convert` to standard `/v1alpha/convert/source`
- Revised JSON payload structure to match docling-serve API format
- Added proper output field parsing based on format
- Enhanced startup process with health checks
- Added configurable API host and port through environment variables
- Better content type handling for different output formats
- Updated error handling to align with API responses

### Fixed

- Fixed actor input file conflict in get_actor_input(): now checks for and removes an existing /tmp/actor-input/INPUT directory if found, ensuring valid JSON input parsing.

### Technical Details

- Actor Specification v1
- Using quay.io/ds4sd/docling-serve-cpu:latest base image
- Node.js 20.x for Apify CLI
- Eliminated Python dependencies
- Simplified Docker build process

## [1.0.0] - 2025-02-07

### Added

- Initial release of Docling Actor
- Support for multiple document formats (PDF, DOCX, images)
- OCR capabilities for scanned documents
- Multiple output formats (md, json, html, text, doctags)
- Comprehensive error handling and logging
- Dataset records with processing status
- Memory monitoring and resource optimization
- Security features including non-root user execution

### Technical Details

- Actor Specification v1
- Docling v2.17.0
- Python 3.11
- Node.js 20.x
- Comprehensive error codes:
  - 10: Invalid input
  - 11: URL inaccessible
  - 12: Docling processing failed
  - 13: Output file missing
  - 14: Storage operation failed
  - 15: OCR processing failed
.actor/Dockerfile (new file, 87 lines)
@@ -0,0 +1,87 @@
# Build stage for installing dependencies
FROM node:20-slim AS builder

# Install necessary tools and prepare dependencies environment in one layer
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/* \
    && mkdir -p /build/bin /build/lib/node_modules \
    && cp /usr/local/bin/node /build/bin/

# Set working directory
WORKDIR /build

# Create package.json and install Apify CLI in one layer
RUN echo '{"name":"docling-actor-dependencies","version":"1.0.0","description":"Dependencies for Docling Actor","private":true,"type":"module","engines":{"node":">=18"}}' > package.json \
    && npm install apify-cli@latest \
    && cp -r node_modules/* lib/node_modules/ \
    && echo '#!/bin/sh\n/tmp/docling-tools/bin/node /tmp/docling-tools/lib/node_modules/apify-cli/bin/run "$@"' > bin/actor \
    && chmod +x bin/actor \
    # Clean up npm cache to reduce image size
    && npm cache clean --force

# Final stage with docling-serve-cpu
FROM quay.io/ds4sd/docling-serve-cpu:latest

LABEL maintainer="Vaclav Vancura <@vancura>" \
    description="Apify Actor for document processing using Docling" \
    version="1.1.0"

# Set only essential environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    DOCLING_SERVE_HOST=0.0.0.0 \
    DOCLING_SERVE_PORT=5001

# Switch to root temporarily to set up directories and permissions
USER root
WORKDIR /app

# Install required tools and create directories in a single layer
RUN dnf install -y \
    jq \
    && dnf clean all \
    && mkdir -p /build-files \
        /tmp \
        /tmp/actor-input \
        /tmp/actor-output \
        /tmp/actor-storage \
        /tmp/apify_input \
        /apify_input \
        /opt/app-root/src/.EasyOCR/user_network \
        /tmp/easyocr-models \
    && chown 1000:1000 /build-files \
    && chown -R 1000:1000 /opt/app-root/src/.EasyOCR \
    && chmod 1777 /tmp \
    && chmod 1777 /tmp/easyocr-models \
    && chmod 777 /tmp/actor-input /tmp/actor-output /tmp/actor-storage /tmp/apify_input /apify_input \
    # Fix for uv_os_get_passwd error in Node.js
    && echo "docling:x:1000:1000:Docling User:/app:/bin/sh" >> /etc/passwd

# Set environment variable to tell EasyOCR to use a writable location for models
ENV EASYOCR_MODULE_PATH=/tmp/easyocr-models

# Copy only required files
COPY --chown=1000:1000 .actor/actor.sh .actor/actor.sh
COPY --chown=1000:1000 .actor/actor.json .actor/actor.json
COPY --chown=1000:1000 .actor/input_schema.json .actor/input_schema.json
COPY --chown=1000:1000 .actor/docling_processor.py .actor/docling_processor.py
RUN chmod +x .actor/actor.sh

# Copy the build files from builder
COPY --from=builder --chown=1000:1000 /build /build-files

# Switch to non-root user
USER 1000

# Set up TMPFS for temporary files
VOLUME ["/tmp"]

# Create additional volume for OCR models persistence
VOLUME ["/tmp/easyocr-models"]

# Expose the docling-serve API port
EXPOSE 5001

# Run the actor script
ENTRYPOINT [".actor/actor.sh"]
.actor/README.md (new file, 314 lines)
@@ -0,0 +1,314 @@
# Docling Actor on Apify

[Docling Actor](https://apify.com/vancura/docling)

This Actor (specification v1) wraps the [Docling project](https://ds4sd.github.io/docling/) to provide serverless document processing in the cloud. It can process complex documents (PDF, DOCX, images) and convert them into structured formats (Markdown, JSON, HTML, Text, or DocTags) with optional OCR support.

## What are Actors?

[Actors](https://docs.apify.com/platform/actors?fpr=docling) are serverless microservices running on the [Apify Platform](https://apify.com/?fpr=docling). They are based on the [Actor SDK](https://docs.apify.com/sdk/js?fpr=docling) and can be found in the [Apify Store](https://apify.com/store?fpr=docling). Learn more about Actors in the [Apify Whitepaper](https://whitepaper.actor?fpr=docling).

## Table of Contents

1. [Features](#features)
2. [Usage](#usage)
3. [Input Parameters](#input-parameters)
4. [Output](#output)
5. [Performance & Resources](#performance--resources)
6. [Troubleshooting](#troubleshooting)
7. [Local Development](#local-development)
8. [Architecture](#architecture)
9. [License](#license)
10. [Acknowledgments](#acknowledgments)
11. [Security Considerations](#security-considerations)

## Features

- Leverages the official docling-serve-cpu Docker image for efficient document processing
- Processes multiple document formats:
  - PDF documents (scanned or digital)
  - Microsoft Office files (DOCX, XLSX, PPTX)
  - Images (PNG, JPG, TIFF)
  - Other text-based formats
- Provides OCR capabilities for scanned documents
- Exports to multiple formats:
  - Markdown
  - JSON
  - HTML
  - Plain Text
  - DocTags (structured format)
- No local setup needed—just provide input via a simple JSON config

## Usage

### Using Apify Console

1. Go to the Apify Actor page.
2. Click "Run".
3. In the input form, fill in:
   - The URL of the document.
   - Output format (`md`, `json`, `html`, `text`, or `doctags`).
   - OCR boolean toggle.
4. The Actor will run and produce its outputs in the default key-value store under the key `OUTPUT`.

### Using Apify API

```bash
curl --request POST \
  --url "https://api.apify.com/v2/acts/vancura~docling/run" \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer YOUR_API_TOKEN' \
  --data '{
    "options": {
      "to_formats": ["md", "json", "html", "text", "doctags"]
    },
    "http_sources": [
      {"url": "https://vancura.dev/assets/actor-test/facial-hairstyles-and-filtering-facepiece-respirators.pdf"},
      {"url": "https://arxiv.org/pdf/2408.09869"}
    ]
  }'
```

### Using Apify CLI

```bash
apify call vancura/docling --input='{
  "options": {
    "to_formats": ["md", "json", "html", "text", "doctags"]
  },
  "http_sources": [
    {"url": "https://vancura.dev/assets/actor-test/facial-hairstyles-and-filtering-facepiece-respirators.pdf"},
    {"url": "https://arxiv.org/pdf/2408.09869"}
  ]
}'
```
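
### Using Python

The same request can be sent from Python. This is a minimal sketch mirroring the curl example above; the `requests` library is an assumed dependency (not part of this Actor) and `YOUR_API_TOKEN` is a placeholder:

```python
import requests

# Endpoint and payload mirror the curl example above.
API_URL = "https://api.apify.com/v2/acts/vancura~docling/run"

payload = {
    "options": {"to_formats": ["md", "json", "html", "text", "doctags"]},
    "http_sources": [
        {"url": "https://vancura.dev/assets/actor-test/facial-hairstyles-and-filtering-facepiece-respirators.pdf"},
        {"url": "https://arxiv.org/pdf/2408.09869"},
    ],
}

response = requests.post(
    API_URL,
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
    json=payload,  # requests sets the JSON Content-Type header automatically
    timeout=300,  # assumption: a generous timeout for large documents
)
response.raise_for_status()
print(response.json())
```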

## Input Parameters

The Actor accepts a JSON schema matching the file `.actor/input_schema.json`. Below is a summary of the fields:

| Field          | Type   | Required | Default | Description                                                                                                    |
|----------------|--------|----------|---------|----------------------------------------------------------------------------------------------------------------|
| `http_sources` | object | Yes      | None    | See the [URL endpoint docs](https://github.com/DS4SD/docling-serve?tab=readme-ov-file#url-endpoint)             |
| `options`      | object | No       | None    | See the [common parameters docs](https://github.com/DS4SD/docling-serve?tab=readme-ov-file#common-parameters)   |

### Example Input

```json
{
  "options": {
    "to_formats": ["md", "json", "html", "text", "doctags"]
  },
  "http_sources": [
    {"url": "https://vancura.dev/assets/actor-test/facial-hairstyles-and-filtering-facepiece-respirators.pdf"},
    {"url": "https://arxiv.org/pdf/2408.09869"}
  ]
}
```

## Output

The Actor provides three types of outputs:

1. **Processed Documents in a ZIP** - The Actor will provide the direct URL to your result in the run log, looking like:

   ```text
   You can find your results at: 'https://api.apify.com/v2/key-value-stores/[YOUR_STORE_ID]/records/OUTPUT'
   ```

2. **Processing Log** - Available in the key-value store as `DOCLING_LOG`

3. **Dataset Record** - Contains processing metadata with:
   - Direct link to the processed output zip file
   - Processing status

You can access the results in several ways:

1. **Direct URL** (shown in Actor run logs):

   ```text
   https://api.apify.com/v2/key-value-stores/[STORE_ID]/records/OUTPUT
   ```

2. **Programmatically** via Apify CLI:

   ```bash
   apify key-value-stores get-value OUTPUT
   ```

3. **Dataset** - Check the "Dataset" tab in the Actor run details to see processing metadata
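
4. **Python** - a minimal sketch that downloads the `OUTPUT` ZIP and unpacks it (the store ID is a placeholder and `requests` is an assumed dependency):

   ```python
   import io
   import zipfile

   import requests

   # Substitute the store ID printed in the Actor run log.
   url = "https://api.apify.com/v2/key-value-stores/YOUR_STORE_ID/records/OUTPUT"

   response = requests.get(url, timeout=60)
   response.raise_for_status()

   # The OUTPUT record is a ZIP archive; extract it into a local directory.
   with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
       archive.extractall("docling-results")
       print(archive.namelist())
   ```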

### Example Outputs

#### Markdown (md)

```markdown
# Document Title

## Section 1
Content of section 1...

## Section 2
Content of section 2...
```

#### JSON

```json
{
  "title": "Document Title",
  "sections": [
    {
      "level": 1,
      "title": "Section 1",
      "content": "Content of section 1..."
    }
  ]
}
```

#### HTML

```html
<h1>Document Title</h1>
<h2>Section 1</h2>
<p>Content of section 1...</p>
```

### Processing Logs (`DOCLING_LOG`)

The Actor maintains detailed processing logs including:

- API request and response details
- Processing steps and timing
- Error messages and stack traces
- Input validation results

Access logs via:

```bash
apify key-value-stores get-record DOCLING_LOG
```

## Performance & Resources

- **Docker Image Size**: ~4GB
- **Memory Requirements**:
  - Minimum: 2 GB RAM
  - Recommended: 4 GB RAM for large or complex documents
- **Processing Time**:
  - Simple documents: 15-30 seconds
  - Complex PDFs with OCR: 1-3 minutes
  - Large documents (100+ pages): 3-10 minutes

## Troubleshooting

Common issues and solutions:

1. **Document URL Not Accessible**
   - Ensure the URL is publicly accessible
   - Check if the document requires authentication
   - Verify the URL leads directly to the document

2. **OCR Processing Fails**
   - Verify the document is not password-protected
   - Check if the image quality is sufficient
   - Try processing with OCR disabled

3. **API Response Issues**
   - Check the logs for detailed error messages
   - Ensure the document format is supported
   - Verify the URL is correctly formatted

4. **Output Format Issues**
   - Verify the output format is supported
   - Check if the document structure is compatible
   - Review the `DOCLING_LOG` for specific errors

### Error Handling

The Actor implements comprehensive error handling:

- Detailed error messages in `DOCLING_LOG`
- Proper exit codes for different failure scenarios
- Automatic cleanup on failure
- Dataset records with processing status

## Local Development

If you wish to develop or modify this Actor locally:

1. Clone the repository.
2. Ensure Docker is installed.
3. The Actor files are located in the `.actor` directory:
   - `Dockerfile` - Defines the container environment
   - `actor.json` - Actor configuration and metadata
   - `actor.sh` - Main execution script that starts the docling-serve API and orchestrates document processing
   - `input_schema.json` - Input parameter definitions
   - `dataset_schema.json` - Dataset output format definition
   - `CHANGELOG.md` - Change log documenting all notable changes
   - `README.md` - This documentation
4. Run the Actor locally using:

   ```bash
   apify run
   ```

### Actor Structure

```text
.actor/
├── Dockerfile            # Container definition
├── actor.json            # Actor metadata
├── actor.sh              # Execution script (also starts docling-serve API)
├── input_schema.json     # Input parameters
├── dataset_schema.json   # Dataset output format definition
├── docling_processor.py  # Python script for API communication
├── CHANGELOG.md          # Version history and changes
└── README.md             # This documentation
```

## Architecture

This Actor uses a lightweight architecture based on the official `quay.io/ds4sd/docling-serve-cpu` Docker image:

- **Base Image**: `quay.io/ds4sd/docling-serve-cpu:latest` (~4GB)
- **Multi-Stage Build**: Uses a multi-stage Docker build to include only necessary tools
- **API Communication**: Uses the RESTful API provided by docling-serve
- **Request Flow** (see the sketch after this list):
  1. The actor script starts the docling-serve API on port 5001
  2. Performs health checks to ensure the API is running
  3. Processes the input parameters
  4. Creates a JSON payload for the docling-serve API with proper format:

     ```json
     {
       "options": {
         "to_formats": ["md"],
         "do_ocr": true
       },
       "http_sources": [{"url": "https://example.com/document.pdf"}]
     }
     ```

  5. Makes a POST request to the `/v1alpha/convert/source` endpoint
  6. Processes the response and stores it in the key-value store
- **Dependencies**:
  - Node.js for Apify CLI
  - Essential tools (curl, jq, etc.) copied from the build stage
- **Security**: Runs as a non-root user for enhanced security
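
The request flow above, as a hedged Python sketch of a client driving a locally running docling-serve instance (the Actor itself does this in `actor.sh` with a socket probe and curl; `requests` is an assumed dependency):

```python
import socket
import time

import requests

ENDPOINT = "http://localhost:5001/v1alpha/convert/source"

# Step 2: wait until the API port accepts connections (simple health check).
for _ in range(30):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1)
        if probe.connect_ex(("localhost", 5001)) == 0:
            break
    time.sleep(1)
else:
    raise RuntimeError("docling-serve did not become reachable on port 5001")

# Steps 3-5: build the payload and POST it to the convert endpoint.
# return_as_file=True matches what actor.sh injects with jq, so the
# response body is a ZIP archive rather than inline JSON.
payload = {
    "options": {"to_formats": ["md"], "do_ocr": True, "return_as_file": True},
    "http_sources": [{"url": "https://example.com/document.pdf"}],
}
response = requests.post(ENDPOINT, json=payload, timeout=300)
response.raise_for_status()

# Step 6: persist the archive (the Actor uploads it to the key-value store).
with open("output.zip", "wb") as fh:
    fh.write(response.content)
```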

## License

This wrapper project is under the MIT License, matching the original Docling license. See [LICENSE](../LICENSE) for details.

## Acknowledgments

- [Docling](https://ds4sd.github.io/docling/) and [docling-serve-cpu](https://quay.io/repository/ds4sd/docling-serve-cpu) by IBM
- [Apify](https://apify.com/?fpr=docling) for the serverless actor environment

## Security Considerations

- Actor runs under a non-root user for enhanced security
- Input URLs are validated before processing
- Temporary files are securely managed and cleaned up
- Process isolation through Docker containerization
- Secure handling of processing artifacts
.actor/actor.json (new file, 11 lines)
@@ -0,0 +1,11 @@
{
    "actorSpecification": 1,
    "name": "docling",
    "version": "0.0",
    "environmentVariables": {},
    "dockerFile": "./Dockerfile",
    "input": "./input_schema.json",
    "scripts": {
        "run": "./actor.sh"
    }
}
.actor/actor.sh (new executable file, 419 lines)
@@ -0,0 +1,419 @@
#!/bin/bash

export PATH=$PATH:/build-files/node_modules/.bin

# Function to upload content to the key-value store
upload_to_kvs() {
    local content_file="$1"
    local key_name="$2"
    local content_type="$3"
    local description="$4"

    # Find the Apify CLI command
    find_apify_cmd
    local apify_cmd="$FOUND_APIFY_CMD"

    if [ -n "$apify_cmd" ]; then
        echo "Uploading $description to key-value store (key: $key_name)..."

        # Create a temporary home directory with write permissions
        setup_temp_environment

        # Use the --no-update-notifier flag if available
        if $apify_cmd --help | grep -q "\--no-update-notifier"; then
            if $apify_cmd --no-update-notifier actor:set-value "$key_name" --contentType "$content_type" < "$content_file"; then
                echo "Successfully uploaded $description to key-value store"
                local url="https://api.apify.com/v2/key-value-stores/${APIFY_DEFAULT_KEY_VALUE_STORE_ID}/records/$key_name"
                echo "$description available at: $url"
                cleanup_temp_environment
                return 0
            fi
        else
            # Fall back to regular command if flag isn't available
            if $apify_cmd actor:set-value "$key_name" --contentType "$content_type" < "$content_file"; then
                echo "Successfully uploaded $description to key-value store"
                local url="https://api.apify.com/v2/key-value-stores/${APIFY_DEFAULT_KEY_VALUE_STORE_ID}/records/$key_name"
                echo "$description available at: $url"
                cleanup_temp_environment
                return 0
            fi
        fi

        echo "ERROR: Failed to upload $description to key-value store"
        cleanup_temp_environment
        return 1
    else
        echo "ERROR: Apify CLI not found for $description upload"
        return 1
    fi
}

# Function to find Apify CLI command
find_apify_cmd() {
    FOUND_APIFY_CMD=""
    for cmd in "apify" "actor" "/usr/local/bin/apify" "/usr/bin/apify" "/opt/apify/cli/bin/apify"; do
        if command -v "$cmd" &> /dev/null; then
            FOUND_APIFY_CMD="$cmd"
            break
        fi
    done
}

# Function to set up temporary environment for Apify CLI
setup_temp_environment() {
    export TMPDIR="/tmp/apify-home-${RANDOM}"
    mkdir -p "$TMPDIR"
    export APIFY_DISABLE_VERSION_CHECK=1
    export NODE_OPTIONS="--no-warnings"
    export HOME="$TMPDIR"  # Override home directory to writable location
}

# Function to clean up temporary environment
cleanup_temp_environment() {
    rm -rf "$TMPDIR" 2>/dev/null || true
}

# Function to push data to Apify dataset
push_to_dataset() {
    # Example usage: push_to_dataset "$RESULT_URL" "$OUTPUT_SIZE" "zip"

    local result_url="$1"
    local size="$2"
    local format="$3"

    # Find Apify CLI command
    find_apify_cmd
    local apify_cmd="$FOUND_APIFY_CMD"

    if [ -n "$apify_cmd" ]; then
        echo "Adding record to dataset..."
        setup_temp_environment

        # Use the --no-update-notifier flag if available
        if $apify_cmd --help | grep -q "\--no-update-notifier"; then
            if $apify_cmd --no-update-notifier actor:push-data "{\"output_file\": \"${result_url}\", \"format\": \"${format}\", \"size\": \"${size}\", \"status\": \"success\"}"; then
                echo "Successfully added record to dataset"
            else
                echo "Warning: Failed to add record to dataset"
            fi
        else
            # Fall back to regular command
            if $apify_cmd actor:push-data "{\"output_file\": \"${result_url}\", \"format\": \"${format}\", \"size\": \"${size}\", \"status\": \"success\"}"; then
                echo "Successfully added record to dataset"
            else
                echo "Warning: Failed to add record to dataset"
            fi
        fi

        cleanup_temp_environment
    fi
}

# --- Setup logging and error handling ---

LOG_FILE="/tmp/docling.log"
touch "$LOG_FILE" || {
    echo "Fatal: Cannot create log file at $LOG_FILE"
    exit 1
}

# Log to both console and file
exec 1> >(tee -a "$LOG_FILE")
exec 2> >(tee -a "$LOG_FILE" >&2)

# Exit codes
readonly ERR_API_UNAVAILABLE=15
readonly ERR_INVALID_INPUT=16

# --- Debug environment ---

echo "Date: $(date)"
echo "Python version: $(python --version 2>&1)"
echo "Docling-serve path: $(which docling-serve 2>/dev/null || echo 'Not found')"
echo "Working directory: $(pwd)"

# --- Get input ---

echo "Getting Apify Actor Input"
INPUT=$(apify actor get-input 2>/dev/null)

# --- Setup tools ---

echo "Setting up tools..."
TOOLS_DIR="/tmp/docling-tools"
mkdir -p "$TOOLS_DIR"

# Copy tools if available
if [ -d "/build-files" ]; then
    echo "Copying tools from /build-files..."
    cp -r /build-files/* "$TOOLS_DIR/"
    export PATH="$TOOLS_DIR/bin:$PATH"
else
    echo "Warning: No build files directory found. Some tools may be unavailable."
fi

# Copy Python processor script to tools directory
PYTHON_SCRIPT_PATH="$(dirname "$0")/docling_processor.py"
if [ -f "$PYTHON_SCRIPT_PATH" ]; then
    echo "Copying Python processor script to tools directory..."
    cp "$PYTHON_SCRIPT_PATH" "$TOOLS_DIR/"
    chmod +x "$TOOLS_DIR/docling_processor.py"
else
    echo "ERROR: Python processor script not found at $PYTHON_SCRIPT_PATH"
    exit 1
fi

# Check OCR directories and ensure they're writable
echo "Checking OCR directory permissions..."
OCR_DIR="/opt/app-root/src/.EasyOCR"
if [ -d "$OCR_DIR" ]; then
    # Test if we can write to the directory
    if touch "$OCR_DIR/test_write" 2>/dev/null; then
        echo "[✓] OCR directory is writable"
        rm "$OCR_DIR/test_write"
    else
        echo "[✗] OCR directory is not writable, setting up alternative in /tmp"

        # Create alternative in /tmp (which is writable)
        mkdir -p "/tmp/.EasyOCR/user_network"
        export EASYOCR_MODULE_PATH="/tmp/.EasyOCR"
    fi
else
    echo "OCR directory not found, creating in /tmp"
    mkdir -p "/tmp/.EasyOCR/user_network"
    export EASYOCR_MODULE_PATH="/tmp/.EasyOCR"
fi

# --- Starting the API ---

echo "Starting docling-serve API..."

# Create a dedicated working directory in /tmp (writable)
API_DIR="/tmp/docling-api"
mkdir -p "$API_DIR"
cd "$API_DIR"
echo "API working directory: $(pwd)"

# Find docling-serve executable
DOCLING_SERVE_PATH=$(which docling-serve)
echo "Docling-serve executable: $DOCLING_SERVE_PATH"

# Start the API with minimal parameters to avoid any issues
echo "Starting docling-serve API..."
"$DOCLING_SERVE_PATH" run --host 0.0.0.0 --port 5001 > "$API_DIR/docling-serve.log" 2>&1 &
API_PID=$!
echo "Started docling-serve API with PID: $API_PID"

# A more reliable wait for API startup
echo "Waiting for API to initialize..."
MAX_TRIES=30
tries=0
started=false

while [ $tries -lt $MAX_TRIES ]; do
    tries=$((tries + 1))

    # Check if the process is still running
    if ! ps -p $API_PID > /dev/null; then
        echo "ERROR: docling-serve API process terminated unexpectedly after $tries seconds"
        break
    fi

    # Check the log for startup completion or errors
    if grep -q "Application startup complete" "$API_DIR/docling-serve.log" 2>/dev/null; then
        echo "[✓] API startup completed successfully after $tries seconds"
        started=true
        break
    fi

    if grep -q "Permission denied\|PermissionError" "$API_DIR/docling-serve.log" 2>/dev/null; then
        echo "ERROR: Permission errors detected in API startup"
        break
    fi

    # Sleep and check again
    sleep 1

    # Output a progress indicator every 5 seconds
    if [ $((tries % 5)) -eq 0 ]; then
        echo "Still waiting for API startup... ($tries/$MAX_TRIES seconds)"
    fi
done

# Show log content regardless of outcome
echo "docling-serve log output so far:"
tail -n 20 "$API_DIR/docling-serve.log"

# Verify the API is running
if ! ps -p $API_PID > /dev/null; then
    echo "ERROR: docling-serve API failed to start"
    if [ -f "$API_DIR/docling-serve.log" ]; then
        echo "Full log output:"
        cat "$API_DIR/docling-serve.log"
    fi
    exit $ERR_API_UNAVAILABLE
fi

if [ "$started" != "true" ]; then
    echo "WARNING: API process is running but startup completion was not detected"
    echo "Will attempt to continue anyway..."
fi

# Try to verify the API is responding at this point
echo "Verifying API responsiveness..."
(python -c "
import sys, time, socket
for i in range(5):
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(1)
        result = s.connect_ex(('localhost', 5001))
        if result == 0:
            s.close()
            print('Port 5001 is open and accepting connections')
            sys.exit(0)
        s.close()
    except Exception:
        pass
    time.sleep(1)
print('Could not connect to API port after 5 attempts')
sys.exit(1)
" && echo "API verification succeeded") || echo "API verification failed, but continuing anyway"

# Define the API endpoint
DOCLING_API_ENDPOINT="http://localhost:5001/v1alpha/convert/source"

# --- Processing document ---

echo "Starting document processing..."
echo "Reading input from Apify..."

echo "Input content:" >&2
echo "$INPUT" >&2  # Send the raw input to stderr for debugging
echo "$INPUT"      # Send the clean JSON to stdout for processing

# Create the request JSON, asking the API to return the result as a file
REQUEST_JSON=$(echo "$INPUT" | jq '.options += {"return_as_file": true}')

echo "Creating request JSON:" >&2
echo "$REQUEST_JSON" >&2
echo "$REQUEST_JSON" > "$API_DIR/request.json"

# Send the conversion request using our Python script
#echo "Sending conversion request to docling-serve API..."
#python "$TOOLS_DIR/docling_processor.py" \
#    --api-endpoint "$DOCLING_API_ENDPOINT" \
#    --request-json "$API_DIR/request.json" \
#    --output-dir "$API_DIR" \
#    --output-format "$OUTPUT_FORMAT"

echo "Sending conversion request to the Docling API with curl"
curl -s -H "content-type: application/json" -X POST --data-binary @"$API_DIR/request.json" -o "$API_DIR/output.zip" "$DOCLING_API_ENDPOINT"

CURL_EXIT_CODE=$?

# --- Check for various potential output files ---

echo "Checking for output files..."
if [ -f "$API_DIR/output.zip" ]; then
    echo "Conversion completed successfully! Output file found."

    # Get the size of the converted file
    OUTPUT_SIZE=$(wc -c < "$API_DIR/output.zip")
    echo "Output file found with size: $OUTPUT_SIZE bytes"

    # Calculate the access URL for result display
    RESULT_URL="https://api.apify.com/v2/key-value-stores/${APIFY_DEFAULT_KEY_VALUE_STORE_ID}/records/OUTPUT"

    echo "=============================="
    echo "PROCESSING COMPLETE!"
    echo "Output size: ${OUTPUT_SIZE} bytes"
    echo "=============================="

    # Set the output content type based on format
    CONTENT_TYPE="application/zip"

    # Upload the document content using our function
    upload_to_kvs "$API_DIR/output.zip" "OUTPUT" "$CONTENT_TYPE" "Document content"

    # Only proceed with the dataset record if the document upload succeeded
    if [ $? -eq 0 ]; then
        echo "Your document is available at: ${RESULT_URL}"
        echo "=============================="

        # Push data to the dataset
        push_to_dataset "$RESULT_URL" "$OUTPUT_SIZE" "zip"
    fi
else
    echo "ERROR: No converted output file found at $API_DIR/output.zip"

    # Create error metadata
    ERROR_METADATA="{\"status\":\"error\",\"error\":\"No converted output file found\",\"documentUrl\":\"$DOCUMENT_URL\"}"
    echo "$ERROR_METADATA" > "/tmp/actor-output/OUTPUT"
    chmod 644 "/tmp/actor-output/OUTPUT"

    echo "Error information has been saved to /tmp/actor-output/OUTPUT"
fi

# --- Verify output files for debugging ---

echo "=== Final Output Verification ==="
echo "Files in /tmp/actor-output:"
ls -la /tmp/actor-output/ 2>/dev/null || echo "Cannot list /tmp/actor-output/"

echo "All operations completed. The output should be available in the default key-value store."
echo "Content URL: ${RESULT_URL:-No URL available}"

# --- Cleanup function ---

cleanup() {
    echo "Running cleanup..."

    # Stop the API process
    if [ -n "$API_PID" ]; then
        echo "Stopping docling-serve API (PID: $API_PID)..."
        kill $API_PID 2>/dev/null || true
    fi

    # Export the log file to the KVS if it exists
    # DO THIS BEFORE REMOVING THE TOOLS DIRECTORY
    if [ -f "$LOG_FILE" ]; then
        if [ -s "$LOG_FILE" ]; then
            echo "Log file is not empty, pushing to key-value store (key: LOG)..."

            # Upload the log using our function
            upload_to_kvs "$LOG_FILE" "LOG" "text/plain" "Log file"
        else
            echo "Warning: log file exists but is empty"
        fi
    else
        echo "Warning: No log file found"
    fi

    # Clean up temporary files AFTER the log is uploaded
    echo "Cleaning up temporary files..."
    if [ -d "$API_DIR" ]; then
        echo "Removing API working directory: $API_DIR"
        rm -rf "$API_DIR" 2>/dev/null || echo "Warning: Failed to remove $API_DIR"
    fi

    if [ -d "$TOOLS_DIR" ]; then
        echo "Removing tools directory: $TOOLS_DIR"
        rm -rf "$TOOLS_DIR" 2>/dev/null || echo "Warning: Failed to remove $TOOLS_DIR"
    fi

    # Keep the log file until the very end
    echo "Script execution completed at $(date)"
    echo "Actor execution completed"
}

# Register cleanup
trap cleanup EXIT
.actor/dataset_schema.json (new file, 31 lines)
@@ -0,0 +1,31 @@
{
    "title": "Docling Actor Dataset",
    "description": "Records of document processing results from the Docling Actor",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "url": {
            "title": "Document URL",
            "type": "string",
            "description": "URL of the processed document"
        },
        "output_file": {
            "title": "Result URL",
            "type": "string",
            "description": "Direct URL to the processed result in key-value store"
        },
        "status": {
            "title": "Processing Status",
            "type": "string",
            "description": "Status of the document processing",
            "enum": ["success", "error"]
        },
        "error": {
            "title": "Error Details",
            "type": "string",
            "description": "Error message if processing failed",
            "optional": true
        }
    },
    "required": ["url", "output_file", "status"]
}
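
A dataset record can be checked against this schema offline. A minimal sketch using the `jsonschema` package (an assumed dev dependency, not used by the Actor itself; the sample values are placeholders):

```python
import json

from jsonschema import validate

# Load the dataset schema shipped with the Actor.
with open(".actor/dataset_schema.json") as fh:
    schema = json.load(fh)

# Sample record with placeholder values.
record = {
    "url": "https://example.com/document.pdf",
    "output_file": "https://api.apify.com/v2/key-value-stores/STORE_ID/records/OUTPUT",
    "status": "success",
}

validate(instance=record, schema=schema)  # raises ValidationError on mismatch
print("record conforms to the dataset schema")
```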
.actor/input_schema.json (new file, 27 lines)
@@ -0,0 +1,27 @@
{
    "title": "Docling Actor Input",
    "description": "Options for processing documents with Docling via the docling-serve API.",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "http_sources": {
            "title": "Document URLs",
            "type": "array",
            "description": "URLs of documents to process. Supported formats: PDF, DOCX, PPTX, XLSX, HTML, MD, XML, images, and more.",
            "editor": "json",
            "prefill": [
                { "url": "https://vancura.dev/assets/actor-test/facial-hairstyles-and-filtering-facepiece-respirators.pdf" }
            ]
        },
        "options": {
            "title": "Processing Options",
            "type": "object",
            "description": "Document processing configuration options",
            "editor": "json",
            "prefill": {
                "to_formats": ["md"]
            }
        }
    },
    "required": ["options", "http_sources"]
}
CHANGELOG.md (19 lines changed)
@@ -1,3 +1,22 @@
+## [v2.27.0](https://github.com/docling-project/docling/releases/tag/v2.27.0) - 2025-03-18
+
+### Feature
+
+* Add factory for ocr engines via plugins ([#1010](https://github.com/docling-project/docling/issues/1010)) ([`6eaae3c`](https://github.com/docling-project/docling/commit/6eaae3cba034599020dc06ebdad3bc3ff0b5a8eb))
+* Add DoclingParseV4 backend, using high-level docling-parse API ([#905](https://github.com/docling-project/docling/issues/905)) ([`3960b19`](https://github.com/docling-project/docling/commit/3960b199d63d0e9d660aeb0cbced02b38bb0b593))
+* **actor:** Docling Actor on Apify infrastructure ([#875](https://github.com/docling-project/docling/issues/875)) ([`772487f`](https://github.com/docling-project/docling/commit/772487f9c91ad2ee53c591c314c72443f9cbfd23))
+* Equations to latex in MSWord backend (with inline groups) ([#1114](https://github.com/docling-project/docling/issues/1114)) ([`6eb718f`](https://github.com/docling-project/docling/commit/6eb718f8493038d1b4b6ae836df5a24aa13cd17e))
+
+### Fix
+
+* **html:** Handle nested empty lists ([#1154](https://github.com/docling-project/docling/issues/1154)) ([`f94da44`](https://github.com/docling-project/docling/commit/f94da44ec5c7a8c92b9dd60e4df5dc945ed6d1ea))
+* Use first table row as col headers ([#1156](https://github.com/docling-project/docling/issues/1156)) ([`0945973`](https://github.com/docling-project/docling/commit/0945973b79d67b74281aba5102ee985ac1de74ea))
+* Pass tests, update docling-core to 2.22.0 ([#1150](https://github.com/docling-project/docling/issues/1150)) ([`aa92a57`](https://github.com/docling-project/docling/commit/aa92a57fa9e7228e894efb9050a0cdb9f287ebfd))
+
+### Documentation
+
+* Fix spelling of picture in usage ([#1165](https://github.com/docling-project/docling/issues/1165)) ([`7e01798`](https://github.com/docling-project/docling/commit/7e01798417c424c05685e0ff5f6f89f70dc3bfcd))
+
 ## [v2.26.0](https://github.com/docling-project/docling/releases/tag/v2.26.0) - 2025-03-11
 
 ### Feature
CODE_OF_CONDUCT.md
@@ -1,129 +1,3 @@
 # Contributor Covenant Code of Conduct
-
-## Our Pledge
-
-We as members, contributors, and leaders pledge to make participation in our
-community a harassment-free experience for everyone, regardless of age, body
-size, visible or invisible disability, ethnicity, sex characteristics, gender
-identity and expression, level of experience, education, socio-economic status,
-nationality, personal appearance, race, religion, or sexual identity
-and orientation.
-
-We pledge to act and interact in ways that contribute to an open, welcoming,
-diverse, inclusive, and healthy community.
-
-## Our Standards
-
-Examples of behavior that contributes to a positive environment for our
-community include:
-
-* Demonstrating empathy and kindness toward other people
-* Being respectful of differing opinions, viewpoints, and experiences
-* Giving and gracefully accepting constructive feedback
-* Accepting responsibility and apologizing to those affected by our mistakes,
-  and learning from the experience
-* Focusing on what is best not just for us as individuals, but for the
-  overall community
-
-Examples of unacceptable behavior include:
-
-* The use of sexualized language or imagery, and sexual attention or
-  advances of any kind
-* Trolling, insulting or derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or email
-  address, without their explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
-  professional setting
-
-## Enforcement Responsibilities
-
-Community leaders are responsible for clarifying and enforcing our standards of
-acceptable behavior and will take appropriate and fair corrective action in
-response to any behavior that they deem inappropriate, threatening, offensive,
-or harmful.
-
-Community leaders have the right and responsibility to remove, edit, or reject
-comments, commits, code, wiki edits, issues, and other contributions that are
-not aligned to this Code of Conduct, and will communicate reasons for moderation
-decisions when appropriate.
-
-## Scope
-
-This Code of Conduct applies within all community spaces, and also applies when
-an individual is officially representing the community in public spaces.
-Examples of representing our community include using an official e-mail address,
-posting via an official social media account, or acting as an appointed
-representative at an online or offline event.
-
-## Enforcement
-
-Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported to the community leaders responsible for enforcement using
-[deepsearch-core@zurich.ibm.com](mailto:deepsearch-core@zurich.ibm.com).
-
-All complaints will be reviewed and investigated promptly and fairly.
-
-All community leaders are obligated to respect the privacy and security of the
-reporter of any incident.
-
-## Enforcement Guidelines
-
-Community leaders will follow these Community Impact Guidelines in determining
-the consequences for any action they deem in violation of this Code of Conduct:
-
-### 1. Correction
-
-**Community Impact**: Use of inappropriate language or other behavior deemed
-unprofessional or unwelcome in the community.
-
-**Consequence**: A private, written warning from community leaders, providing
-clarity around the nature of the violation and an explanation of why the
-behavior was inappropriate. A public apology may be requested.
-
-### 2. Warning
-
-**Community Impact**: A violation through a single incident or series
-of actions.
-
-**Consequence**: A warning with consequences for continued behavior. No
-interaction with the people involved, including unsolicited interaction with
-those enforcing the Code of Conduct, for a specified period of time. This
-includes avoiding interactions in community spaces as well as external channels
-like social media. Violating these terms may lead to a temporary or
-permanent ban.
-
-### 3. Temporary Ban
-
-**Community Impact**: A serious violation of community standards, including
-sustained inappropriate behavior.
-
-**Consequence**: A temporary ban from any sort of interaction or public
-communication with the community for a specified period of time. No public or
-private interaction with the people involved, including unsolicited interaction
-with those enforcing the Code of Conduct, is allowed during this period.
-Violating these terms may lead to a permanent ban.
-
-### 4. Permanent Ban
-
-**Community Impact**: Demonstrating a pattern of violation of community
-standards, including sustained inappropriate behavior, harassment of an
-individual, or aggression toward or disparagement of classes of individuals.
-
-**Consequence**: A permanent ban from any sort of public interaction within
-the community.
-
-## Attribution
-
-This Code of Conduct is adapted from the [Contributor Covenant][homepage],
-version 2.0, available at
-[https://www.contributor-covenant.org/version/2/0/code_of_conduct.html](https://www.contributor-covenant.org/version/2/0/code_of_conduct.html).
-
-Community Impact Guidelines were inspired by [Mozilla's code of conduct
-enforcement ladder](https://github.com/mozilla/diversity).
-
-Homepage: [https://www.contributor-covenant.org](https://www.contributor-covenant.org)
-
-For answers to common questions about this code of conduct, see the FAQ at
-[https://www.contributor-covenant.org/faq](https://www.contributor-covenant.org/faq). Translations are available at
-[https://www.contributor-covenant.org/translations](https://www.contributor-covenant.org/translations).
+
+This project adheres to the [Docling - Code of Conduct and Covenant](https://github.com/docling-project/community/blob/main/CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code.
CONTRIBUTING.md
@@ -2,85 +2,7 @@
 Our project welcomes external contributions. If you have an itch, please feel
 free to scratch it.
 
 To contribute code or documentation, please submit a [pull request](https://github.com/docling-project/docling/pulls).
 
-A good way to familiarize yourself with the codebase and contribution process is
-to look for and tackle low-hanging fruit in the [issue tracker](https://github.com/docling-project/docling/issues).
-Before embarking on a more ambitious contribution, please quickly [get in touch](#communication) with us.
-
-For general questions or support requests, please refer to the [discussion section](https://github.com/docling-project/docling/discussions).
-
-**Note: We appreciate your effort and want to avoid situations where a contribution
-requires extensive rework (by you or by us), sits in the backlog for a long time, or
-cannot be accepted at all!**
-
-### Proposing New Features
-
-If you would like to implement a new feature, please [raise an issue](https://github.com/docling-project/docling/issues)
-before sending a pull request so the feature can be discussed. This is to avoid
-you spending valuable time working on a feature that the project developers
-are not interested in accepting into the codebase.
-
-### Fixing Bugs
-
-If you would like to fix a bug, please [raise an issue](https://github.com/docling-project/docling/issues) before sending a
-pull request so it can be tracked.
-
-### Merge Approval
-
-The project maintainers use LGTM (Looks Good To Me) in comments on the code
-review to indicate acceptance. A change requires LGTMs from two of the
-maintainers of each component affected.
-
-For a list of the maintainers, see the [MAINTAINERS.md](MAINTAINERS.md) page.
-
-
-## Legal
-
-Each source file must include a license header for the MIT
-Software. Using the SPDX format is the simplest approach,
-e.g.
-
-```
-/*
-Copyright IBM Inc. All rights reserved.
-
-SPDX-License-Identifier: MIT
-*/
-```
-
-We have tried to make it as easy as possible to make contributions. This
-applies to how we handle the legal aspects of contribution. We use the
-same approach - the [Developer's Certificate of Origin 1.1 (DCO)](https://github.com/hyperledger/fabric/blob/master/docs/source/DCO1.1.txt) - that the Linux® Kernel [community](https://elinux.org/Developer_Certificate_Of_Origin)
-uses to manage code contributions.
-
-We simply ask that when submitting a patch for review, the developer
-must include a sign-off statement in the commit message.
-
-Here is an example Signed-off-by line, which indicates that the
-submitter accepts the DCO:
-
-```
-Signed-off-by: John Doe <john.doe@example.com>
-```
-
-You can include this automatically when you commit a change to your
-local git repository using the following command:
-
-```
-git commit -s
-```
-
-### New dependencies
-
-This project strictly adheres to using dependencies that are compatible with the MIT license to ensure maximum flexibility and permissiveness in its usage and distribution. As a result, dependencies licensed under restrictive terms such as GPL, LGPL, AGPL, or similar are explicitly excluded. These licenses impose additional requirements and limitations that are incompatible with the MIT license's minimal restrictions, potentially affecting derivative works and redistribution. By maintaining this policy, the project ensures simplicity and freedom for both developers and users, avoiding conflicts with stricter copyleft provisions.
-
-
-## Communication
-
-Please feel free to connect with us using the [discussion section](https://github.com/docling-project/docling/discussions).
-
+For more details on the contributing guidelines head to the Docling Project [community repository](https://github.com/docling-project/community).
 
 ## Developing
MAINTAINERS.md
@@ -2,9 +2,6 @@

- Christoph Auer - [@cau-git](https://github.com/cau-git)
- Michele Dolfi - [@dolfim-ibm](https://github.com/dolfim-ibm)
- Maxim Lysak - [@maxmnemonic](https://github.com/maxmnemonic)
- Nikos Livathinos - [@nikos-livathinos](https://github.com/nikos-livathinos)
- Ahmed Nassar - [@nassarofficial](https://github.com/nassarofficial)
- Panos Vagenas - [@vagenas](https://github.com/vagenas)
- Peter Staar - [@PeterStaar-IBM](https://github.com/PeterStaar-IBM)
README.md (12 lines changed)
@@ -21,6 +21,8 @@
 [pre-commit](https://github.com/pre-commit/pre-commit)
 [License MIT](https://opensource.org/licenses/MIT)
 [Downloads](https://pepy.tech/projects/docling)
+[Docling Actor on Apify](https://apify.com/vancura/docling)
+[LF AI & Data](https://lfaidata.foundation/projects/)
 
 Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
 
@@ -33,12 +35,12 @@ Docling simplifies document processing, parsing diverse formats — including ad
 * 🔒 Local execution capabilities for sensitive data and air-gapped environments
 * 🤖 Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
 * 🔍 Extensive OCR support for scanned PDFs and images
+* 🥚 Support of Visual Language Models ([SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview))
 * 💻 Simple and convenient CLI
 
 ### Coming soon
 
 * 📝 Metadata extraction, including title, authors, references & language
-* 📝 Inclusion of Visual Language Models ([SmolDocling](https://huggingface.co/blog/smolervlm#smoldocling))
 * 📝 Chart understanding (Barchart, Piechart, LinePlot, etc)
 * 📝 Complex chemistry understanding (Molecular structures)
 
@@ -119,9 +121,13 @@ If you use Docling in your projects, please consider citing the following:
 The Docling codebase is under MIT license.
 For individual model usage, please refer to the model licenses found in the original packages.
 
-## IBM ❤️ Open Source AI
+## LF AI & Data
 
-Docling has been brought to you by IBM.
+Docling is hosted as a project in the [LF AI & Data Foundation](https://lfaidata.foundation/projects/).
+
+### IBM ❤️ Open Source AI
+
+The project was started by the AI for knowledge team at IBM Research Zurich.
 
 [supported_formats]: https://docling-project.github.io/docling/usage/supported_formats/
 [docling_document]: https://docling-project.github.io/docling/concepts/docling_document/
docling/backend/docling_parse_backend.py
@@ -6,12 +6,12 @@ from typing import Iterable, List, Optional, Union
 
 import pypdfium2 as pdfium
 from docling_core.types.doc import BoundingBox, CoordOrigin, Size
+from docling_core.types.doc.page import BoundingRectangle, SegmentedPdfPage, TextCell
 from docling_parse.pdf_parsers import pdf_parser_v1
 from PIL import Image, ImageDraw
 from pypdfium2 import PdfPage
 
 from docling.backend.pdf_backend import PdfDocumentBackend, PdfPageBackend
-from docling.datamodel.base_models import Cell
 from docling.datamodel.document import InputDocument
 
 _log = logging.getLogger(__name__)
@@ -68,8 +68,11 @@ class DoclingParsePageBackend(PdfPageBackend):
 
         return text_piece
 
-    def get_text_cells(self) -> Iterable[Cell]:
-        cells: List[Cell] = []
+    def get_segmented_page(self) -> Optional[SegmentedPdfPage]:
+        return None
+
+    def get_text_cells(self) -> Iterable[TextCell]:
+        cells: List[TextCell] = []
         cell_counter = 0
 
         if not self.valid:
@@ -91,19 +94,24 @@ class DoclingParsePageBackend(PdfPageBackend):
 
             text_piece = self._dpage["cells"][i]["content"]["rnormalized"]
             cells.append(
-                Cell(
-                    id=cell_counter,
+                TextCell(
+                    index=cell_counter,
                     text=text_piece,
-                    bbox=BoundingBox(
-                        # l=x0, b=y0, r=x1, t=y1,
-                        l=x0 * page_size.width / parser_width,
-                        b=y0 * page_size.height / parser_height,
-                        r=x1 * page_size.width / parser_width,
-                        t=y1 * page_size.height / parser_height,
-                        coord_origin=CoordOrigin.BOTTOMLEFT,
-                    ),
+                    orig=text_piece,
+                    from_ocr=False,
+                    rect=BoundingRectangle.from_bounding_box(
+                        BoundingBox(
+                            # l=x0, b=y0, r=x1, t=y1,
+                            l=x0 * page_size.width / parser_width,
+                            b=y0 * page_size.height / parser_height,
+                            r=x1 * page_size.width / parser_width,
+                            t=y1 * page_size.height / parser_height,
+                            coord_origin=CoordOrigin.BOTTOMLEFT,
+                        )
+                    ).to_top_left_origin(page_size.height),
                 )
             )
 
             cell_counter += 1
 
         def draw_clusters_and_cells():
@@ -112,7 +120,7 @@ class DoclingParsePageBackend(PdfPageBackend):
             )  # make new image to avoid drawing on the saved ones
             draw = ImageDraw.Draw(image)
             for c in cells:
-                x0, y0, x1, y1 = c.bbox.as_tuple()
+                x0, y0, x1, y1 = c.rect.to_bounding_box().as_tuple()
                 cell_color = (
                     random.randint(30, 140),
                     random.randint(30, 140),
|
@ -6,12 +6,13 @@ from typing import TYPE_CHECKING, Iterable, List, Optional, Union
|
||||
|
||||
import pypdfium2 as pdfium
|
||||
from docling_core.types.doc import BoundingBox, CoordOrigin
|
||||
from docling_core.types.doc.page import BoundingRectangle, SegmentedPdfPage, TextCell
|
||||
from docling_parse.pdf_parsers import pdf_parser_v2
|
||||
from PIL import Image, ImageDraw
|
||||
from pypdfium2 import PdfPage
|
||||
|
||||
from docling.backend.pdf_backend import PdfDocumentBackend, PdfPageBackend
|
||||
from docling.datamodel.base_models import Cell, Size
|
||||
from docling.datamodel.base_models import Size
|
||||
from docling.utils.locks import pypdfium2_lock
|
||||
|
||||
if TYPE_CHECKING:
|
||||
@ -78,8 +79,11 @@ class DoclingParseV2PageBackend(PdfPageBackend):
|
||||
|
||||
return text_piece
|
||||
|
||||
def get_text_cells(self) -> Iterable[Cell]:
|
||||
cells: List[Cell] = []
|
||||
def get_segmented_page(self) -> Optional[SegmentedPdfPage]:
|
||||
return None
|
||||
|
||||
def get_text_cells(self) -> Iterable[TextCell]:
|
||||
cells: List[TextCell] = []
|
||||
cell_counter = 0
|
||||
|
||||
if not self.valid:
|
||||
@ -106,16 +110,20 @@ class DoclingParseV2PageBackend(PdfPageBackend):
|
||||
|
||||
text_piece = cell_data[cells_header.index("text")]
|
||||
cells.append(
|
||||
Cell(
|
||||
id=cell_counter,
|
||||
TextCell(
|
||||
index=cell_counter,
|
||||
text=text_piece,
|
||||
bbox=BoundingBox(
|
||||
# l=x0, b=y0, r=x1, t=y1,
|
||||
l=x0 * page_size.width / parser_width,
|
||||
b=y0 * page_size.height / parser_height,
|
||||
r=x1 * page_size.width / parser_width,
|
||||
t=y1 * page_size.height / parser_height,
|
||||
coord_origin=CoordOrigin.BOTTOMLEFT,
|
||||
orig=text_piece,
|
||||
from_ocr=False,
|
||||
rect=BoundingRectangle.from_bounding_box(
|
||||
BoundingBox(
|
||||
# l=x0, b=y0, r=x1, t=y1,
|
||||
l=x0 * page_size.width / parser_width,
|
||||
b=y0 * page_size.height / parser_height,
|
||||
r=x1 * page_size.width / parser_width,
|
||||
t=y1 * page_size.height / parser_height,
|
||||
coord_origin=CoordOrigin.BOTTOMLEFT,
|
||||
)
|
||||
).to_top_left_origin(page_size.height),
|
||||
)
|
||||
)
|
||||
|
docling/backend/docling_parse_v4_backend.py (new file, 192 lines)
@@ -0,0 +1,192 @@
import logging
import random
from io import BytesIO
from pathlib import Path
from typing import TYPE_CHECKING, Iterable, List, Optional, Union

import pypdfium2 as pdfium
from docling_core.types.doc import BoundingBox, CoordOrigin
from docling_core.types.doc.page import SegmentedPdfPage, TextCell
from docling_parse.pdf_parser import DoclingPdfParser, PdfDocument
from PIL import Image, ImageDraw
from pypdfium2 import PdfPage

from docling.backend.pdf_backend import PdfDocumentBackend, PdfPageBackend
from docling.datamodel.base_models import Size
from docling.utils.locks import pypdfium2_lock

if TYPE_CHECKING:
    from docling.datamodel.document import InputDocument

_log = logging.getLogger(__name__)


class DoclingParseV4PageBackend(PdfPageBackend):
    def __init__(self, parsed_page: SegmentedPdfPage, page_obj: PdfPage):
        self._ppage = page_obj
        self._dpage = parsed_page
        self.valid = parsed_page is not None

    def is_valid(self) -> bool:
        return self.valid

    def get_text_in_rect(self, bbox: BoundingBox) -> str:
        # Find intersecting cells on the page
        text_piece = ""
        page_size = self.get_size()

        scale = (
            1  # FIX - Replace with param in get_text_in_rect across backends (optional)
        )

        for i, cell in enumerate(self._dpage.textline_cells):
            cell_bbox = (
                cell.rect.to_bounding_box()
                .to_top_left_origin(page_height=page_size.height)
                .scaled(scale)
            )

            overlap_frac = cell_bbox.intersection_area_with(bbox) / cell_bbox.area()

            if overlap_frac > 0.5:
                if len(text_piece) > 0:
                    text_piece += " "
                text_piece += cell.text

        return text_piece

    def get_segmented_page(self) -> Optional[SegmentedPdfPage]:
        return self._dpage

    def get_text_cells(self) -> Iterable[TextCell]:
        page_size = self.get_size()

        # Convert each cell to top-left origin (in-place; the list
        # comprehension is used for its side effect only).
        [tc.to_top_left_origin(page_size.height) for tc in self._dpage.textline_cells]

        # for cell in self._dpage.textline_cells:
        #     rect = cell.rect
        #
        #     assert (
        #         rect.to_bounding_box().l <= rect.to_bounding_box().r
        #     ), f"left is > right on bounding box {rect.to_bounding_box()} of rect {rect}"
        #     assert (
        #         rect.to_bounding_box().t <= rect.to_bounding_box().b
        #     ), f"top is > bottom on bounding box {rect.to_bounding_box()} of rect {rect}"

        return self._dpage.textline_cells

    def get_bitmap_rects(self, scale: float = 1) -> Iterable[BoundingBox]:
        AREA_THRESHOLD = 0  # 32 * 32

        images = self._dpage.bitmap_resources

        for img in images:
            cropbox = img.rect.to_bounding_box().to_top_left_origin(
                self.get_size().height
            )

            if cropbox.area() > AREA_THRESHOLD:
                cropbox = cropbox.scaled(scale=scale)

                yield cropbox

    def get_page_image(
        self, scale: float = 1, cropbox: Optional[BoundingBox] = None
    ) -> Image.Image:

        page_size = self.get_size()

        if not cropbox:
            cropbox = BoundingBox(
                l=0,
                r=page_size.width,
                t=0,
                b=page_size.height,
                coord_origin=CoordOrigin.TOPLEFT,
            )
            padbox = BoundingBox(
                l=0, r=0, t=0, b=0, coord_origin=CoordOrigin.BOTTOMLEFT
            )
        else:
            padbox = cropbox.to_bottom_left_origin(page_size.height).model_copy()
            padbox.r = page_size.width - padbox.r
            padbox.t = page_size.height - padbox.t

        with pypdfium2_lock:
            image = (
                self._ppage.render(
                    scale=scale * 1.5,
                    rotation=0,  # no additional rotation
                    crop=padbox.as_tuple(),
                )
                .to_pil()
                .resize(
                    size=(round(cropbox.width * scale), round(cropbox.height * scale))
|
||||
)
|
||||
) # We resize the image from 1.5x the given scale to make it sharper.
|
||||
|
||||
return image
|
||||
|
||||
def get_size(self) -> Size:
|
||||
with pypdfium2_lock:
|
||||
return Size(width=self._ppage.get_width(), height=self._ppage.get_height())
|
||||
|
||||
# TODO: Take width and height from docling-parse.
|
||||
# return Size(
|
||||
# width=self._dpage.dimension.width,
|
||||
# height=self._dpage.dimension.height,
|
||||
# )
|
||||
|
||||
def unload(self):
|
||||
self._ppage = None
|
||||
self._dpage = None
|
||||
|
||||
|
||||
class DoclingParseV4DocumentBackend(PdfDocumentBackend):
|
||||
def __init__(self, in_doc: "InputDocument", path_or_stream: Union[BytesIO, Path]):
|
||||
super().__init__(in_doc, path_or_stream)
|
||||
|
||||
with pypdfium2_lock:
|
||||
self._pdoc = pdfium.PdfDocument(self.path_or_stream)
|
||||
self.parser = DoclingPdfParser(loglevel="fatal")
|
||||
self.dp_doc: PdfDocument = self.parser.load(path_or_stream=self.path_or_stream)
|
||||
success = self.dp_doc is not None
|
||||
|
||||
if not success:
|
||||
raise RuntimeError(
|
||||
f"docling-parse v4 could not load document {self.document_hash}."
|
||||
)
|
||||
|
||||
def page_count(self) -> int:
|
||||
# return len(self._pdoc) # To be replaced with docling-parse API
|
||||
|
||||
len_1 = len(self._pdoc)
|
||||
len_2 = self.dp_doc.number_of_pages()
|
||||
|
||||
if len_1 != len_2:
|
||||
_log.error(f"Inconsistent number of pages: {len_1}!={len_2}")
|
||||
|
||||
return len_2
|
||||
|
||||
def load_page(
|
||||
self, page_no: int, create_words: bool = True, create_textlines: bool = True
|
||||
) -> DoclingParseV4PageBackend:
|
||||
with pypdfium2_lock:
|
||||
return DoclingParseV4PageBackend(
|
||||
self.dp_doc.get_page(
|
||||
page_no + 1,
|
||||
create_words=create_words,
|
||||
create_textlines=create_textlines,
|
||||
),
|
||||
self._pdoc[page_no],
|
||||
)
|
||||
|
||||
def is_valid(self) -> bool:
|
||||
return self.page_count() > 0
|
||||
|
||||
def unload(self):
|
||||
super().unload()
|
||||
self.dp_doc.unload()
|
||||
with pypdfium2_lock:
|
||||
self._pdoc.close()
|
||||
self._pdoc = None
|
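Note: the following usage sketch is not part of the commit; it only illustrates how the new v4 backend added above can be selected explicitly through PdfFormatOption (the input file name is hypothetical).

from docling.backend.docling_parse_v4_backend import DoclingParseV4DocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption

# Pick the docling-parse v4 backend explicitly for PDF inputs.
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(backend=DoclingParseV4DocumentBackend)
    }
)
result = converter.convert("sample.pdf")  # "sample.pdf" is a hypothetical input
print(result.document.export_to_markdown())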
@@ -275,8 +275,10 @@ class MsWordDocumentBackend(DeclarativeDocumentBackend):
                 only_equations.append(latex_equation)
                 texts_and_equations.append(latex_equation)

-        if "".join(only_texts) != text:
-            return text
+        if "".join(only_texts).strip() != text.strip():
+            # If we are not able to reconstruct the initial raw text
+            # do not try to parse equations and return the original
+            return text, []

         return "".join(texts_and_equations), only_equations

@@ -365,6 +367,7 @@ class MsWordDocumentBackend(DeclarativeDocumentBackend):
             for eq in equations:
                 if len(text_tmp) == 0:
                     break
+
                 pre_eq_text = text_tmp.split(eq, maxsplit=1)[0]
                 text_tmp = text_tmp.split(eq, maxsplit=1)[1]
                 if len(pre_eq_text) > 0:
@@ -4,10 +4,11 @@ from pathlib import Path
 from typing import Iterable, Optional, Set, Union

 from docling_core.types.doc import BoundingBox, Size
+from docling_core.types.doc.page import SegmentedPdfPage, TextCell
 from PIL import Image

 from docling.backend.abstract_backend import PaginatedDocumentBackend
-from docling.datamodel.base_models import Cell, InputFormat
+from docling.datamodel.base_models import InputFormat
 from docling.datamodel.document import InputDocument


@@ -17,7 +18,11 @@ class PdfPageBackend(ABC):
         pass

     @abstractmethod
-    def get_text_cells(self) -> Iterable[Cell]:
+    def get_segmented_page(self) -> Optional[SegmentedPdfPage]:
+        pass
+
+    @abstractmethod
+    def get_text_cells(self) -> Iterable[TextCell]:
         pass

     @abstractmethod
@@ -7,12 +7,12 @@ from typing import TYPE_CHECKING, Iterable, List, Optional, Union
 import pypdfium2 as pdfium
 import pypdfium2.raw as pdfium_c
 from docling_core.types.doc import BoundingBox, CoordOrigin, Size
+from docling_core.types.doc.page import BoundingRectangle, SegmentedPdfPage, TextCell
 from PIL import Image, ImageDraw
 from pypdfium2 import PdfTextPage
 from pypdfium2._helpers.misc import PdfiumError

 from docling.backend.pdf_backend import PdfDocumentBackend, PdfPageBackend
-from docling.datamodel.base_models import Cell
 from docling.utils.locks import pypdfium2_lock

 if TYPE_CHECKING:
@@ -68,7 +68,10 @@ class PyPdfiumPageBackend(PdfPageBackend):

         return text_piece

-    def get_text_cells(self) -> Iterable[Cell]:
+    def get_segmented_page(self) -> Optional[SegmentedPdfPage]:
+        return None
+
+    def get_text_cells(self) -> Iterable[TextCell]:
         with pypdfium2_lock:
             if not self.text_page:
                 self.text_page = self._ppage.get_textpage()
@@ -84,11 +87,19 @@ class PyPdfiumPageBackend(PdfPageBackend):
                 text_piece = self.text_page.get_text_bounded(*rect)
                 x0, y0, x1, y1 = rect
                 cells.append(
-                    Cell(
-                        id=cell_counter,
+                    TextCell(
+                        index=cell_counter,
                         text=text_piece,
-                        bbox=BoundingBox(
-                            l=x0, b=y0, r=x1, t=y1, coord_origin=CoordOrigin.BOTTOMLEFT
+                        orig=text_piece,
+                        from_ocr=False,
+                        rect=BoundingRectangle.from_bounding_box(
+                            BoundingBox(
+                                l=x0,
+                                b=y0,
+                                r=x1,
+                                t=y1,
+                                coord_origin=CoordOrigin.BOTTOMLEFT,
+                            )
                         ).to_top_left_origin(page_size.height),
                     )
                 )
@@ -97,51 +108,56 @@ class PyPdfiumPageBackend(PdfPageBackend):
         # PyPdfium2 produces very fragmented cells, with sub-word level boundaries, in many PDFs.
         # The cell merging code below is to clean this up.
         def merge_horizontal_cells(
-            cells: List[Cell],
+            cells: List[TextCell],
             horizontal_threshold_factor: float = 1.0,
             vertical_threshold_factor: float = 0.5,
-        ) -> List[Cell]:
+        ) -> List[TextCell]:
            if not cells:
                return []

-            def group_rows(cells: List[Cell]) -> List[List[Cell]]:
+            def group_rows(cells: List[TextCell]) -> List[List[TextCell]]:
                rows = []
                current_row = [cells[0]]
-                row_top = cells[0].bbox.t
-                row_bottom = cells[0].bbox.b
-                row_height = cells[0].bbox.height
+                row_top = cells[0].rect.to_bounding_box().t
+                row_bottom = cells[0].rect.to_bounding_box().b
+                row_height = cells[0].rect.to_bounding_box().height

                for cell in cells[1:]:
                    vertical_threshold = row_height * vertical_threshold_factor
                    if (
-                        abs(cell.bbox.t - row_top) <= vertical_threshold
-                        and abs(cell.bbox.b - row_bottom) <= vertical_threshold
+                        abs(cell.rect.to_bounding_box().t - row_top)
+                        <= vertical_threshold
+                        and abs(cell.rect.to_bounding_box().b - row_bottom)
+                        <= vertical_threshold
                    ):
                        current_row.append(cell)
-                        row_top = min(row_top, cell.bbox.t)
-                        row_bottom = max(row_bottom, cell.bbox.b)
+                        row_top = min(row_top, cell.rect.to_bounding_box().t)
+                        row_bottom = max(row_bottom, cell.rect.to_bounding_box().b)
                        row_height = row_bottom - row_top
                    else:
                        rows.append(current_row)
                        current_row = [cell]
-                        row_top = cell.bbox.t
-                        row_bottom = cell.bbox.b
-                        row_height = cell.bbox.height
+                        row_top = cell.rect.to_bounding_box().t
+                        row_bottom = cell.rect.to_bounding_box().b
+                        row_height = cell.rect.to_bounding_box().height

                if current_row:
                    rows.append(current_row)

                return rows

-            def merge_row(row: List[Cell]) -> List[Cell]:
+            def merge_row(row: List[TextCell]) -> List[TextCell]:
                merged = []
                current_group = [row[0]]

                for cell in row[1:]:
                    prev_cell = current_group[-1]
-                    avg_height = (prev_cell.bbox.height + cell.bbox.height) / 2
+                    avg_height = (
+                        prev_cell.rect.height + cell.rect.to_bounding_box().height
+                    ) / 2
                    if (
-                        cell.bbox.l - prev_cell.bbox.r
+                        cell.rect.to_bounding_box().l
+                        - prev_cell.rect.to_bounding_box().r
                        <= avg_height * horizontal_threshold_factor
                    ):
                        current_group.append(cell)
@@ -154,24 +170,30 @@ class PyPdfiumPageBackend(PdfPageBackend):

                return merged

-            def merge_group(group: List[Cell]) -> Cell:
+            def merge_group(group: List[TextCell]) -> TextCell:
                if len(group) == 1:
                    return group[0]

                merged_text = "".join(cell.text for cell in group)
                merged_bbox = BoundingBox(
-                    l=min(cell.bbox.l for cell in group),
-                    t=min(cell.bbox.t for cell in group),
-                    r=max(cell.bbox.r for cell in group),
-                    b=max(cell.bbox.b for cell in group),
+                    l=min(cell.rect.to_bounding_box().l for cell in group),
+                    t=min(cell.rect.to_bounding_box().t for cell in group),
+                    r=max(cell.rect.to_bounding_box().r for cell in group),
+                    b=max(cell.rect.to_bounding_box().b for cell in group),
                )
-                return Cell(id=group[0].id, text=merged_text, bbox=merged_bbox)
+                return TextCell(
+                    index=group[0].index,
+                    text=merged_text,
+                    orig=merged_text,
+                    rect=BoundingRectangle.from_bounding_box(merged_bbox),
+                    from_ocr=False,
+                )

            rows = group_rows(cells)
            merged_cells = [cell for row in rows for cell in merge_row(row)]

            for i, cell in enumerate(merged_cells, 1):
-                cell.id = i
+                cell.index = i

            return merged_cells

@@ -181,7 +203,7 @@ class PyPdfiumPageBackend(PdfPageBackend):
         )  # make new image to avoid drawing on the saved ones
         draw = ImageDraw.Draw(image)
         for c in cells:
-            x0, y0, x1, y1 = c.bbox.as_tuple()
+            x0, y0, x1, y1 = c.rect.to_bounding_box().as_tuple()
             cell_color = (
                 random.randint(30, 140),
                 random.randint(30, 140),
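The recurring pattern in all backend changes above, shown as a standalone sketch (not part of the commit; values are illustrative): a legacy Cell(id=..., text=..., bbox=...) becomes a docling_core TextCell whose geometry is carried by a BoundingRectangle.

from docling_core.types.doc import BoundingBox, CoordOrigin
from docling_core.types.doc.page import BoundingRectangle, TextCell

# A bottom-left-origin box, as produced by the PDF parsers above.
bbox = BoundingBox(l=10, b=20, r=110, t=40, coord_origin=CoordOrigin.BOTTOMLEFT)

cell = TextCell(
    index=0,
    text="Hello",
    orig="Hello",
    from_ocr=False,  # True for cells produced by an OCR engine
    rect=BoundingRectangle.from_bounding_box(
        bbox.to_top_left_origin(842)  # page height in points, illustrative
    ),
)

# The plain bounding box can always be recovered from the rectangle.
x0, y0, x1, y1 = cell.rect.to_bounding_box().as_tuple()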
@@ -9,6 +9,7 @@ import warnings
 from pathlib import Path
 from typing import Annotated, Dict, Iterable, List, Optional, Type

+import rich.table
 import typer
 from docling_core.types.doc import ImageRefMode
 from docling_core.utils.file import resolve_source_to_path
@@ -16,6 +17,7 @@ from pydantic import TypeAdapter

 from docling.backend.docling_parse_backend import DoclingParseDocumentBackend
 from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend
+from docling.backend.docling_parse_v4_backend import DoclingParseV4DocumentBackend
 from docling.backend.pdf_backend import PdfDocumentBackend
 from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
 from docling.datamodel.base_models import (
@@ -29,18 +31,14 @@ from docling.datamodel.pipeline_options import (
     AcceleratorDevice,
     AcceleratorOptions,
     EasyOcrOptions,
-    OcrEngine,
-    OcrMacOptions,
     OcrOptions,
     PdfBackend,
     PdfPipelineOptions,
-    RapidOcrOptions,
     TableFormerMode,
-    TesseractCliOcrOptions,
-    TesseractOcrOptions,
 )
 from docling.datamodel.settings import settings
 from docling.document_converter import DocumentConverter, FormatOption, PdfFormatOption
+from docling.models.factories import get_ocr_factory

 warnings.filterwarnings(action="ignore", category=UserWarning, module="pydantic|torch")
 warnings.filterwarnings(action="ignore", category=FutureWarning, module="easyocr")
@@ -48,8 +46,11 @@ warnings.filterwarnings(action="ignore", category=FutureWarning, module="easyocr")
 _log = logging.getLogger(__name__)
 from rich.console import Console

 console = Console()
 err_console = Console(stderr=True)

+ocr_factory_internal = get_ocr_factory(allow_external_plugins=False)
+ocr_engines_enum_internal = ocr_factory_internal.get_enum()
+
 app = typer.Typer(
     name="Docling",
@@ -77,6 +78,24 @@ def version_callback(value: bool):
         raise typer.Exit()


+def show_external_plugins_callback(value: bool):
+    if value:
+        ocr_factory_all = get_ocr_factory(allow_external_plugins=True)
+        table = rich.table.Table(title="Available OCR engines")
+        table.add_column("Name", justify="right")
+        table.add_column("Plugin")
+        table.add_column("Package")
+        for meta in ocr_factory_all.registered_meta.values():
+            if not meta.module.startswith("docling."):
+                table.add_row(
+                    f"[bold]{meta.kind}[/bold]",
+                    meta.plugin_name,
+                    meta.module.split(".")[0],
+                )
+        rich.print(table)
+        raise typer.Exit()
+
+
 def export_documents(
     conv_results: Iterable[ConversionResult],
     output_dir: Path,
@@ -195,8 +214,16 @@ def convert(
         ),
     ] = False,
     ocr_engine: Annotated[
-        OcrEngine, typer.Option(..., help="The OCR engine to use.")
-    ] = OcrEngine.EASYOCR,
+        str,
+        typer.Option(
+            ...,
+            help=(
+                f"The OCR engine to use. When --allow-external-plugins is *not* set, the available values are: "
+                f"{', '.join((o.value for o in ocr_engines_enum_internal))}. "
+                f"Use the option --show-external-plugins to see the options allowed with external plugins."
+            ),
+        ),
+    ] = EasyOcrOptions.kind,
     ocr_lang: Annotated[
         Optional[str],
         typer.Option(
@@ -240,6 +267,21 @@ def convert(
             ..., help="Must be enabled when using models connecting to remote services."
         ),
     ] = False,
+    allow_external_plugins: Annotated[
+        bool,
+        typer.Option(
+            ..., help="Must be enabled for loading modules from third-party plugins."
+        ),
+    ] = False,
+    show_external_plugins: Annotated[
+        bool,
+        typer.Option(
+            ...,
+            help="List the third-party plugins which are available when the option --allow-external-plugins is set.",
+            callback=show_external_plugins_callback,
+            is_eager=True,
+        ),
+    ] = False,
     abort_on_error: Annotated[
         bool,
         typer.Option(
@@ -367,18 +409,11 @@ def convert(
     export_txt = OutputFormat.TEXT in to_formats
     export_doctags = OutputFormat.DOCTAGS in to_formats

-    if ocr_engine == OcrEngine.EASYOCR:
-        ocr_options: OcrOptions = EasyOcrOptions(force_full_page_ocr=force_ocr)
-    elif ocr_engine == OcrEngine.TESSERACT_CLI:
-        ocr_options = TesseractCliOcrOptions(force_full_page_ocr=force_ocr)
-    elif ocr_engine == OcrEngine.TESSERACT:
-        ocr_options = TesseractOcrOptions(force_full_page_ocr=force_ocr)
-    elif ocr_engine == OcrEngine.OCRMAC:
-        ocr_options = OcrMacOptions(force_full_page_ocr=force_ocr)
-    elif ocr_engine == OcrEngine.RAPIDOCR:
-        ocr_options = RapidOcrOptions(force_full_page_ocr=force_ocr)
-    else:
-        raise RuntimeError(f"Unexpected OCR engine type {ocr_engine}")
+    ocr_factory = get_ocr_factory(allow_external_plugins=allow_external_plugins)
+    ocr_options: OcrOptions = ocr_factory.create_options(  # type: ignore
+        kind=ocr_engine,
+        force_full_page_ocr=force_ocr,
+    )

     ocr_lang_list = _split_list(ocr_lang)
     if ocr_lang_list is not None:
@@ -386,6 +421,7 @@ def convert(

     accelerator_options = AcceleratorOptions(num_threads=num_threads, device=device)
     pipeline_options = PdfPipelineOptions(
+        allow_external_plugins=allow_external_plugins,
         enable_remote_services=enable_remote_services,
         accelerator_options=accelerator_options,
         do_ocr=ocr,
@@ -412,12 +448,15 @@ def convert(
     if artifacts_path is not None:
         pipeline_options.artifacts_path = artifacts_path

+    backend: Type[PdfDocumentBackend]
     if pdf_backend == PdfBackend.DLPARSE_V1:
-        backend: Type[PdfDocumentBackend] = DoclingParseDocumentBackend
+        backend = DoclingParseDocumentBackend
     elif pdf_backend == PdfBackend.DLPARSE_V2:
         backend = DoclingParseV2DocumentBackend
+    elif pdf_backend == PdfBackend.DLPARSE_V4:
+        backend = DoclingParseV4DocumentBackend  # type: ignore
     elif pdf_backend == PdfBackend.PYPDFIUM2:
-        backend = PyPdfiumDocumentBackend
+        backend = PyPdfiumDocumentBackend  # type: ignore
     else:
         raise RuntimeError(f"Unexpected PDF backend type {pdf_backend}")
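What replaces the removed if/elif chain, as a standalone sketch (not part of the commit): the factory resolves an engine by its string kind and builds the matching options model.

from docling.models.factories import get_ocr_factory

factory = get_ocr_factory(allow_external_plugins=False)

# Build options for one engine by kind; extra keyword arguments are
# forwarded to the options model (here: EasyOcrOptions).
ocr_options = factory.create_options(kind="easyocr", force_full_page_ocr=True)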
@@ -9,6 +9,7 @@ from docling_core.types.doc import (
     Size,
     TableCell,
 )
+from docling_core.types.doc.page import SegmentedPdfPage, TextCell
 from docling_core.types.io import (  # DO NOT REMOVE; explicitly exposed from this location
     DocumentStream,
 )
@@ -123,14 +124,10 @@ class ErrorItem(BaseModel):
     error_message: str


-class Cell(BaseModel):
-    id: int
-    text: str
-    bbox: BoundingBox
-
-
-class OcrCell(Cell):
-    confidence: float
+# class Cell(BaseModel):
+#     id: int
+#     text: str
+#     bbox: BoundingBox


 class Cluster(BaseModel):
@@ -138,7 +135,7 @@ class Cluster(BaseModel):
     label: DocItemLabel
     bbox: BoundingBox
     confidence: float = 1.0
-    cells: List[Cell] = []
+    cells: List[TextCell] = []
     children: List["Cluster"] = []  # Add child cluster support


@@ -226,7 +223,8 @@ class Page(BaseModel):
     page_no: int
     # page_hash: Optional[str] = None
     size: Optional[Size] = None
-    cells: List[Cell] = []
+    cells: List[TextCell] = []
+    parsed_page: Optional[SegmentedPdfPage] = None
     predictions: PagePredictions = PagePredictions()
     assembled: Optional[AssembledUnit] = None
@@ -1,10 +1,9 @@
 import logging
 import os
-import re
 import warnings
 from enum import Enum
 from pathlib import Path
-from typing import Annotated, Any, Dict, List, Literal, Optional, Union
+from typing import Any, ClassVar, Dict, List, Literal, Optional, Union

 from pydantic import (
     AnyUrl,
@@ -13,13 +12,8 @@ from pydantic import (
     Field,
     field_validator,
     model_validator,
-    validator,
 )
-from pydantic_settings import (
-    BaseSettings,
-    PydanticBaseSettingsSource,
-    SettingsConfigDict,
-)
+from pydantic_settings import BaseSettings, SettingsConfigDict
 from typing_extensions import deprecated

 _log = logging.getLogger(__name__)
@@ -83,6 +77,12 @@ class AcceleratorOptions(BaseSettings):
         return data


+class BaseOptions(BaseModel):
+    """Base class for options."""
+
+    kind: ClassVar[str]
+
+
 class TableFormerMode(str, Enum):
     """Modes for the TableFormer model."""

@@ -102,10 +102,9 @@ class TableStructureOptions(BaseModel):
     mode: TableFormerMode = TableFormerMode.ACCURATE


-class OcrOptions(BaseModel):
+class OcrOptions(BaseOptions):
     """OCR options."""

-    kind: str
     lang: List[str]
     force_full_page_ocr: bool = False  # If enabled a full page OCR is always applied
     bitmap_area_threshold: float = (
@@ -116,7 +115,7 @@ class OcrOptions(BaseModel):
 class RapidOcrOptions(OcrOptions):
     """Options for the RapidOCR engine."""

-    kind: Literal["rapidocr"] = "rapidocr"
+    kind: ClassVar[Literal["rapidocr"]] = "rapidocr"

     # English and chinese are the most commly used models and have been tested with RapidOCR.
     lang: List[str] = [
@@ -155,7 +154,7 @@ class RapidOcrOptions(OcrOptions):
 class EasyOcrOptions(OcrOptions):
     """Options for the EasyOCR engine."""

-    kind: Literal["easyocr"] = "easyocr"
+    kind: ClassVar[Literal["easyocr"]] = "easyocr"
     lang: List[str] = ["fr", "de", "es", "en"]

     use_gpu: Optional[bool] = None
@@ -175,7 +174,7 @@ class EasyOcrOptions(OcrOptions):
 class TesseractCliOcrOptions(OcrOptions):
     """Options for the TesseractCli engine."""

-    kind: Literal["tesseract"] = "tesseract"
+    kind: ClassVar[Literal["tesseract"]] = "tesseract"
     lang: List[str] = ["fra", "deu", "spa", "eng"]
     tesseract_cmd: str = "tesseract"
     path: Optional[str] = None
@@ -188,7 +187,7 @@ class TesseractCliOcrOptions(OcrOptions):
 class TesseractOcrOptions(OcrOptions):
     """Options for the Tesseract engine."""

-    kind: Literal["tesserocr"] = "tesserocr"
+    kind: ClassVar[Literal["tesserocr"]] = "tesserocr"
     lang: List[str] = ["fra", "deu", "spa", "eng"]
     path: Optional[str] = None

@@ -200,7 +199,7 @@ class TesseractOcrOptions(OcrOptions):
 class OcrMacOptions(OcrOptions):
     """Options for the Mac OCR engine."""

-    kind: Literal["ocrmac"] = "ocrmac"
+    kind: ClassVar[Literal["ocrmac"]] = "ocrmac"
     lang: List[str] = ["fr-FR", "de-DE", "es-ES", "en-US"]
     recognition: str = "accurate"
     framework: str = "vision"
@@ -210,8 +209,7 @@ class OcrMacOptions(OcrOptions):
     )


-class PictureDescriptionBaseOptions(BaseModel):
-    kind: str
+class PictureDescriptionBaseOptions(BaseOptions):
     batch_size: int = 8
     scale: float = 2

@@ -221,7 +219,7 @@ class PictureDescriptionBaseOptions(BaseModel):


 class PictureDescriptionApiOptions(PictureDescriptionBaseOptions):
-    kind: Literal["api"] = "api"
+    kind: ClassVar[Literal["api"]] = "api"

     url: AnyUrl = AnyUrl("http://localhost:8000/v1/chat/completions")
     headers: Dict[str, str] = {}
@@ -233,7 +231,7 @@ class PictureDescriptionApiOptions(PictureDescriptionBaseOptions):


 class PictureDescriptionVlmOptions(PictureDescriptionBaseOptions):
-    kind: Literal["vlm"] = "vlm"
+    kind: ClassVar[Literal["vlm"]] = "vlm"

     repo_id: str
     prompt: str = "Describe this image in a few sentences."
@@ -301,9 +299,11 @@ class PdfBackend(str, Enum):
     PYPDFIUM2 = "pypdfium2"
     DLPARSE_V1 = "dlparse_v1"
     DLPARSE_V2 = "dlparse_v2"
+    DLPARSE_V4 = "dlparse_v4"


 # Define an enum for the ocr engines
+@deprecated("Use ocr_factory.registered_enum")
 class OcrEngine(str, Enum):
     """Enum of valid OCR engines."""

@@ -323,6 +323,7 @@ class PipelineOptions(BaseModel):
     document_timeout: Optional[float] = None
     accelerator_options: AcceleratorOptions = AcceleratorOptions()
     enable_remote_services: bool = False
+    allow_external_plugins: bool = False


 class PaginatedPipelineOptions(PipelineOptions):
@@ -358,17 +359,10 @@ class PdfPipelineOptions(PaginatedPipelineOptions):
     # If True, text from backend will be used instead of generated text

     table_structure_options: TableStructureOptions = TableStructureOptions()
-    ocr_options: Union[
-        EasyOcrOptions,
-        TesseractCliOcrOptions,
-        TesseractOcrOptions,
-        OcrMacOptions,
-        RapidOcrOptions,
-    ] = Field(EasyOcrOptions(), discriminator="kind")
-    picture_description_options: Annotated[
-        Union[PictureDescriptionApiOptions, PictureDescriptionVlmOptions],
-        Field(discriminator="kind"),
-    ] = smolvlm_picture_description
+    ocr_options: OcrOptions = EasyOcrOptions()
+    picture_description_options: PictureDescriptionBaseOptions = (
+        smolvlm_picture_description
+    )

     images_scale: float = 1.0
     generate_page_images: bool = False
@@ -381,3 +375,5 @@ class PdfPipelineOptions(PaginatedPipelineOptions):
         "before conversion and then use the `TableItem.get_image` function."
         ),
     )
+
+    generate_parsed_pages: bool = False
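With kind moved to a ClassVar on BaseOptions, ocr_options becomes an open OcrOptions field instead of a closed discriminated Union. A hedged sketch (the "myocr" kind is hypothetical, not part of the commit):

from typing import ClassVar, List, Literal

from docling.datamodel.pipeline_options import OcrOptions, PdfPipelineOptions


class MyOcrOptions(OcrOptions):
    kind: ClassVar[Literal["myocr"]] = "myocr"  # hypothetical engine kind
    lang: List[str] = ["en"]


# Any OcrOptions subclass is now accepted; no Union update is required.
pipeline_options = PdfPipelineOptions(ocr_options=MyOcrOptions())
assert pipeline_options.ocr_options.kind == "myocr"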
@@ -11,7 +11,7 @@ from pydantic import BaseModel, ConfigDict, model_validator, validate_call
 from docling.backend.abstract_backend import AbstractDocumentBackend
 from docling.backend.asciidoc_backend import AsciiDocBackend
 from docling.backend.csv_backend import CsvDocumentBackend
-from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend
+from docling.backend.docling_parse_v4_backend import DoclingParseV4DocumentBackend
 from docling.backend.html_backend import HTMLDocumentBackend
 from docling.backend.json.docling_json_backend import DoclingJSONBackend
 from docling.backend.md_backend import MarkdownDocumentBackend
@@ -109,12 +109,12 @@ class XMLJatsFormatOption(FormatOption):

 class ImageFormatOption(FormatOption):
     pipeline_cls: Type = StandardPdfPipeline
-    backend: Type[AbstractDocumentBackend] = DoclingParseV2DocumentBackend
+    backend: Type[AbstractDocumentBackend] = DoclingParseV4DocumentBackend


 class PdfFormatOption(FormatOption):
     pipeline_cls: Type = StandardPdfPipeline
-    backend: Type[AbstractDocumentBackend] = DoclingParseV2DocumentBackend
+    backend: Type[AbstractDocumentBackend] = DoclingParseV4DocumentBackend


 def _get_default_option(format: InputFormat) -> FormatOption:
@@ -147,10 +147,10 @@ def _get_default_option(format: InputFormat) -> FormatOption:
             pipeline_cls=SimplePipeline, backend=JatsDocumentBackend
         ),
         InputFormat.IMAGE: FormatOption(
-            pipeline_cls=StandardPdfPipeline, backend=DoclingParseV2DocumentBackend
+            pipeline_cls=StandardPdfPipeline, backend=DoclingParseV4DocumentBackend
         ),
         InputFormat.PDF: FormatOption(
-            pipeline_cls=StandardPdfPipeline, backend=DoclingParseV2DocumentBackend
+            pipeline_cls=StandardPdfPipeline, backend=DoclingParseV4DocumentBackend
         ),
         InputFormat.JSON_DOCLING: FormatOption(
             pipeline_cls=SimplePipeline, backend=DoclingJSONBackend
@@ -1,14 +1,22 @@
 from abc import ABC, abstractmethod
-from typing import Any, Generic, Iterable, Optional
+from typing import Any, Generic, Iterable, Optional, Protocol, Type

 from docling_core.types.doc import BoundingBox, DocItem, DoclingDocument, NodeItem
 from typing_extensions import TypeVar

 from docling.datamodel.base_models import ItemAndImageEnrichmentElement, Page
 from docling.datamodel.document import ConversionResult
+from docling.datamodel.pipeline_options import BaseOptions
 from docling.datamodel.settings import settings


+class BaseModelWithOptions(Protocol):
+    @classmethod
+    def get_options_type(cls) -> Type[BaseOptions]: ...
+
+    def __init__(self, *, options: BaseOptions, **kwargs): ...
+
+
 class BasePageModel(ABC):
     @abstractmethod
     def __call__(
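A minimal sketch (not part of the commit; all names hypothetical) of a class satisfying the new BaseModelWithOptions protocol, which is the shape the factories added later in this commit expect:

from typing import ClassVar, Type

from docling.datamodel.pipeline_options import BaseOptions


class DummyOptions(BaseOptions):
    kind: ClassVar[str] = "dummy"  # hypothetical kind


class DummyModel:
    @classmethod
    def get_options_type(cls) -> Type[BaseOptions]:
        return DummyOptions

    def __init__(self, *, options: BaseOptions, **kwargs):
        self.options = options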
@@ -2,25 +2,33 @@ import copy
 import logging
 from abc import abstractmethod
 from pathlib import Path
-from typing import Iterable, List
+from typing import Iterable, List, Optional, Type

 import numpy as np
 from docling_core.types.doc import BoundingBox, CoordOrigin
+from docling_core.types.doc.page import BoundingRectangle, PdfTextCell, TextCell
 from PIL import Image, ImageDraw
 from rtree import index
 from scipy.ndimage import binary_dilation, find_objects, label

-from docling.datamodel.base_models import Cell, OcrCell, Page
+from docling.datamodel.base_models import Page
 from docling.datamodel.document import ConversionResult
-from docling.datamodel.pipeline_options import OcrOptions
+from docling.datamodel.pipeline_options import AcceleratorOptions, OcrOptions
 from docling.datamodel.settings import settings
-from docling.models.base_model import BasePageModel
+from docling.models.base_model import BaseModelWithOptions, BasePageModel

 _log = logging.getLogger(__name__)


-class BaseOcrModel(BasePageModel):
-    def __init__(self, enabled: bool, options: OcrOptions):
+class BaseOcrModel(BasePageModel, BaseModelWithOptions):
+    def __init__(
+        self,
+        *,
+        enabled: bool,
+        artifacts_path: Optional[Path],
+        options: OcrOptions,
+        accelerator_options: AcceleratorOptions,
+    ):
         self.enabled = enabled
         self.options = options

@@ -104,11 +112,13 @@ class BaseOcrModel(BasePageModel):
             p.dimension = 2
             idx = index.Index(properties=p)
             for i, cell in enumerate(programmatic_cells):
-                idx.insert(i, cell.bbox.as_tuple())
+                idx.insert(i, cell.rect.to_bounding_box().as_tuple())

             def is_overlapping_with_existing_cells(ocr_cell):
                 # Query the R-tree to get overlapping rectangles
-                possible_matches_index = list(idx.intersection(ocr_cell.bbox.as_tuple()))
+                possible_matches_index = list(
+                    idx.intersection(ocr_cell.rect.to_bounding_box().as_tuple())
+                )

                 return (
                     len(possible_matches_index) > 0
@@ -125,10 +135,7 @@ class BaseOcrModel(BasePageModel):
         """
         if self.options.force_full_page_ocr:
             # If a full page OCR is forced, use only the OCR cells
-            cells = [
-                Cell(id=c_ocr.id, text=c_ocr.text, bbox=c_ocr.bbox)
-                for c_ocr in ocr_cells
-            ]
+            cells = ocr_cells
             return cells

         ## Remove OCR cells which overlap with programmatic cells.
@@ -156,7 +163,7 @@ class BaseOcrModel(BasePageModel):

         # Draw OCR and programmatic cells
         for tc in page.cells:
-            x0, y0, x1, y1 = tc.bbox.as_tuple()
+            x0, y0, x1, y1 = tc.rect.to_bounding_box().as_tuple()
             y0 *= scale_x
             y1 *= scale_y
             x0 *= scale_x
@@ -165,9 +172,8 @@ class BaseOcrModel(BasePageModel):
             if y1 <= y0:
                 y1, y0 = y0, y1

-            color = "gray"
-            if isinstance(tc, OcrCell):
-                color = "magenta"
+            color = "magenta" if tc.from_ocr else "gray"

             draw.rectangle([(x0, y0), (x1, y1)], outline=color)

         if show:
@@ -187,3 +193,8 @@ class BaseOcrModel(BasePageModel):
         self, conv_res: ConversionResult, page_batch: Iterable[Page]
     ) -> Iterable[Page]:
         pass
+
+    @classmethod
+    @abstractmethod
+    def get_options_type(cls) -> Type[OcrOptions]:
+        pass
@@ -2,17 +2,19 @@ import logging
 import warnings
 import zipfile
 from pathlib import Path
-from typing import Iterable, List, Optional
+from typing import Iterable, List, Optional, Type

 import numpy
 from docling_core.types.doc import BoundingBox, CoordOrigin
+from docling_core.types.doc.page import BoundingRectangle, TextCell

-from docling.datamodel.base_models import Cell, OcrCell, Page
+from docling.datamodel.base_models import Page
 from docling.datamodel.document import ConversionResult
 from docling.datamodel.pipeline_options import (
     AcceleratorDevice,
     AcceleratorOptions,
     EasyOcrOptions,
+    OcrOptions,
 )
 from docling.datamodel.settings import settings
 from docling.models.base_ocr_model import BaseOcrModel
@@ -33,7 +35,12 @@ class EasyOcrModel(BaseOcrModel):
         options: EasyOcrOptions,
         accelerator_options: AcceleratorOptions,
     ):
-        super().__init__(enabled=enabled, options=options)
+        super().__init__(
+            enabled=enabled,
+            artifacts_path=artifacts_path,
+            options=options,
+            accelerator_options=accelerator_options,
+        )
         self.options: EasyOcrOptions

         self.scale = 3  # multiplier for 72 dpi == 216 dpi.
@@ -148,18 +155,22 @@ class EasyOcrModel(BaseOcrModel):
             del im

             cells = [
-                OcrCell(
-                    id=ix,
+                TextCell(
+                    index=ix,
                     text=line[1],
+                    orig=line[1],
+                    from_ocr=True,
                     confidence=line[2],
-                    bbox=BoundingBox.from_tuple(
-                        coord=(
-                            (line[0][0][0] / self.scale) + ocr_rect.l,
-                            (line[0][0][1] / self.scale) + ocr_rect.t,
-                            (line[0][2][0] / self.scale) + ocr_rect.l,
-                            (line[0][2][1] / self.scale) + ocr_rect.t,
-                        ),
-                        origin=CoordOrigin.TOPLEFT,
+                    rect=BoundingRectangle.from_bounding_box(
+                        BoundingBox.from_tuple(
+                            coord=(
+                                (line[0][0][0] / self.scale) + ocr_rect.l,
+                                (line[0][0][1] / self.scale) + ocr_rect.t,
+                                (line[0][2][0] / self.scale) + ocr_rect.l,
+                                (line[0][2][1] / self.scale) + ocr_rect.t,
+                            ),
+                            origin=CoordOrigin.TOPLEFT,
+                        )
                     ),
                 )
                 for ix, line in enumerate(result)
@@ -175,3 +186,7 @@ class EasyOcrModel(BaseOcrModel):
                 self.draw_ocr_rects_and_cells(conv_res, page, ocr_rects)

         yield page
+
+    @classmethod
+    def get_options_type(cls) -> Type[OcrOptions]:
+        return EasyOcrOptions
27 docling/models/factories/__init__.py Normal file
@@ -0,0 +1,27 @@
+import logging
+from functools import lru_cache
+
+from docling.models.factories.ocr_factory import OcrFactory
+from docling.models.factories.picture_description_factory import (
+    PictureDescriptionFactory,
+)
+
+logger = logging.getLogger(__name__)
+
+
+@lru_cache()
+def get_ocr_factory(allow_external_plugins: bool = False) -> OcrFactory:
+    factory = OcrFactory()
+    factory.load_from_plugins(allow_external_plugins=allow_external_plugins)
+    logger.info("Registered ocr engines: %r", factory.registered_kind)
+    return factory
+
+
+@lru_cache()
+def get_picture_description_factory(
+    allow_external_plugins: bool = False,
+) -> PictureDescriptionFactory:
+    factory = PictureDescriptionFactory()
+    factory.load_from_plugins(allow_external_plugins=allow_external_plugins)
+    logger.info("Registered picture descriptions: %r", factory.registered_kind)
+    return factory
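Because both accessors are wrapped in lru_cache, repeated calls with the same flag share one factory, so plugin discovery runs only once per flag value. A sketch (not part of the commit):

from docling.models.factories import get_ocr_factory

f1 = get_ocr_factory(allow_external_plugins=False)
f2 = get_ocr_factory(allow_external_plugins=False)
assert f1 is f2  # cached: plugins were loaded only once

# Inspect what got registered: options type -> model class.
for options_type, model_cls in f1.classes.items():
    print(options_type.kind, "->", model_cls.__name__)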
122 docling/models/factories/base_factory.py Normal file
@@ -0,0 +1,122 @@
+import enum
+import logging
+from abc import ABCMeta
+from typing import Generic, Optional, Type, TypeVar
+
+from pluggy import PluginManager
+from pydantic import BaseModel
+
+from docling.datamodel.pipeline_options import BaseOptions
+from docling.models.base_model import BaseModelWithOptions
+
+A = TypeVar("A", bound=BaseModelWithOptions)
+
+
+logger = logging.getLogger(__name__)
+
+
+class FactoryMeta(BaseModel):
+    kind: str
+    plugin_name: str
+    module: str
+
+
+class BaseFactory(Generic[A], metaclass=ABCMeta):
+    default_plugin_name = "docling"
+
+    def __init__(self, plugin_attr_name: str, plugin_name=default_plugin_name):
+        self.plugin_name = plugin_name
+        self.plugin_attr_name = plugin_attr_name
+
+        self._classes: dict[Type[BaseOptions], Type[A]] = {}
+        self._meta: dict[Type[BaseOptions], FactoryMeta] = {}
+
+    @property
+    def registered_kind(self) -> list[str]:
+        return list(opt.kind for opt in self._classes.keys())
+
+    def get_enum(self) -> enum.Enum:
+        return enum.Enum(
+            self.plugin_attr_name + "_enum",
+            names={kind: kind for kind in self.registered_kind},
+            type=str,
+            module=__name__,
+        )
+
+    @property
+    def classes(self):
+        return self._classes
+
+    @property
+    def registered_meta(self):
+        return self._meta
+
+    def create_instance(self, options: BaseOptions, **kwargs) -> A:
+        try:
+            _cls = self._classes[type(options)]
+            return _cls(options=options, **kwargs)
+        except KeyError:
+            raise RuntimeError(self._err_msg_on_class_not_found(options.kind))
+
+    def create_options(self, kind: str, *args, **kwargs) -> BaseOptions:
+        for opt_cls, _ in self._classes.items():
+            if opt_cls.kind == kind:
+                return opt_cls(*args, **kwargs)
+        raise RuntimeError(self._err_msg_on_class_not_found(kind))
+
+    def _err_msg_on_class_not_found(self, kind: str):
+        msg = []
+
+        for opt, cls in self._classes.items():
+            msg.append(f"\t{opt.kind!r} => {cls!r}")
+
+        msg_str = "\n".join(msg)
+
+        return f"No class found with the name {kind!r}, known classes are:\n{msg_str}"
+
+    def register(self, cls: Type[A], plugin_name: str, plugin_module_name: str):
+        opt_type = cls.get_options_type()
+
+        if opt_type in self._classes:
+            raise ValueError(
+                f"{opt_type.kind!r} already registered to class {self._classes[opt_type]!r}"
+            )
+
+        self._classes[opt_type] = cls
+        self._meta[opt_type] = FactoryMeta(
+            kind=opt_type.kind, plugin_name=plugin_name, module=plugin_module_name
+        )
+
+    def load_from_plugins(
+        self, plugin_name: Optional[str] = None, allow_external_plugins: bool = False
+    ):
+        plugin_name = plugin_name or self.plugin_name
+
+        plugin_manager = PluginManager(plugin_name)
+        plugin_manager.load_setuptools_entrypoints(plugin_name)
+
+        for plugin_name, plugin_module in plugin_manager.list_name_plugin():
+            plugin_module_name = str(plugin_module.__name__)  # type: ignore
+
+            if not allow_external_plugins and not plugin_module_name.startswith(
+                "docling."
+            ):
+                logger.warning(
+                    f"The plugin {plugin_name} will not be loaded because Docling is being executed with allow_external_plugins=false."
+                )
+                continue
+
+            attr = getattr(plugin_module, self.plugin_attr_name, None)
+
+            if callable(attr):
+                logger.info("Loading plugin %r", plugin_name)
+
+                config = attr()
+                self.process_plugin(config, plugin_name, plugin_module_name)
+
+    def process_plugin(self, config, plugin_name: str, plugin_module_name: str):
+        for item in config[self.plugin_attr_name]:
+            try:
+                self.register(item, plugin_name, plugin_module_name)
+            except ValueError:
+                logger.warning("%r already registered", item)
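Registration can also be driven directly, bypassing entry-point discovery (useful in tests). A sketch (not part of the commit) using OcrFactory, defined just below, and the shipped EasyOCR model:

from docling.models.easyocr_model import EasyOcrModel
from docling.models.factories.ocr_factory import OcrFactory

factory = OcrFactory()
factory.register(
    EasyOcrModel,
    plugin_name="docling",
    plugin_module_name="docling.models.plugins.defaults",
)
print(factory.registered_kind)  # ['easyocr']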
11 docling/models/factories/ocr_factory.py Normal file
@@ -0,0 +1,11 @@
+import logging
+
+from docling.models.base_ocr_model import BaseOcrModel
+from docling.models.factories.base_factory import BaseFactory
+
+logger = logging.getLogger(__name__)
+
+
+class OcrFactory(BaseFactory[BaseOcrModel]):
+    def __init__(self, *args, **kwargs):
+        super().__init__("ocr_engines", *args, **kwargs)

11 docling/models/factories/picture_description_factory.py Normal file
@@ -0,0 +1,11 @@
+import logging
+
+from docling.models.factories.base_factory import BaseFactory
+from docling.models.picture_description_base_model import PictureDescriptionBaseModel
+
+logger = logging.getLogger(__name__)
+
+
+class PictureDescriptionFactory(BaseFactory[PictureDescriptionBaseModel]):
+    def __init__(self, *args, **kwargs):
+        super().__init__("picture_description", *args, **kwargs)
@@ -1,12 +1,19 @@
 import logging
 import sys
 import tempfile
-from typing import Iterable, Optional, Tuple
+from pathlib import Path
+from typing import Iterable, Optional, Tuple, Type

 from docling_core.types.doc import BoundingBox, CoordOrigin
+from docling_core.types.doc.page import BoundingRectangle, TextCell

-from docling.datamodel.base_models import OcrCell, Page
+from docling.datamodel.base_models import Page
 from docling.datamodel.document import ConversionResult
-from docling.datamodel.pipeline_options import OcrMacOptions
+from docling.datamodel.pipeline_options import (
+    AcceleratorOptions,
+    OcrMacOptions,
+    OcrOptions,
+)
 from docling.datamodel.settings import settings
 from docling.models.base_ocr_model import BaseOcrModel
 from docling.utils.profiling import TimeRecorder
@@ -15,13 +22,26 @@ _log = logging.getLogger(__name__)


 class OcrMacModel(BaseOcrModel):
-    def __init__(self, enabled: bool, options: OcrMacOptions):
-        super().__init__(enabled=enabled, options=options)
+    def __init__(
+        self,
+        enabled: bool,
+        artifacts_path: Optional[Path],
+        options: OcrMacOptions,
+        accelerator_options: AcceleratorOptions,
+    ):
+        super().__init__(
+            enabled=enabled,
+            artifacts_path=artifacts_path,
+            options=options,
+            accelerator_options=accelerator_options,
+        )
         self.options: OcrMacOptions

         self.scale = 3  # multiplier for 72 dpi == 216 dpi.

         if self.enabled:
+            if "darwin" != sys.platform:
+                raise RuntimeError(f"OcrMac is only supported on Mac.")
             install_errmsg = (
                 "ocrmac is not correctly installed. "
                 "Please install it via `pip install ocrmac` to use this OCR engine. "
@@ -94,13 +114,17 @@ class OcrMacModel(BaseOcrModel):
                     bottom = y2 / self.scale

                     cells.append(
-                        OcrCell(
-                            id=ix,
+                        TextCell(
+                            index=ix,
                             text=text,
+                            orig=text,
+                            from_ocr=True,
                             confidence=confidence,
-                            bbox=BoundingBox.from_tuple(
-                                coord=(left, top, right, bottom),
-                                origin=CoordOrigin.TOPLEFT,
+                            rect=BoundingRectangle.from_bounding_box(
+                                BoundingBox.from_tuple(
+                                    coord=(left, top, right, bottom),
+                                    origin=CoordOrigin.TOPLEFT,
+                                )
                             ),
                         )
                     )
@@ -116,3 +140,7 @@ class OcrMacModel(BaseOcrModel):
                 self.draw_ocr_rects_and_cells(conv_res, page, ocr_rects)

         yield page
+
+    @classmethod
+    def get_options_type(cls) -> Type[OcrOptions]:
+        return OcrMacOptions
@@ -13,6 +13,7 @@ from docling.utils.profiling import TimeRecorder

 class PagePreprocessingOptions(BaseModel):
     images_scale: Optional[float]
+    create_parsed_page: bool


 class PagePreprocessingModel(BasePageModel):
@@ -55,6 +56,9 @@ class PagePreprocessingModel(BasePageModel):

         page.cells = list(page._backend.get_text_cells())

+        if self.options.create_parsed_page:
+            page.parsed_page = page._backend.get_segmented_page()
+
         # DEBUG code:
         def draw_text_boxes(image, cells, show: bool = False):
             draw = ImageDraw.Draw(image)
@@ -1,13 +1,18 @@
 import base64
 import io
 import logging
-from typing import Iterable, List, Optional
+from pathlib import Path
+from typing import Iterable, List, Optional, Type, Union

 import requests
 from PIL import Image
 from pydantic import BaseModel, ConfigDict

-from docling.datamodel.pipeline_options import PictureDescriptionApiOptions
+from docling.datamodel.pipeline_options import (
+    AcceleratorOptions,
+    PictureDescriptionApiOptions,
+    PictureDescriptionBaseOptions,
+)
 from docling.exceptions import OperationNotAllowed
 from docling.models.picture_description_base_model import PictureDescriptionBaseModel

@@ -46,13 +51,25 @@ class ApiResponse(BaseModel):
 class PictureDescriptionApiModel(PictureDescriptionBaseModel):
     # elements_batch_size = 4

+    @classmethod
+    def get_options_type(cls) -> Type[PictureDescriptionBaseOptions]:
+        return PictureDescriptionApiOptions
+
     def __init__(
         self,
         enabled: bool,
         enable_remote_services: bool,
+        artifacts_path: Optional[Union[Path, str]],
         options: PictureDescriptionApiOptions,
+        accelerator_options: AcceleratorOptions,
     ):
-        super().__init__(enabled=enabled, options=options)
+        super().__init__(
+            enabled=enabled,
+            enable_remote_services=enable_remote_services,
+            artifacts_path=artifacts_path,
+            options=options,
+            accelerator_options=accelerator_options,
+        )
         self.options: PictureDescriptionApiOptions

         if self.enabled:
@@ -1,6 +1,7 @@
 import logging
 from abc import abstractmethod
 from pathlib import Path
-from typing import Any, Iterable, List, Optional, Union
+from typing import Any, Iterable, List, Optional, Type, Union

 from docling_core.types.doc import (
     DoclingDocument,
@@ -13,20 +14,30 @@ from docling_core.types.doc.document import (  # TODO: move import to docling_core.types.doc
 )
 from PIL import Image

-from docling.datamodel.pipeline_options import PictureDescriptionBaseOptions
+from docling.datamodel.pipeline_options import (
+    AcceleratorOptions,
+    PictureDescriptionBaseOptions,
+)
 from docling.models.base_model import (
     BaseItemAndImageEnrichmentModel,
+    BaseModelWithOptions,
     ItemAndImageEnrichmentElement,
 )


-class PictureDescriptionBaseModel(BaseItemAndImageEnrichmentModel):
+class PictureDescriptionBaseModel(
+    BaseItemAndImageEnrichmentModel, BaseModelWithOptions
+):
     images_scale: float = 2.0

     def __init__(
         self,
+        *,
         enabled: bool,
+        enable_remote_services: bool,
+        artifacts_path: Optional[Union[Path, str]],
         options: PictureDescriptionBaseOptions,
+        accelerator_options: AcceleratorOptions,
     ):
         self.enabled = enabled
         self.options = options
@@ -62,3 +73,8 @@ class PictureDescriptionBaseModel(BaseItemAndImageEnrichmentModel):
             PictureDescriptionData(text=output, provenance=self.provenance)
         )
         yield item
+
+    @classmethod
+    @abstractmethod
+    def get_options_type(cls) -> Type[PictureDescriptionBaseOptions]:
+        pass
@@ -1,10 +1,11 @@
 from pathlib import Path
-from typing import Iterable, Optional, Union
+from typing import Iterable, Optional, Type, Union

 from PIL import Image

 from docling.datamodel.pipeline_options import (
     AcceleratorOptions,
+    PictureDescriptionBaseOptions,
     PictureDescriptionVlmOptions,
 )
 from docling.models.picture_description_base_model import PictureDescriptionBaseModel
@@ -13,14 +14,25 @@ from docling.utils.accelerator_utils import decide_device

 class PictureDescriptionVlmModel(PictureDescriptionBaseModel):

+    @classmethod
+    def get_options_type(cls) -> Type[PictureDescriptionBaseOptions]:
+        return PictureDescriptionVlmOptions
+
     def __init__(
         self,
         enabled: bool,
+        enable_remote_services: bool,
         artifacts_path: Optional[Union[Path, str]],
         options: PictureDescriptionVlmOptions,
         accelerator_options: AcceleratorOptions,
     ):
-        super().__init__(enabled=enabled, options=options)
+        super().__init__(
+            enabled=enabled,
+            enable_remote_services=enable_remote_services,
+            artifacts_path=artifacts_path,
+            options=options,
+            accelerator_options=accelerator_options,
+        )
         self.options: PictureDescriptionVlmOptions

         if self.enabled:
0 docling/models/plugins/__init__.py Normal file

28 docling/models/plugins/defaults.py Normal file
@@ -0,0 +1,28 @@
+from docling.models.easyocr_model import EasyOcrModel
+from docling.models.ocr_mac_model import OcrMacModel
+from docling.models.picture_description_api_model import PictureDescriptionApiModel
+from docling.models.picture_description_vlm_model import PictureDescriptionVlmModel
+from docling.models.rapid_ocr_model import RapidOcrModel
+from docling.models.tesseract_ocr_cli_model import TesseractOcrCliModel
+from docling.models.tesseract_ocr_model import TesseractOcrModel
+
+
+def ocr_engines():
+    return {
+        "ocr_engines": [
+            EasyOcrModel,
+            OcrMacModel,
+            RapidOcrModel,
+            TesseractOcrModel,
+            TesseractOcrCliModel,
+        ]
+    }
+
+
+def picture_description():
+    return {
+        "picture_description": [
+            PictureDescriptionVlmModel,
+            PictureDescriptionApiModel,
+        ]
+    }
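A third-party package can expose the same hook shape as defaults.py above: a module providing an ocr_engines() callable, advertised under the "docling" setuptools entry-point group (the group name matches default_plugin_name in base_factory.py, which loads entry points via pluggy). A sketch with hypothetical names; at runtime such a module is only picked up when allow_external_plugins is enabled, per the check in load_from_plugins:

# my_docling_plugin.py -- hypothetical third-party plugin module
from my_ocr_package import MyOcrModel  # hypothetical class satisfying BaseModelWithOptions


def ocr_engines():
    return {
        "ocr_engines": [
            MyOcrModel,
        ]
    }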
@@ -1,14 +1,17 @@
 import logging
-from typing import Iterable
+from pathlib import Path
+from typing import Iterable, Optional, Type

 import numpy
 from docling_core.types.doc import BoundingBox, CoordOrigin
+from docling_core.types.doc.page import BoundingRectangle, TextCell

-from docling.datamodel.base_models import OcrCell, Page
+from docling.datamodel.base_models import Page
 from docling.datamodel.document import ConversionResult
 from docling.datamodel.pipeline_options import (
     AcceleratorDevice,
     AcceleratorOptions,
+    OcrOptions,
     RapidOcrOptions,
 )
 from docling.datamodel.settings import settings
@@ -23,10 +26,16 @@ class RapidOcrModel(BaseOcrModel):
     def __init__(
         self,
         enabled: bool,
+        artifacts_path: Optional[Path],
         options: RapidOcrOptions,
         accelerator_options: AcceleratorOptions,
     ):
-        super().__init__(enabled=enabled, options=options)
+        super().__init__(
+            enabled=enabled,
+            artifacts_path=artifacts_path,
+            options=options,
+            accelerator_options=accelerator_options,
+        )
         self.options: RapidOcrOptions

         self.scale = 3  # multiplier for 72 dpi == 216 dpi.
@@ -100,18 +109,26 @@ class RapidOcrModel(BaseOcrModel):

             if result is not None:
                 cells = [
-                    OcrCell(
-                        id=ix,
+                    TextCell(
+                        index=ix,
                         text=line[1],
+                        orig=line[1],
                         confidence=line[2],
-                        bbox=BoundingBox.from_tuple(
-                            coord=(
-                                (line[0][0][0] / self.scale) + ocr_rect.l,
-                                (line[0][0][1] / self.scale) + ocr_rect.t,
-                                (line[0][2][0] / self.scale) + ocr_rect.l,
-                                (line[0][2][1] / self.scale) + ocr_rect.t,
-                            ),
-                            origin=CoordOrigin.TOPLEFT,
+                        from_ocr=True,
+                        rect=BoundingRectangle.from_bounding_box(
+                            BoundingBox.from_tuple(
+                                coord=(
+                                    (line[0][0][0] / self.scale)
+                                    + ocr_rect.l,
+                                    (line[0][0][1] / self.scale)
+                                    + ocr_rect.t,
+                                    (line[0][2][0] / self.scale)
+                                    + ocr_rect.l,
+                                    (line[0][2][1] / self.scale)
+                                    + ocr_rect.t,
+                                ),
+                                origin=CoordOrigin.TOPLEFT,
+                            )
+                        ),
                     )
                     for ix, line in enumerate(result)
@@ -126,3 +143,7 @@ class RapidOcrModel(BaseOcrModel):
                 self.draw_ocr_rects_and_cells(conv_res, page, ocr_rects)

         yield page
+
+    @classmethod
+    def get_options_type(cls) -> Type[OcrOptions]:
+        return RapidOcrOptions
@@ -5,6 +5,7 @@ from typing import Iterable, Optional, Union

 import numpy
 from docling_core.types.doc import BoundingBox, DocItemLabel, TableCell
+from docling_core.types.doc.page import BoundingRectangle
 from docling_ibm_models.tableformer.data_management.tf_predictor import TFPredictor
 from PIL import ImageDraw

@@ -129,7 +130,7 @@ class TableStructureModel(BasePageModel):
             draw.rectangle([(x0, y0), (x1, y1)], outline="red")

             for cell in table_element.cluster.cells:
-                x0, y0, x1, y1 = cell.bbox.as_tuple()
+                x0, y0, x1, y1 = cell.rect.to_bounding_box().as_tuple()
                 x0 *= scale_x
                 x1 *= scale_x
                 y0 *= scale_x
@@ -223,11 +224,19 @@ class TableStructureModel(BasePageModel):
                     # Only allow non empty stings (spaces) into the cells of a table
                     if len(c.text.strip()) > 0:
                         new_cell = copy.deepcopy(c)
-                        new_cell.bbox = new_cell.bbox.scaled(
-                            scale=self.scale
+                        new_cell.rect = BoundingRectangle.from_bounding_box(
+                            new_cell.rect.to_bounding_box().scaled(
+                                scale=self.scale
+                            )
                         )

-                        tokens.append(new_cell.model_dump())
+                        tokens.append(
+                            {
+                                "id": new_cell.index,
+                                "text": new_cell.text,
+                                "bbox": new_cell.rect.to_bounding_box().model_dump(),
+                            }
+                        )
                 page_input["tokens"] = tokens

                 tf_output = self.tf_predictor.multi_table_predict(
@@ -3,15 +3,21 @@ import io
 import logging
 import os
 import tempfile
 from pathlib import Path
 from subprocess import DEVNULL, PIPE, Popen
-from typing import Iterable, List, Optional, Tuple
+from typing import Iterable, List, Optional, Tuple, Type

 import pandas as pd
 from docling_core.types.doc import BoundingBox, CoordOrigin
+from docling_core.types.doc.page import BoundingRectangle, TextCell

-from docling.datamodel.base_models import Cell, OcrCell, Page
+from docling.datamodel.base_models import Page
 from docling.datamodel.document import ConversionResult
-from docling.datamodel.pipeline_options import TesseractCliOcrOptions
+from docling.datamodel.pipeline_options import (
+    AcceleratorOptions,
+    OcrOptions,
+    TesseractCliOcrOptions,
+)
 from docling.datamodel.settings import settings
 from docling.models.base_ocr_model import BaseOcrModel
 from docling.utils.ocr_utils import map_tesseract_script
@@ -21,8 +27,19 @@ _log = logging.getLogger(__name__)


 class TesseractOcrCliModel(BaseOcrModel):
-    def __init__(self, enabled: bool, options: TesseractCliOcrOptions):
-        super().__init__(enabled=enabled, options=options)
+    def __init__(
+        self,
+        enabled: bool,
+        artifacts_path: Optional[Path],
+        options: TesseractCliOcrOptions,
+        accelerator_options: AcceleratorOptions,
+    ):
+        super().__init__(
+            enabled=enabled,
+            artifacts_path=artifacts_path,
+            options=options,
+            accelerator_options=accelerator_options,
+        )
         self.options: TesseractCliOcrOptions

         self.scale = 3  # multiplier for 72 dpi == 216 dpi.
@@ -228,18 +245,22 @@ class TesseractOcrCliModel(BaseOcrModel):
                 t = b + h
                 r = l + w

-                cell = OcrCell(
-                    id=ix,
+                cell = TextCell(
+                    index=ix,
                     text=text,
+                    orig=text,
+                    from_ocr=True,
                     confidence=conf / 100.0,
-                    bbox=BoundingBox.from_tuple(
-                        coord=(
-                            (l / self.scale) + ocr_rect.l,
-                            (b / self.scale) + ocr_rect.t,
-                            (r / self.scale) + ocr_rect.l,
-                            (t / self.scale) + ocr_rect.t,
-                        ),
-                        origin=CoordOrigin.TOPLEFT,
+                    rect=BoundingRectangle.from_bounding_box(
+                        BoundingBox.from_tuple(
+                            coord=(
+                                (l / self.scale) + ocr_rect.l,
+                                (b / self.scale) + ocr_rect.t,
+                                (r / self.scale) + ocr_rect.l,
+                                (t / self.scale) + ocr_rect.t,
+                            ),
+                            origin=CoordOrigin.TOPLEFT,
+                        )
                     ),
                 )
                 all_ocr_cells.append(cell)
@@ -252,3 +273,7 @@ class TesseractOcrCliModel(BaseOcrModel):
                 self.draw_ocr_rects_and_cells(conv_res, page, ocr_rects)

         yield page
+
+    @classmethod
+    def get_options_type(cls) -> Type[OcrOptions]:
+        return TesseractCliOcrOptions
@@ -1,11 +1,17 @@
 import logging
-from typing import Iterable
+from pathlib import Path
+from typing import Iterable, Optional, Type

 from docling_core.types.doc import BoundingBox, CoordOrigin
+from docling_core.types.doc.page import BoundingRectangle, TextCell

-from docling.datamodel.base_models import Cell, OcrCell, Page
+from docling.datamodel.base_models import Page
 from docling.datamodel.document import ConversionResult
-from docling.datamodel.pipeline_options import TesseractOcrOptions
+from docling.datamodel.pipeline_options import (
+    AcceleratorOptions,
+    OcrOptions,
+    TesseractOcrOptions,
+)
 from docling.datamodel.settings import settings
 from docling.models.base_ocr_model import BaseOcrModel
 from docling.utils.ocr_utils import map_tesseract_script
@@ -15,8 +21,19 @@ _log = logging.getLogger(__name__)


 class TesseractOcrModel(BaseOcrModel):
-    def __init__(self, enabled: bool, options: TesseractOcrOptions):
-        super().__init__(enabled=enabled, options=options)
+    def __init__(
+        self,
+        enabled: bool,
+        artifacts_path: Optional[Path],
+        options: TesseractOcrOptions,
+        accelerator_options: AcceleratorOptions,
+    ):
+        super().__init__(
+            enabled=enabled,
+            artifacts_path=artifacts_path,
+            options=options,
+            accelerator_options=accelerator_options,
+        )
         self.options: TesseractOcrOptions

         self.scale = 3  # multiplier for 72 dpi == 216 dpi.
@@ -173,13 +190,17 @@ class TesseractOcrModel(BaseOcrModel):
                     top = (box["y"] + box["h"]) / self.scale

                     cells.append(
-                        OcrCell(
-                            id=ix,
+                        TextCell(
+                            index=ix,
                             text=text,
+                            orig=text,
+                            from_ocr=True,
                             confidence=confidence,
-                            bbox=BoundingBox.from_tuple(
-                                coord=(left, top, right, bottom),
-                                origin=CoordOrigin.TOPLEFT,
+                            rect=BoundingRectangle.from_bounding_box(
+                                BoundingBox.from_tuple(
+                                    coord=(left, top, right, bottom),
+                                    origin=CoordOrigin.TOPLEFT,
+                                ),
                             ),
                         )
                     )
@@ -195,3 +216,7 @@ class TesseractOcrModel(BaseOcrModel):
                     self.draw_ocr_rects_and_cells(conv_res, page, ocr_rects)

             yield page
+
+    @classmethod
+    def get_options_type(cls) -> Type[OcrOptions]:
+        return TesseractOcrOptions
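All OCR models now share one constructor signature and advertise their options class through `get_options_type()`, which is what makes them discoverable by the plugin factory introduced in the pipeline change below. A skeleton of what a third-party OCR model would look like under this contract; `MyOcrModel` and `MyOcrOptions` are hypothetical names, and the per-page recognition loop is deliberately omitted:

```python
# Sketch of the plugin contract implied by the diffs above (names are made up).
from pathlib import Path
from typing import Optional, Type

from docling.datamodel.pipeline_options import AcceleratorOptions, OcrOptions
from docling.models.base_ocr_model import BaseOcrModel


class MyOcrOptions(OcrOptions):
    kind: str = "my-ocr"  # the discriminator the factory dispatches on


class MyOcrModel(BaseOcrModel):
    def __init__(
        self,
        enabled: bool,
        artifacts_path: Optional[Path],
        options: MyOcrOptions,
        accelerator_options: AcceleratorOptions,
    ):
        super().__init__(
            enabled=enabled,
            artifacts_path=artifacts_path,
            options=options,
            accelerator_options=accelerator_options,
        )
        # ... engine initialization and the __call__ page loop would go here ...

    @classmethod
    def get_options_type(cls) -> Type[OcrOptions]:
        return MyOcrOptions
```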
@@ -10,16 +10,7 @@ from docling.backend.abstract_backend import AbstractDocumentBackend
 from docling.backend.pdf_backend import PdfDocumentBackend
 from docling.datamodel.base_models import AssembledUnit, Page
 from docling.datamodel.document import ConversionResult
-from docling.datamodel.pipeline_options import (
-    EasyOcrOptions,
-    OcrMacOptions,
-    PdfPipelineOptions,
-    PictureDescriptionApiOptions,
-    PictureDescriptionVlmOptions,
-    RapidOcrOptions,
-    TesseractCliOcrOptions,
-    TesseractOcrOptions,
-)
+from docling.datamodel.pipeline_options import PdfPipelineOptions
 from docling.datamodel.settings import settings
 from docling.models.base_ocr_model import BaseOcrModel
 from docling.models.code_formula_model import CodeFormulaModel, CodeFormulaModelOptions
@@ -27,22 +18,16 @@ from docling.models.document_picture_classifier import (
     DocumentPictureClassifier,
     DocumentPictureClassifierOptions,
 )
-from docling.models.easyocr_model import EasyOcrModel
+from docling.models.factories import get_ocr_factory, get_picture_description_factory
 from docling.models.layout_model import LayoutModel
-from docling.models.ocr_mac_model import OcrMacModel
 from docling.models.page_assemble_model import PageAssembleModel, PageAssembleOptions
 from docling.models.page_preprocessing_model import (
     PagePreprocessingModel,
     PagePreprocessingOptions,
 )
-from docling.models.picture_description_api_model import PictureDescriptionApiModel
 from docling.models.picture_description_base_model import PictureDescriptionBaseModel
-from docling.models.picture_description_vlm_model import PictureDescriptionVlmModel
-from docling.models.rapid_ocr_model import RapidOcrModel
 from docling.models.readingorder_model import ReadingOrderModel, ReadingOrderOptions
 from docling.models.table_structure_model import TableStructureModel
-from docling.models.tesseract_ocr_cli_model import TesseractOcrCliModel
-from docling.models.tesseract_ocr_model import TesseractOcrModel
 from docling.pipeline.base_pipeline import PaginatedPipeline
 from docling.utils.model_downloader import download_models
 from docling.utils.profiling import ProfilingScope, TimeRecorder
@@ -78,16 +63,14 @@ class StandardPdfPipeline(PaginatedPipeline):

         self.glm_model = ReadingOrderModel(options=ReadingOrderOptions())

-        if (ocr_model := self.get_ocr_model(artifacts_path=artifacts_path)) is None:
-            raise RuntimeError(
-                f"The specified OCR kind is not supported: {pipeline_options.ocr_options.kind}."
-            )
+        ocr_model = self.get_ocr_model(artifacts_path=artifacts_path)

         self.build_pipe = [
             # Pre-processing
             PagePreprocessingModel(
                 options=PagePreprocessingOptions(
-                    images_scale=pipeline_options.images_scale
+                    images_scale=pipeline_options.images_scale,
+                    create_parsed_page=pipeline_options.generate_parsed_pages,
                 )
             ),
             # OCR
@@ -163,66 +146,30 @@ class StandardPdfPipeline(PaginatedPipeline):
         output_dir = download_models(output_dir=local_dir, force=force, progress=False)
         return output_dir

-    def get_ocr_model(
-        self, artifacts_path: Optional[Path] = None
-    ) -> Optional[BaseOcrModel]:
-        if isinstance(self.pipeline_options.ocr_options, EasyOcrOptions):
-            return EasyOcrModel(
-                enabled=self.pipeline_options.do_ocr,
-                artifacts_path=artifacts_path,
-                options=self.pipeline_options.ocr_options,
-                accelerator_options=self.pipeline_options.accelerator_options,
-            )
-        elif isinstance(self.pipeline_options.ocr_options, TesseractCliOcrOptions):
-            return TesseractOcrCliModel(
-                enabled=self.pipeline_options.do_ocr,
-                options=self.pipeline_options.ocr_options,
-            )
-        elif isinstance(self.pipeline_options.ocr_options, TesseractOcrOptions):
-            return TesseractOcrModel(
-                enabled=self.pipeline_options.do_ocr,
-                options=self.pipeline_options.ocr_options,
-            )
-        elif isinstance(self.pipeline_options.ocr_options, RapidOcrOptions):
-            return RapidOcrModel(
-                enabled=self.pipeline_options.do_ocr,
-                options=self.pipeline_options.ocr_options,
-                accelerator_options=self.pipeline_options.accelerator_options,
-            )
-        elif isinstance(self.pipeline_options.ocr_options, OcrMacOptions):
-            if "darwin" != sys.platform:
-                raise RuntimeError(
-                    f"The specified OCR type is only supported on Mac: {self.pipeline_options.ocr_options.kind}."
-                )
-            return OcrMacModel(
-                enabled=self.pipeline_options.do_ocr,
-                options=self.pipeline_options.ocr_options,
-            )
-        return None
+    def get_ocr_model(self, artifacts_path: Optional[Path] = None) -> BaseOcrModel:
+        factory = get_ocr_factory(
+            allow_external_plugins=self.pipeline_options.allow_external_plugins
+        )
+        return factory.create_instance(
+            options=self.pipeline_options.ocr_options,
+            enabled=self.pipeline_options.do_ocr,
+            artifacts_path=artifacts_path,
+            accelerator_options=self.pipeline_options.accelerator_options,
+        )

     def get_picture_description_model(
         self, artifacts_path: Optional[Path] = None
     ) -> Optional[PictureDescriptionBaseModel]:
-        if isinstance(
-            self.pipeline_options.picture_description_options,
-            PictureDescriptionApiOptions,
-        ):
-            return PictureDescriptionApiModel(
-                enabled=self.pipeline_options.do_picture_description,
-                enable_remote_services=self.pipeline_options.enable_remote_services,
-                options=self.pipeline_options.picture_description_options,
-            )
-        elif isinstance(
-            self.pipeline_options.picture_description_options,
-            PictureDescriptionVlmOptions,
-        ):
-            return PictureDescriptionVlmModel(
-                enabled=self.pipeline_options.do_picture_description,
-                artifacts_path=artifacts_path,
-                options=self.pipeline_options.picture_description_options,
-                accelerator_options=self.pipeline_options.accelerator_options,
-            )
-        return None
+        factory = get_picture_description_factory(
+            allow_external_plugins=self.pipeline_options.allow_external_plugins
+        )
+        return factory.create_instance(
+            options=self.pipeline_options.picture_description_options,
+            enabled=self.pipeline_options.do_picture_description,
+            enable_remote_services=self.pipeline_options.enable_remote_services,
+            artifacts_path=artifacts_path,
+            accelerator_options=self.pipeline_options.accelerator_options,
+        )

     def initialize_page(self, conv_res: ConversionResult, page: Page) -> Page:
         with TimeRecorder(conv_res, "page_init"):
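With the factory in place, `StandardPdfPipeline` no longer needs the long `isinstance()` chain: the options object selects the model class, including models registered by external plugins when `allow_external_plugins` is enabled. A short usage sketch built only from the calls visible in this diff; the option values are illustrative:

```python
# Sketch: resolving an OCR model through the new factory wiring.
from docling.datamodel.pipeline_options import PdfPipelineOptions, RapidOcrOptions
from docling.models.factories import get_ocr_factory

pipeline_options = PdfPipelineOptions()
pipeline_options.ocr_options = RapidOcrOptions()  # any registered options kind

factory = get_ocr_factory(
    allow_external_plugins=pipeline_options.allow_external_plugins
)
ocr_model = factory.create_instance(
    options=pipeline_options.ocr_options,
    enabled=pipeline_options.do_ocr,
    artifacts_path=None,  # or a local model cache directory
    accelerator_options=pipeline_options.accelerator_options,
)
```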
@@ -1,41 +1,20 @@
-import itertools
 import logging
-import re
-import warnings
-from io import BytesIO

+# from io import BytesIO
 from pathlib import Path
-from typing import Optional
+from typing import List, Optional, Union, cast

-from docling_core.types import DoclingDocument
-from docling_core.types.doc import (
-    BoundingBox,
-    DocItem,
-    DocItemLabel,
-    DoclingDocument,
-    GroupLabel,
-    ImageRef,
-    ImageRefMode,
-    PictureItem,
-    ProvenanceItem,
-    Size,
-    TableCell,
-    TableData,
-    TableItem,
-)
-from docling_core.types.doc.tokens import DocumentToken, TableToken
+# from docling_core.types import DoclingDocument
+from docling_core.types.doc import BoundingBox, DocItem, ImageRef, PictureItem, TextItem
+from docling_core.types.doc.document import DocTagsDocument
 from PIL import Image as PILImage

 from docling.backend.abstract_backend import AbstractDocumentBackend
 from docling.backend.md_backend import MarkdownDocumentBackend
 from docling.backend.pdf_backend import PdfDocumentBackend
 from docling.datamodel.base_models import InputFormat, Page
 from docling.datamodel.document import ConversionResult, InputDocument
-from docling.datamodel.pipeline_options import (
-    PdfPipelineOptions,
-    ResponseFormat,
-    VlmPipelineOptions,
-)
+from docling.datamodel.pipeline_options import ResponseFormat, VlmPipelineOptions
 from docling.datamodel.settings import settings
 from docling.models.hf_vlm_model import HuggingFaceVlmModel
 from docling.pipeline.base_pipeline import PaginatedPipeline
@@ -100,6 +79,15 @@ class VlmPipeline(PaginatedPipeline):

         return page

+    def extract_text_from_backend(self, page: Page, bbox: BoundingBox | None) -> str:
+        # Convert bounding box normalized to 0-100 into page coordinates for cropping
+        text = ""
+        if bbox:
+            if page.size:
+                if page._backend:
+                    text = page._backend.get_text_in_rect(bbox)
+        return text
+
     def _assemble_document(self, conv_res: ConversionResult) -> ConversionResult:
         with TimeRecorder(conv_res, "doc_assemble", scope=ProfilingScope.DOCUMENT):

@@ -107,7 +95,45 @@ class VlmPipeline(PaginatedPipeline):
                 self.pipeline_options.vlm_options.response_format
                 == ResponseFormat.DOCTAGS
             ):
-                conv_res.document = self._turn_tags_into_doc(conv_res.pages)
+                doctags_list = []
+                image_list = []
+                for page in conv_res.pages:
+                    predicted_doctags = ""
+                    img = PILImage.new("RGB", (1, 1), "rgb(255,255,255)")
+                    if page.predictions.vlm_response:
+                        predicted_doctags = page.predictions.vlm_response.text
+                    if page.image:
+                        img = page.image
+                    image_list.append(img)
+                    doctags_list.append(predicted_doctags)
+
+                doctags_list_c = cast(List[Union[Path, str]], doctags_list)
+                image_list_c = cast(List[Union[Path, PILImage.Image]], image_list)
+                doctags_doc = DocTagsDocument.from_doctags_and_image_pairs(
+                    doctags_list_c, image_list_c
+                )
+                conv_res.document.load_from_doctags(doctags_doc)
+
+                # If forced backend text, replace model predicted text with backend one
+                if page.size:
+                    if self.force_backend_text:
+                        scale = self.pipeline_options.images_scale
+                        for element, _level in conv_res.document.iterate_items():
+                            if (
+                                not isinstance(element, TextItem)
+                                or len(element.prov) == 0
+                            ):
+                                continue
+                            crop_bbox = (
+                                element.prov[0]
+                                .bbox.scaled(scale=scale)
+                                .to_top_left_origin(
+                                    page_height=page.size.height * scale
+                                )
+                            )
+                            txt = self.extract_text_from_backend(page, crop_bbox)
+                            element.text = txt
+                            element.orig = txt
             elif (
                 self.pipeline_options.vlm_options.response_format
                 == ResponseFormat.MARKDOWN
@@ -165,366 +191,6 @@ class VlmPipeline(PaginatedPipeline):
             )
         return backend.convert()

-    def _turn_tags_into_doc(self, pages: list[Page]) -> DoclingDocument:
-        ###############################################
-        # Tag definitions and color mappings
-        ###############################################
-
-        # Maps the recognized tag to a Docling label.
-        # Code items will be given DocItemLabel.CODE
-        tag_to_doclabel = {
-            "title": DocItemLabel.TITLE,
-            "document_index": DocItemLabel.DOCUMENT_INDEX,
-            "otsl": DocItemLabel.TABLE,
-            "section_header_level_1": DocItemLabel.SECTION_HEADER,
-            "checkbox_selected": DocItemLabel.CHECKBOX_SELECTED,
-            "checkbox_unselected": DocItemLabel.CHECKBOX_UNSELECTED,
-            "text": DocItemLabel.TEXT,
-            "page_header": DocItemLabel.PAGE_HEADER,
-            "page_footer": DocItemLabel.PAGE_FOOTER,
-            "formula": DocItemLabel.FORMULA,
-            "caption": DocItemLabel.CAPTION,
-            "picture": DocItemLabel.PICTURE,
-            "list_item": DocItemLabel.LIST_ITEM,
-            "footnote": DocItemLabel.FOOTNOTE,
-            "code": DocItemLabel.CODE,
-        }
-
-        # Maps each tag to an associated bounding box color.
-        tag_to_color = {
-            "title": "blue",
-            "document_index": "darkblue",
-            "otsl": "green",
-            "section_header_level_1": "purple",
-            "checkbox_selected": "black",
-            "checkbox_unselected": "gray",
-            "text": "red",
-            "page_header": "orange",
-            "page_footer": "cyan",
-            "formula": "pink",
-            "caption": "magenta",
-            "picture": "yellow",
-            "list_item": "brown",
-            "footnote": "darkred",
-            "code": "lightblue",
-        }
-
-        def extract_bounding_box(text_chunk: str) -> Optional[BoundingBox]:
-            """Extracts <loc_...> bounding box coords from the chunk, normalized by / 500."""
-            coords = re.findall(r"<loc_(\d+)>", text_chunk)
-            if len(coords) == 4:
-                l, t, r, b = map(float, coords)
-                return BoundingBox(l=l / 500, t=t / 500, r=r / 500, b=b / 500)
-            return None
-
-        def extract_inner_text(text_chunk: str) -> str:
-            """Strips all <...> tags inside the chunk to get the raw text content."""
-            return re.sub(r"<.*?>", "", text_chunk, flags=re.DOTALL).strip()
-
-        def extract_text_from_backend(page: Page, bbox: BoundingBox | None) -> str:
-            # Convert bounding box normalized to 0-100 into page coordinates for cropping
-            text = ""
-            if bbox:
-                if page.size:
-                    bbox.l = bbox.l * page.size.width
-                    bbox.t = bbox.t * page.size.height
-                    bbox.r = bbox.r * page.size.width
-                    bbox.b = bbox.b * page.size.height
-                    if page._backend:
-                        text = page._backend.get_text_in_rect(bbox)
-            return text
-
-        def otsl_parse_texts(texts, tokens):
-            split_word = TableToken.OTSL_NL.value
-            split_row_tokens = [
-                list(y)
-                for x, y in itertools.groupby(tokens, lambda z: z == split_word)
-                if not x
-            ]
-            table_cells = []
-            r_idx = 0
-            c_idx = 0
-
-            def count_right(tokens, c_idx, r_idx, which_tokens):
-                span = 0
-                c_idx_iter = c_idx
-                while tokens[r_idx][c_idx_iter] in which_tokens:
-                    c_idx_iter += 1
-                    span += 1
-                    if c_idx_iter >= len(tokens[r_idx]):
-                        return span
-                return span
-
-            def count_down(tokens, c_idx, r_idx, which_tokens):
-                span = 0
-                r_idx_iter = r_idx
-                while tokens[r_idx_iter][c_idx] in which_tokens:
-                    r_idx_iter += 1
-                    span += 1
-                    if r_idx_iter >= len(tokens):
-                        return span
-                return span
-
-            for i, text in enumerate(texts):
-                cell_text = ""
-                if text in [
-                    TableToken.OTSL_FCEL.value,
-                    TableToken.OTSL_ECEL.value,
-                    TableToken.OTSL_CHED.value,
-                    TableToken.OTSL_RHED.value,
-                    TableToken.OTSL_SROW.value,
-                ]:
-                    row_span = 1
-                    col_span = 1
-                    right_offset = 1
-                    if text != TableToken.OTSL_ECEL.value:
-                        cell_text = texts[i + 1]
-                        right_offset = 2
-
-                    # Check next element(s) for lcel / ucel / xcel, set properly row_span, col_span
-                    next_right_cell = ""
-                    if i + right_offset < len(texts):
-                        next_right_cell = texts[i + right_offset]
-
-                    next_bottom_cell = ""
-                    if r_idx + 1 < len(split_row_tokens):
-                        if c_idx < len(split_row_tokens[r_idx + 1]):
-                            next_bottom_cell = split_row_tokens[r_idx + 1][c_idx]
-
-                    if next_right_cell in [
-                        TableToken.OTSL_LCEL.value,
-                        TableToken.OTSL_XCEL.value,
-                    ]:
-                        # we have horisontal spanning cell or 2d spanning cell
-                        col_span += count_right(
-                            split_row_tokens,
-                            c_idx + 1,
-                            r_idx,
-                            [TableToken.OTSL_LCEL.value, TableToken.OTSL_XCEL.value],
-                        )
-                    if next_bottom_cell in [
-                        TableToken.OTSL_UCEL.value,
-                        TableToken.OTSL_XCEL.value,
-                    ]:
-                        # we have a vertical spanning cell or 2d spanning cell
-                        row_span += count_down(
-                            split_row_tokens,
-                            c_idx,
-                            r_idx + 1,
-                            [TableToken.OTSL_UCEL.value, TableToken.OTSL_XCEL.value],
-                        )
-
-                    table_cells.append(
-                        TableCell(
-                            text=cell_text.strip(),
-                            row_span=row_span,
-                            col_span=col_span,
-                            start_row_offset_idx=r_idx,
-                            end_row_offset_idx=r_idx + row_span,
-                            start_col_offset_idx=c_idx,
-                            end_col_offset_idx=c_idx + col_span,
-                        )
-                    )
-                if text in [
-                    TableToken.OTSL_FCEL.value,
-                    TableToken.OTSL_ECEL.value,
-                    TableToken.OTSL_CHED.value,
-                    TableToken.OTSL_RHED.value,
-                    TableToken.OTSL_SROW.value,
-                    TableToken.OTSL_LCEL.value,
-                    TableToken.OTSL_UCEL.value,
-                    TableToken.OTSL_XCEL.value,
-                ]:
-                    c_idx += 1
-                if text == TableToken.OTSL_NL.value:
-                    r_idx += 1
-                    c_idx = 0
-            return table_cells, split_row_tokens
-
-        def otsl_extract_tokens_and_text(s: str):
-            # Pattern to match anything enclosed by < > (including the angle brackets themselves)
-            pattern = r"(<[^>]+>)"
-            # Find all tokens (e.g. "<otsl>", "<loc_140>", etc.)
-            tokens = re.findall(pattern, s)
-            # Remove any tokens that start with "<loc_"
-            tokens = [
-                token
-                for token in tokens
-                if not (
-                    token.startswith(rf"<{DocumentToken.LOC.value}")
-                    or token
-                    in [
-                        rf"<{DocumentToken.OTSL.value}>",
-                        rf"</{DocumentToken.OTSL.value}>",
-                    ]
-                )
-            ]
-            # Split the string by those tokens to get the in-between text
-            text_parts = re.split(pattern, s)
-            text_parts = [
-                token
-                for token in text_parts
-                if not (
-                    token.startswith(rf"<{DocumentToken.LOC.value}")
-                    or token
-                    in [
-                        rf"<{DocumentToken.OTSL.value}>",
-                        rf"</{DocumentToken.OTSL.value}>",
-                    ]
-                )
-            ]
-            # Remove any empty or purely whitespace strings from text_parts
-            text_parts = [part for part in text_parts if part.strip()]
-
-            return tokens, text_parts
-
-        def parse_table_content(otsl_content: str) -> TableData:
-            tokens, mixed_texts = otsl_extract_tokens_and_text(otsl_content)
-            table_cells, split_row_tokens = otsl_parse_texts(mixed_texts, tokens)
-
-            return TableData(
-                num_rows=len(split_row_tokens),
-                num_cols=(
-                    max(len(row) for row in split_row_tokens) if split_row_tokens else 0
-                ),
-                table_cells=table_cells,
-            )
-
-        doc = DoclingDocument(name="Document")
-        for pg_idx, page in enumerate(pages):
-            xml_content = ""
-            predicted_text = ""
-            if page.predictions.vlm_response:
-                predicted_text = page.predictions.vlm_response.text
-            image = page.image
-
-            page_no = pg_idx + 1
-            bounding_boxes = []
-
-            if page.size:
-                pg_width = page.size.width
-                pg_height = page.size.height
-                size = Size(width=pg_width, height=pg_height)
-                parent_page = doc.add_page(page_no=page_no, size=size)
-
-            """
-            1. Finds all <tag>...</tag> blocks in the entire string (multi-line friendly) in the order they appear.
-            2. For each chunk, extracts bounding box (if any) and inner text.
-            3. Adds the item to a DoclingDocument structure with the right label.
-            4. Tracks bounding boxes + color in a separate list for later visualization.
-            """
-
-            # Regex for all recognized tags
-            tag_pattern = (
-                rf"<(?P<tag>{DocItemLabel.TITLE}|{DocItemLabel.DOCUMENT_INDEX}|"
-                rf"{DocItemLabel.CHECKBOX_UNSELECTED}|{DocItemLabel.CHECKBOX_SELECTED}|"
-                rf"{DocItemLabel.TEXT}|{DocItemLabel.PAGE_HEADER}|"
-                rf"{DocItemLabel.PAGE_FOOTER}|{DocItemLabel.FORMULA}|"
-                rf"{DocItemLabel.CAPTION}|{DocItemLabel.PICTURE}|"
-                rf"{DocItemLabel.LIST_ITEM}|{DocItemLabel.FOOTNOTE}|{DocItemLabel.CODE}|"
-                rf"{DocItemLabel.SECTION_HEADER}_level_1|{DocumentToken.OTSL.value})>.*?</(?P=tag)>"
-            )
-
-            # DocumentToken.OTSL
-            pattern = re.compile(tag_pattern, re.DOTALL)
-
-            # Go through each match in order
-            for match in pattern.finditer(predicted_text):
-                full_chunk = match.group(0)
-                tag_name = match.group("tag")
-
-                bbox = extract_bounding_box(full_chunk)
-                doc_label = tag_to_doclabel.get(tag_name, DocItemLabel.PARAGRAPH)
-                color = tag_to_color.get(tag_name, "white")
-
-                # Store bounding box + color
-                if bbox:
-                    bounding_boxes.append((bbox, color))
-
-                if tag_name == DocumentToken.OTSL.value:
-                    table_data = parse_table_content(full_chunk)
-                    bbox = extract_bounding_box(full_chunk)
-
-                    if bbox:
-                        prov = ProvenanceItem(
-                            bbox=bbox.resize_by_scale(pg_width, pg_height),
-                            charspan=(0, 0),
-                            page_no=page_no,
-                        )
-                        doc.add_table(data=table_data, prov=prov)
-                    else:
-                        doc.add_table(data=table_data)
-
-                elif tag_name == DocItemLabel.PICTURE:
-                    text_caption_content = extract_inner_text(full_chunk)
-                    if image:
-                        if bbox:
-                            im_width, im_height = image.size
-
-                            crop_box = (
-                                int(bbox.l * im_width),
-                                int(bbox.t * im_height),
-                                int(bbox.r * im_width),
-                                int(bbox.b * im_height),
-                            )
-                            cropped_image = image.crop(crop_box)
-                            pic = doc.add_picture(
-                                parent=None,
-                                image=ImageRef.from_pil(image=cropped_image, dpi=72),
-                                prov=(
-                                    ProvenanceItem(
-                                        bbox=bbox.resize_by_scale(pg_width, pg_height),
-                                        charspan=(0, 0),
-                                        page_no=page_no,
-                                    )
-                                ),
-                            )
-                            # If there is a caption to an image, add it as well
-                            if len(text_caption_content) > 0:
-                                caption_item = doc.add_text(
-                                    label=DocItemLabel.CAPTION,
-                                    text=text_caption_content,
-                                    parent=None,
-                                )
-                                pic.captions.append(caption_item.get_ref())
-                    else:
-                        if bbox:
-                            # In case we don't have access to an binary of an image
-                            doc.add_picture(
-                                parent=None,
-                                prov=ProvenanceItem(
-                                    bbox=bbox, charspan=(0, 0), page_no=page_no
-                                ),
-                            )
-                            # If there is a caption to an image, add it as well
-                            if len(text_caption_content) > 0:
-                                caption_item = doc.add_text(
-                                    label=DocItemLabel.CAPTION,
-                                    text=text_caption_content,
-                                    parent=None,
-                                )
-                                pic.captions.append(caption_item.get_ref())
-                else:
-                    # For everything else, treat as text
-                    if self.force_backend_text:
-                        text_content = extract_text_from_backend(page, bbox)
-                    else:
-                        text_content = extract_inner_text(full_chunk)
-                    doc.add_text(
-                        label=doc_label,
-                        text=text_content,
-                        prov=(
-                            ProvenanceItem(
-                                bbox=bbox.resize_by_scale(pg_width, pg_height),
-                                charspan=(0, len(text_content)),
-                                page_no=page_no,
-                            )
-                            if bbox
-                            else None
-                        ),
-                    )
-        return doc
-
     @classmethod
     def get_default_options(cls) -> VlmPipelineOptions:
         return VlmPipelineOptions()
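The DOCTAGS branch above now delegates document reconstruction to docling-core: per-page DocTags strings are paired with page images, parsed into a `DocTagsDocument`, and loaded into the conversion result, replacing the large hand-rolled `_turn_tags_into_doc` parser that this commit deletes. A minimal round-trip sketch using only the calls visible in this diff; the DocTags string below is a made-up illustration:

```python
# Sketch: the DocTags round-trip now handled by docling-core.
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument
from PIL import Image as PILImage

# Illustrative one-page DocTags output and a placeholder page image.
doctags = "<doctag><text><loc_10><loc_10><loc_90><loc_20>Hello</text></doctag>"
img = PILImage.new("RGB", (1, 1), "rgb(255,255,255)")

doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([doctags], [img])
doc = DoclingDocument(name="Document")
doc.load_from_doctags(doctags_doc)
print(doc.export_to_markdown())
```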
@@ -2,9 +2,9 @@ import logging
 from typing import Any, Dict, Iterable, List, Tuple, Union

 from docling_core.types.doc import BoundingBox, CoordOrigin
+from docling_core.types.doc.page import TextCell
 from docling_core.types.legacy_doc.base import BaseCell, BaseText, Ref, Table

-from docling.datamodel.base_models import OcrCell
 from docling.datamodel.document import ConversionResult, Page

 _log = logging.getLogger(__name__)
@@ -86,11 +86,13 @@ def generate_multimodal_pages(
         if page.size is None:
             return cells
         for cell in page.cells:
-            new_bbox = cell.bbox.to_top_left_origin(
-                page_height=page.size.height
-            ).normalized(page_size=page.size)
-            is_ocr = isinstance(cell, OcrCell)
-            ocr_confidence = cell.confidence if isinstance(cell, OcrCell) else 1.0
+            new_bbox = (
+                cell.rect.to_bounding_box()
+                .to_top_left_origin(page_height=page.size.height)
+                .normalized(page_size=page.size)
+            )
+            is_ocr = cell.from_ocr
+            ocr_confidence = cell.confidence
             cells.append(
                 {
                     "text": cell.text,
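The multimodal export now reads geometry through `cell.rect.to_bounding_box()` and the `from_ocr`/`confidence` fields instead of `isinstance(cell, OcrCell)` checks. A compact sketch of that per-cell conversion, assuming `page` is a populated docling `Page` with `size` and `cells` set:

```python
# Sketch: normalizing TextCell geometry as the export code above does.
def multimodal_cells(page):
    cells = []
    for cell in page.cells:
        new_bbox = (
            cell.rect.to_bounding_box()                      # rotated rect -> bbox
            .to_top_left_origin(page_height=page.size.height)
            .normalized(page_size=page.size)                 # 0..1 coordinates
        )
        cells.append(
            {
                "text": cell.text,
                "bbox": new_bbox.as_tuple(),
                "ocr": cell.from_ocr,            # was isinstance(cell, OcrCell)
                "ocr_confidence": cell.confidence,
            }
        )
    return cells
```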
@@ -5,9 +5,10 @@ from collections import defaultdict
 from typing import Dict, List, Set, Tuple

 from docling_core.types.doc import DocItemLabel, Size
+from docling_core.types.doc.page import TextCell
 from rtree import index

-from docling.datamodel.base_models import BoundingBox, Cell, Cluster, OcrCell
+from docling.datamodel.base_models import BoundingBox, Cluster

 _log = logging.getLogger(__name__)

@@ -198,7 +199,7 @@ class LayoutPostprocessor:
         DocItemLabel.TITLE: DocItemLabel.SECTION_HEADER,
     }

-    def __init__(self, cells: List[Cell], clusters: List[Cluster], page_size: Size):
-        """Initialize processor with cells and clusters."""
+    def __init__(self, cells: List[TextCell], clusters: List[Cluster], page_size: Size):
+        """Initialize processor with cells and spatial indices."""
         self.cells = cells
@@ -218,7 +219,7 @@ class LayoutPostprocessor:
             [c for c in self.special_clusters if c.label in self.WRAPPER_TYPES]
         )

-    def postprocess(self) -> Tuple[List[Cluster], List[Cell]]:
+    def postprocess(self) -> Tuple[List[Cluster], List[TextCell]]:
         """Main processing pipeline."""
         self.regular_clusters = self._process_regular_clusters()
         self.special_clusters = self._process_special_clusters()
@@ -271,15 +272,13 @@ class LayoutPostprocessor:
         next_id = max((c.id for c in self.all_clusters), default=0) + 1
         orphan_clusters = []
         for i, cell in enumerate(unassigned):
-            conf = 1.0
-            if isinstance(cell, OcrCell):
-                conf = cell.confidence
+            conf = cell.confidence

            orphan_clusters.append(
                Cluster(
                    id=next_id + i,
                    label=DocItemLabel.TEXT,
-                    bbox=cell.bbox,
+                    bbox=cell.to_bounding_box(),
                    confidence=conf,
                    cells=[cell],
                )
@@ -557,13 +556,13 @@ class LayoutPostprocessor:

         return current_best if current_best else clusters[0]

-    def _deduplicate_cells(self, cells: List[Cell]) -> List[Cell]:
+    def _deduplicate_cells(self, cells: List[TextCell]) -> List[TextCell]:
         """Ensure each cell appears only once, maintaining order of first appearance."""
         seen_ids = set()
         unique_cells = []
         for cell in cells:
-            if cell.id not in seen_ids:
-                seen_ids.add(cell.id)
+            if cell.index not in seen_ids:
+                seen_ids.add(cell.index)
                 unique_cells.append(cell)
         return unique_cells

@@ -582,11 +581,13 @@ class LayoutPostprocessor:
         best_cluster = None

         for cluster in clusters:
-            if cell.bbox.area() <= 0:
+            if cell.rect.to_bounding_box().area() <= 0:
                 continue

-            overlap = cell.bbox.intersection_area_with(cluster.bbox)
-            overlap_ratio = overlap / cell.bbox.area()
+            overlap = cell.rect.to_bounding_box().intersection_area_with(
+                cluster.bbox
+            )
+            overlap_ratio = overlap / cell.rect.to_bounding_box().area()

             if overlap_ratio > best_overlap:
                 best_overlap = overlap_ratio
@@ -601,11 +602,13 @@ class LayoutPostprocessor:

         return clusters

-    def _find_unassigned_cells(self, clusters: List[Cluster]) -> List[Cell]:
+    def _find_unassigned_cells(self, clusters: List[Cluster]) -> List[TextCell]:
         """Find cells not assigned to any cluster."""
-        assigned = {cell.id for cluster in clusters for cell in cluster.cells}
+        assigned = {cell.index for cluster in clusters for cell in cluster.cells}
         return [
-            cell for cell in self.cells if cell.id not in assigned and cell.text.strip()
+            cell
+            for cell in self.cells
+            if cell.index not in assigned and cell.text.strip()
         ]

     def _adjust_cluster_bboxes(self, clusters: List[Cluster]) -> List[Cluster]:
@@ -615,10 +618,10 @@ class LayoutPostprocessor:
                 continue

             cells_bbox = BoundingBox(
-                l=min(cell.bbox.l for cell in cluster.cells),
-                t=min(cell.bbox.t for cell in cluster.cells),
-                r=max(cell.bbox.r for cell in cluster.cells),
-                b=max(cell.bbox.b for cell in cluster.cells),
+                l=min(cell.rect.to_bounding_box().l for cell in cluster.cells),
+                t=min(cell.rect.to_bounding_box().t for cell in cluster.cells),
+                r=max(cell.rect.to_bounding_box().r for cell in cluster.cells),
+                b=max(cell.rect.to_bounding_box().b for cell in cluster.cells),
             )

             if cluster.label == DocItemLabel.TABLE:
@@ -634,9 +637,9 @@ class LayoutPostprocessor:

         return clusters

-    def _sort_cells(self, cells: List[Cell]) -> List[Cell]:
+    def _sort_cells(self, cells: List[TextCell]) -> List[TextCell]:
         """Sort cells in native reading order."""
-        return sorted(cells, key=lambda c: (c.id))
+        return sorted(cells, key=lambda c: (c.index))

     def _sort_clusters(
         self, clusters: List[Cluster], mode: str = "id"
@@ -647,7 +650,7 @@ class LayoutPostprocessor:
             clusters,
             key=lambda cluster: (
                 (
-                    min(cell.id for cell in cluster.cells)
+                    min(cell.index for cell in cluster.cells)
                     if cluster.cells
                     else sys.maxsize
                 ),
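The postprocessor changes are mostly the `Cell.id` to `TextCell.index` rename plus routing bbox math through the rectangle. A condensed sketch of the dedup and sort helpers after the rename, under the same `TextCell` assumption as above:

```python
# Sketch: deduplicating and ordering cells by TextCell.index (was Cell.id).
from typing import List

from docling_core.types.doc.page import TextCell


def deduplicate_cells(cells: List[TextCell]) -> List[TextCell]:
    seen = set()
    unique = []
    for cell in cells:
        if cell.index not in seen:   # keep first appearance only
            seen.add(cell.index)
            unique.append(cell)
    return unique


def sort_cells(cells: List[TextCell]) -> List[TextCell]:
    # Native reading order is the parser's original cell order.
    return sorted(cells, key=lambda c: c.index)
```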
@@ -25,7 +25,7 @@ def draw_clusters(
             # Draw cells first (underneath)
             cell_color = (0, 0, 0, 40)  # Transparent black for cells
             for tc in c.cells:
-                cx0, cy0, cx1, cy1 = tc.bbox.as_tuple()
+                cx0, cy0, cx1, cy1 = tc.rect.to_bounding_box().as_tuple()
                 cx0 *= scale_x
                 cx1 *= scale_x
                 cy0 *= scale_x
@@ -7,6 +7,7 @@ from typing import Iterable
 import yaml
 from docling_core.types.doc import ImageRefMode

+from docling.backend.docling_parse_v4_backend import DoclingParseV4DocumentBackend
 from docling.datamodel.base_models import ConversionStatus, InputFormat
 from docling.datamodel.document import ConversionResult
 from docling.datamodel.pipeline_options import PdfPipelineOptions
@@ -60,6 +61,18 @@ def export_documents(
         with (output_dir / f"{doc_filename}.yaml").open("w") as fp:
             fp.write(yaml.safe_dump(conv_res.document.export_to_dict()))

+        # Export Docling document format to doctags:
+        with (output_dir / f"{doc_filename}.doctags.txt").open("w") as fp:
+            fp.write(conv_res.document.export_to_document_tokens())
+
+        # Export Docling document format to markdown:
+        with (output_dir / f"{doc_filename}.md").open("w") as fp:
+            fp.write(conv_res.document.export_to_markdown())
+
+        # Export Docling document format to text:
+        with (output_dir / f"{doc_filename}.txt").open("w") as fp:
+            fp.write(conv_res.document.export_to_markdown(strict_text=True))
+
         if USE_LEGACY:
             # Export Deep Search document JSON format:
             with (output_dir / f"{doc_filename}.legacy.json").open(
@@ -131,7 +144,9 @@ def main():

     doc_converter = DocumentConverter(
         format_options={
-            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
+            InputFormat.PDF: PdfFormatOption(
+                pipeline_options=pipeline_options, backend=DoclingParseV4DocumentBackend
+            )
         }
     )
@@ -13,6 +13,7 @@
 [](https://github.com/pre-commit/pre-commit)
 [](https://opensource.org/licenses/MIT)
 [](https://pepy.tech/projects/docling)
+[](https://lfaidata.foundation/projects/)

 Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.

@@ -25,12 +26,12 @@ Docling simplifies document processing, parsing diverse formats — including ad
 * 🔒 Local execution capabilities for sensitive data and air-gapped environments
 * 🤖 Plug-and-play [integrations][integrations] incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
 * 🔍 Extensive OCR support for scanned PDFs and images
+* 🥚 Support of Visual Language Models ([SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview))
 * 💻 Simple and convenient CLI

 ### Coming soon

 * 📝 Metadata extraction, including title, authors, references & language
-* 📝 Inclusion of Visual Language Models ([SmolDocling](https://huggingface.co/blog/smolervlm#smoldocling))
 * 📝 Chart understanding (Barchart, Piechart, LinePlot, etc)
 * 📝 Complex chemistry understanding (Molecular structures)

@@ -43,9 +44,13 @@ Docling simplifies document processing, parsing diverse formats — including ad
 <a href="reference/document_converter/" class="card"><b>Reference</b><br />See more API details</a>
 </div>

-## IBM ❤️ Open Source AI
+## LF AI & Data

-Docling has been brought to you by IBM.
+Docling is hosted as a project in the [LF AI & Data Foundation](https://lfaidata.foundation/projects/).
+
+### IBM ❤️ Open Source AI
+
+The project was started by the AI for knowledge team at IBM Research Zurich.

 [supported_formats]: ./usage/supported_formats.md
 [docling_document]: ./concepts/docling_document.md
35
docs/integrations/apify.md
Normal file
@@ -0,0 +1,35 @@
+You can run Docling in the cloud without installation using the [Docling Actor][apify] on Apify platform. Simply provide a document URL and get the processed result:
+
+<a href="https://apify.com/vancura/docling?fpr=docling"><img src="https://apify.com/ext/run-on-apify.png" alt="Run Docling Actor on Apify" width="176" height="39" /></a>
+
+```bash
+apify call vancura/docling -i '{
+  "options": {
+    "to_formats": ["md", "json", "html", "text", "doctags"]
+  },
+  "http_sources": [
+    {"url": "https://vancura.dev/assets/actor-test/facial-hairstyles-and-filtering-facepiece-respirators.pdf"},
+    {"url": "https://arxiv.org/pdf/2408.09869"}
+  ]
+}'
+```
+
+The Actor stores results in:
+
+* Processed document in key-value store (`OUTPUT_RESULT`)
+* Processing logs (`DOCLING_LOG`)
+* Dataset record with result URL and status
+
+Read more about the [Docling Actor](.actor/README.md), including how to use it via the Apify API and CLI.
+
+- 💻 [GitHub][github]
+- 📖 [Docs][docs]
+- 📦 [Docling Actor][apify]
+
+[github]: https://github.com/docling-project/docling/tree/main/.actor/
+[docs]: https://github.com/docling-project/docling/tree/main/.actor/README.md
+[apify]: https://apify.com/vancura/docling?fpr=docling
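For programmatic use, the same Actor can also be called from Python. A hedged sketch with the official `apify-client` package (`pip install apify-client`); the token placeholder and result handling are illustrative:

```python
# Sketch: calling the Docling Actor via the Apify Python client.
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

run = client.actor("vancura/docling").call(
    run_input={
        "options": {"to_formats": ["md"]},
        "http_sources": [{"url": "https://arxiv.org/pdf/2408.09869"}],
    }
)

# The processed document lands in the run's key-value store as OUTPUT_RESULT.
record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record(
    "OUTPUT_RESULT"
)
print(record["value"])
```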
@@ -8,7 +8,7 @@ The following table provides an overview of the default enrichment models availa
 | ------- | --------- | ---------------| ----------- |
 | Code understanding | `do_code_enrichment` | `CodeItem` | See [docs below](#code-understanding). |
 | Formula understanding | `do_formula_enrichment` | `TextItem` with label `FORMULA` | See [docs below](#formula-understanding). |
-| Picrure classification | `do_picture_classification` | `PictureItem` | See [docs below](#picture-classification). |
+| Picture classification | `do_picture_classification` | `PictureItem` | See [docs below](#picture-classification). |
 | Picture description | `do_picture_description` | `PictureItem` | See [docs below](#picture-description). |
@@ -111,6 +111,7 @@ nav:
     - "LlamaIndex": integrations/llamaindex.md
     - "txtai": integrations/txtai.md
   - ⭐️ Featured:
+    - "Apify": integrations/apify.md
     - "Data Prep Kit": integrations/data_prep_kit.md
     - "InstructLab": integrations/instructlab.md
     - "NVIDIA": integrations/nvidia.md
603
poetry.lock
generated
603
poetry.lock
generated
@ -2,17 +2,17 @@
|
||||
|
||||
[[package]]
|
||||
name = "accelerate"
|
||||
version = "1.4.0"
|
||||
version = "1.5.1"
|
||||
description = "Accelerate"
|
||||
optional = true
|
||||
python-versions = ">=3.9.0"
|
||||
files = [
|
||||
{file = "accelerate-1.4.0-py3-none-any.whl", hash = "sha256:f6e1e7dfaf9d799a20a1dc45efbf4b1546163eac133faa5acd0d89177c896e55"},
|
||||
{file = "accelerate-1.4.0.tar.gz", hash = "sha256:37d413e1b64cb8681ccd2908ae211cf73e13e6e636a2f598a96eccaa538773a5"},
|
||||
{file = "accelerate-1.5.1-py3-none-any.whl", hash = "sha256:4838cff9ed1bb0ddc9d967530ced62a1d74ea21cdb57688400359ab32682f03e"},
|
||||
{file = "accelerate-1.5.1.tar.gz", hash = "sha256:5d936faf3a31894c6160f2f2a984a38aecbba760ef919ae298b2ecd57ea9bf87"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
huggingface-hub = ">=0.21.0"
|
||||
huggingface_hub = ">=0.21.0"
|
||||
numpy = ">=1.17,<3.0.0"
|
||||
packaging = ">=20.0"
|
||||
psutil = "*"
|
||||
@ -22,24 +22,24 @@ torch = ">=2.0.0"
|
||||
|
||||
[package.extras]
|
||||
deepspeed = ["deepspeed"]
|
||||
dev = ["bitsandbytes", "black (>=23.1,<24.0)", "datasets", "diffusers", "evaluate", "hf-doc-builder (>=0.3.0)", "parameterized", "pytest (>=7.2.0,<=8.0.0)", "pytest-subtests", "pytest-xdist", "rich", "ruff (>=0.6.4,<0.7.0)", "scikit-learn", "scipy", "timm", "torchdata (>=0.8.0)", "torchpippy (>=0.2.0)", "tqdm", "transformers"]
|
||||
dev = ["bitsandbytes", "black (>=23.1,<24.0)", "datasets", "diffusers", "evaluate", "hf-doc-builder (>=0.3.0)", "parameterized", "pytest (>=7.2.0,<=8.0.0)", "pytest-order", "pytest-subtests", "pytest-xdist", "rich", "ruff (>=0.6.4,<0.7.0)", "scikit-learn", "scipy", "timm", "torchdata (>=0.8.0)", "torchpippy (>=0.2.0)", "tqdm", "transformers"]
|
||||
quality = ["black (>=23.1,<24.0)", "hf-doc-builder (>=0.3.0)", "ruff (>=0.6.4,<0.7.0)"]
|
||||
rich = ["rich"]
|
||||
sagemaker = ["sagemaker"]
|
||||
test-dev = ["bitsandbytes", "datasets", "diffusers", "evaluate", "scikit-learn", "scipy", "timm", "torchdata (>=0.8.0)", "torchpippy (>=0.2.0)", "tqdm", "transformers"]
|
||||
test-prod = ["parameterized", "pytest (>=7.2.0,<=8.0.0)", "pytest-subtests", "pytest-xdist"]
|
||||
test-prod = ["parameterized", "pytest (>=7.2.0,<=8.0.0)", "pytest-order", "pytest-subtests", "pytest-xdist"]
|
||||
test-trackers = ["comet-ml", "dvclive", "tensorboard", "wandb"]
|
||||
testing = ["bitsandbytes", "datasets", "diffusers", "evaluate", "parameterized", "pytest (>=7.2.0,<=8.0.0)", "pytest-subtests", "pytest-xdist", "scikit-learn", "scipy", "timm", "torchdata (>=0.8.0)", "torchpippy (>=0.2.0)", "tqdm", "transformers"]
|
||||
testing = ["bitsandbytes", "datasets", "diffusers", "evaluate", "parameterized", "pytest (>=7.2.0,<=8.0.0)", "pytest-order", "pytest-subtests", "pytest-xdist", "scikit-learn", "scipy", "timm", "torchdata (>=0.8.0)", "torchpippy (>=0.2.0)", "tqdm", "transformers"]
|
||||
|
||||
[[package]]
|
||||
name = "aiohappyeyeballs"
|
||||
version = "2.4.8"
|
||||
version = "2.6.1"
|
||||
description = "Happy Eyeballs for asyncio"
|
||||
optional = false
|
||||
python-versions = ">=3.9"
|
||||
files = [
|
||||
{file = "aiohappyeyeballs-2.4.8-py3-none-any.whl", hash = "sha256:6cac4f5dd6e34a9644e69cf9021ef679e4394f54e58a183056d12009e42ea9e3"},
|
||||
{file = "aiohappyeyeballs-2.4.8.tar.gz", hash = "sha256:19728772cb12263077982d2f55453babd8bec6a052a926cd5c0c42796da8bf62"},
|
||||
{file = "aiohappyeyeballs-2.6.1-py3-none-any.whl", hash = "sha256:f349ba8f4b75cb25c99c5c2d84e997e485204d2902a9597802b0371f09331fb8"},
|
||||
{file = "aiohappyeyeballs-2.6.1.tar.gz", hash = "sha256:c3f9d0113123803ccadfdf3f0faa505bc78e6a72d1cc4806cbd719826e943558"},
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@ -250,20 +250,20 @@ files = [
|
||||
|
||||
[[package]]
|
||||
name = "attrs"
|
||||
version = "25.1.0"
|
||||
version = "25.3.0"
|
||||
description = "Classes Without Boilerplate"
|
||||
optional = false
|
||||
python-versions = ">=3.8"
|
||||
files = [
|
||||
{file = "attrs-25.1.0-py3-none-any.whl", hash = "sha256:c75a69e28a550a7e93789579c22aa26b0f5b83b75dc4e08fe092980051e1090a"},
|
||||
{file = "attrs-25.1.0.tar.gz", hash = "sha256:1c97078a80c814273a76b2a298a932eb681c87415c11dee0a6921de7f1b02c3e"},
|
||||
{file = "attrs-25.3.0-py3-none-any.whl", hash = "sha256:427318ce031701fea540783410126f03899a97ffc6f61596ad581ac2e40e3bc3"},
|
||||
{file = "attrs-25.3.0.tar.gz", hash = "sha256:75d7cefc7fb576747b2c81b4442d4d4a1ce0900973527c011d1030fd3bf4af1b"},
|
||||
]
|
||||
|
||||
[package.extras]
|
||||
benchmark = ["cloudpickle", "hypothesis", "mypy (>=1.11.1)", "pympler", "pytest (>=4.3.0)", "pytest-codspeed", "pytest-mypy-plugins", "pytest-xdist[psutil]"]
|
||||
cov = ["cloudpickle", "coverage[toml] (>=5.3)", "hypothesis", "mypy (>=1.11.1)", "pympler", "pytest (>=4.3.0)", "pytest-mypy-plugins", "pytest-xdist[psutil]"]
|
||||
dev = ["cloudpickle", "hypothesis", "mypy (>=1.11.1)", "pre-commit-uv", "pympler", "pytest (>=4.3.0)", "pytest-mypy-plugins", "pytest-xdist[psutil]"]
|
||||
docs = ["cogapp", "furo", "myst-parser", "sphinx", "sphinx-notfound-page", "sphinxcontrib-towncrier", "towncrier (<24.7)"]
|
||||
docs = ["cogapp", "furo", "myst-parser", "sphinx", "sphinx-notfound-page", "sphinxcontrib-towncrier", "towncrier"]
|
||||
tests = ["cloudpickle", "hypothesis", "mypy (>=1.11.1)", "pympler", "pytest (>=4.3.0)", "pytest-mypy-plugins", "pytest-xdist[psutil]"]
|
||||
tests-mypy = ["mypy (>=1.11.1)", "pytest-mypy-plugins"]
|
||||
|
||||
@ -787,37 +787,37 @@ vision = ["Pillow (>=9.4.0)"]
|
||||
|
||||
[[package]]
|
||||
name = "debugpy"
|
||||
version = "1.8.12"
|
||||
version = "1.8.13"
|
||||
description = "An implementation of the Debug Adapter Protocol for Python"
|
||||
optional = false
|
||||
python-versions = ">=3.8"
|
||||
files = [
|
||||
{file = "debugpy-1.8.12-cp310-cp310-macosx_14_0_x86_64.whl", hash = "sha256:a2ba7ffe58efeae5b8fad1165357edfe01464f9aef25e814e891ec690e7dd82a"},
|
||||
{file = "debugpy-1.8.12-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cbbd4149c4fc5e7d508ece083e78c17442ee13b0e69bfa6bd63003e486770f45"},
|
||||
{file = "debugpy-1.8.12-cp310-cp310-win32.whl", hash = "sha256:b202f591204023b3ce62ff9a47baa555dc00bb092219abf5caf0e3718ac20e7c"},
|
||||
{file = "debugpy-1.8.12-cp310-cp310-win_amd64.whl", hash = "sha256:9649eced17a98ce816756ce50433b2dd85dfa7bc92ceb60579d68c053f98dff9"},
|
||||
{file = "debugpy-1.8.12-cp311-cp311-macosx_14_0_universal2.whl", hash = "sha256:36f4829839ef0afdfdd208bb54f4c3d0eea86106d719811681a8627ae2e53dd5"},
|
||||
{file = "debugpy-1.8.12-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a28ed481d530e3138553be60991d2d61103ce6da254e51547b79549675f539b7"},
|
||||
{file = "debugpy-1.8.12-cp311-cp311-win32.whl", hash = "sha256:4ad9a94d8f5c9b954e0e3b137cc64ef3f579d0df3c3698fe9c3734ee397e4abb"},
|
||||
{file = "debugpy-1.8.12-cp311-cp311-win_amd64.whl", hash = "sha256:4703575b78dd697b294f8c65588dc86874ed787b7348c65da70cfc885efdf1e1"},
|
||||
{file = "debugpy-1.8.12-cp312-cp312-macosx_14_0_universal2.whl", hash = "sha256:7e94b643b19e8feb5215fa508aee531387494bf668b2eca27fa769ea11d9f498"},
|
||||
{file = "debugpy-1.8.12-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:086b32e233e89a2740c1615c2f775c34ae951508b28b308681dbbb87bba97d06"},
|
||||
{file = "debugpy-1.8.12-cp312-cp312-win32.whl", hash = "sha256:2ae5df899732a6051b49ea2632a9ea67f929604fd2b036613a9f12bc3163b92d"},
|
||||
{file = "debugpy-1.8.12-cp312-cp312-win_amd64.whl", hash = "sha256:39dfbb6fa09f12fae32639e3286112fc35ae976114f1f3d37375f3130a820969"},
|
||||
{file = "debugpy-1.8.12-cp313-cp313-macosx_14_0_universal2.whl", hash = "sha256:696d8ae4dff4cbd06bf6b10d671e088b66669f110c7c4e18a44c43cf75ce966f"},
|
||||
{file = "debugpy-1.8.12-cp313-cp313-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:898fba72b81a654e74412a67c7e0a81e89723cfe2a3ea6fcd3feaa3395138ca9"},
|
||||
{file = "debugpy-1.8.12-cp313-cp313-win32.whl", hash = "sha256:22a11c493c70413a01ed03f01c3c3a2fc4478fc6ee186e340487b2edcd6f4180"},
|
||||
{file = "debugpy-1.8.12-cp313-cp313-win_amd64.whl", hash = "sha256:fdb3c6d342825ea10b90e43d7f20f01535a72b3a1997850c0c3cefa5c27a4a2c"},
|
||||
{file = "debugpy-1.8.12-cp38-cp38-macosx_14_0_x86_64.whl", hash = "sha256:b0232cd42506d0c94f9328aaf0d1d0785f90f87ae72d9759df7e5051be039738"},
|
||||
{file = "debugpy-1.8.12-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9af40506a59450f1315168d47a970db1a65aaab5df3833ac389d2899a5d63b3f"},
|
||||
{file = "debugpy-1.8.12-cp38-cp38-win32.whl", hash = "sha256:5cc45235fefac57f52680902b7d197fb2f3650112379a6fa9aa1b1c1d3ed3f02"},
|
||||
{file = "debugpy-1.8.12-cp38-cp38-win_amd64.whl", hash = "sha256:557cc55b51ab2f3371e238804ffc8510b6ef087673303890f57a24195d096e61"},
|
||||
{file = "debugpy-1.8.12-cp39-cp39-macosx_14_0_x86_64.whl", hash = "sha256:b5c6c967d02fee30e157ab5227706f965d5c37679c687b1e7bbc5d9e7128bd41"},
|
||||
{file = "debugpy-1.8.12-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:88a77f422f31f170c4b7e9ca58eae2a6c8e04da54121900651dfa8e66c29901a"},
|
||||
{file = "debugpy-1.8.12-cp39-cp39-win32.whl", hash = "sha256:a4042edef80364239f5b7b5764e55fd3ffd40c32cf6753da9bda4ff0ac466018"},
|
||||
{file = "debugpy-1.8.12-cp39-cp39-win_amd64.whl", hash = "sha256:f30b03b0f27608a0b26c75f0bb8a880c752c0e0b01090551b9d87c7d783e2069"},
|
||||
{file = "debugpy-1.8.12-py2.py3-none-any.whl", hash = "sha256:274b6a2040349b5c9864e475284bce5bb062e63dce368a394b8cc865ae3b00c6"},
|
||||
{file = "debugpy-1.8.12.tar.gz", hash = "sha256:646530b04f45c830ceae8e491ca1c9320a2d2f0efea3141487c82130aba70dce"},
|
||||
{file = "debugpy-1.8.13-cp310-cp310-macosx_14_0_x86_64.whl", hash = "sha256:06859f68e817966723ffe046b896b1bd75c665996a77313370336ee9e1de3e90"},
|
||||
{file = "debugpy-1.8.13-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cb56c2db69fb8df3168bc857d7b7d2494fed295dfdbde9a45f27b4b152f37520"},
|
||||
{file = "debugpy-1.8.13-cp310-cp310-win32.whl", hash = "sha256:46abe0b821cad751fc1fb9f860fb2e68d75e2c5d360986d0136cd1db8cad4428"},
|
||||
{file = "debugpy-1.8.13-cp310-cp310-win_amd64.whl", hash = "sha256:dc7b77f5d32674686a5f06955e4b18c0e41fb5a605f5b33cf225790f114cfeec"},
|
||||
{file = "debugpy-1.8.13-cp311-cp311-macosx_14_0_universal2.whl", hash = "sha256:eee02b2ed52a563126c97bf04194af48f2fe1f68bb522a312b05935798e922ff"},
|
||||
{file = "debugpy-1.8.13-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4caca674206e97c85c034c1efab4483f33971d4e02e73081265ecb612af65377"},
|
||||
{file = "debugpy-1.8.13-cp311-cp311-win32.whl", hash = "sha256:7d9a05efc6973b5aaf076d779cf3a6bbb1199e059a17738a2aa9d27a53bcc888"},
|
||||
{file = "debugpy-1.8.13-cp311-cp311-win_amd64.whl", hash = "sha256:62f9b4a861c256f37e163ada8cf5a81f4c8d5148fc17ee31fb46813bd658cdcc"},
|
||||
{file = "debugpy-1.8.13-cp312-cp312-macosx_14_0_universal2.whl", hash = "sha256:2b8de94c5c78aa0d0ed79023eb27c7c56a64c68217d881bee2ffbcb13951d0c1"},
|
||||
{file = "debugpy-1.8.13-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:887d54276cefbe7290a754424b077e41efa405a3e07122d8897de54709dbe522"},
|
||||
{file = "debugpy-1.8.13-cp312-cp312-win32.whl", hash = "sha256:3872ce5453b17837ef47fb9f3edc25085ff998ce63543f45ba7af41e7f7d370f"},
|
||||
{file = "debugpy-1.8.13-cp312-cp312-win_amd64.whl", hash = "sha256:63ca7670563c320503fea26ac688988d9d6b9c6a12abc8a8cf2e7dd8e5f6b6ea"},
|
||||
{file = "debugpy-1.8.13-cp313-cp313-macosx_14_0_universal2.whl", hash = "sha256:31abc9618be4edad0b3e3a85277bc9ab51a2d9f708ead0d99ffb5bb750e18503"},
|
||||
{file = "debugpy-1.8.13-cp313-cp313-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a0bd87557f97bced5513a74088af0b84982b6ccb2e254b9312e29e8a5c4270eb"},
|
||||
{file = "debugpy-1.8.13-cp313-cp313-win32.whl", hash = "sha256:5268ae7fdca75f526d04465931cb0bd24577477ff50e8bb03dab90983f4ebd02"},
|
||||
{file = "debugpy-1.8.13-cp313-cp313-win_amd64.whl", hash = "sha256:79ce4ed40966c4c1631d0131606b055a5a2f8e430e3f7bf8fd3744b09943e8e8"},
|
||||
{file = "debugpy-1.8.13-cp38-cp38-macosx_14_0_x86_64.whl", hash = "sha256:acf39a6e98630959763f9669feddee540745dfc45ad28dbc9bd1f9cd60639391"},
|
||||
{file = "debugpy-1.8.13-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:924464d87e7d905eb0d79fb70846558910e906d9ee309b60c4fe597a2e802590"},
|
||||
{file = "debugpy-1.8.13-cp38-cp38-win32.whl", hash = "sha256:3dae443739c6b604802da9f3e09b0f45ddf1cf23c99161f3a1a8039f61a8bb89"},
|
||||
{file = "debugpy-1.8.13-cp38-cp38-win_amd64.whl", hash = "sha256:ed93c3155fc1f888ab2b43626182174e457fc31b7781cd1845629303790b8ad1"},
|
||||
{file = "debugpy-1.8.13-cp39-cp39-macosx_14_0_x86_64.whl", hash = "sha256:6fab771639332bd8ceb769aacf454a30d14d7a964f2012bf9c4e04c60f16e85b"},
|
||||
{file = "debugpy-1.8.13-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:32b6857f8263a969ce2ca098f228e5cc0604d277447ec05911a8c46cf3e7e307"},
|
||||
{file = "debugpy-1.8.13-cp39-cp39-win32.whl", hash = "sha256:f14d2c4efa1809da125ca62df41050d9c7cd9cb9e380a2685d1e453c4d450ccb"},
|
||||
{file = "debugpy-1.8.13-cp39-cp39-win_amd64.whl", hash = "sha256:ea869fe405880327497e6945c09365922c79d2a1eed4c3ae04d77ac7ae34b2b5"},
|
||||
{file = "debugpy-1.8.13-py2.py3-none-any.whl", hash = "sha256:d4ba115cdd0e3a70942bd562adba9ec8c651fe69ddde2298a1be296fc331906f"},
|
||||
{file = "debugpy-1.8.13.tar.gz", hash = "sha256:837e7bef95bdefba426ae38b9a94821ebdc5bea55627879cd48165c90b9e50ce"},
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@ -870,13 +870,13 @@ files = [
|
||||
|
||||
[[package]]
|
||||
name = "docling-core"
|
||||
version = "2.22.0"
|
||||
version = "2.23.3"
|
||||
description = "A python library to define and validate data types in Docling."
|
||||
optional = false
|
||||
python-versions = "<4.0,>=3.9"
|
||||
files = [
|
||||
{file = "docling_core-2.22.0-py3-none-any.whl", hash = "sha256:d74d351024d016f46a09f171fb9d2d78809b132e18e25176af517ac4203c858c"},
|
||||
{file = "docling_core-2.22.0.tar.gz", hash = "sha256:5e4bf15884560a5dc66482206f875d152701bb809f0ed52bbbe86133e0d559e2"},
|
||||
{file = "docling_core-2.23.3-py3-none-any.whl", hash = "sha256:a2166ffc41f8fdf6fdb99b33da6c7146eccf6382712ea92e95772604fb5af5e5"},
|
||||
{file = "docling_core-2.23.3.tar.gz", hash = "sha256:a64ce41e0881c06962a2b3ec80e0665f84de0809dedf1bf84f3a14b75dd665c4"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
@ -929,43 +929,43 @@ transformers = [
|
||||
|
||||
[[package]]
name = "docling-parse"
version = "3.4.0"
version = "4.0.0"
description = "Simple package to extract text with coordinates from programmatic PDFs"
optional = false
python-versions = "<4.0,>=3.9"
files = [
{file = "docling_parse-3.4.0-cp310-cp310-macosx_13_0_x86_64.whl", hash = "sha256:96e95e63ab722dfe5340fcb04d0e07bd1c0a0ba2f62e93c91ac26dda0a312a44"},
{file = "docling_parse-3.4.0-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:f9e14a7a0b92526d4dfd3f390f3d7e075f59d14d6b8a0a564fbc26299e56cd47"},
{file = "docling_parse-3.4.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:fdef1d51291e841e5b6a32689a39a9f35986389f863b415eaa1790b29d021101"},
{file = "docling_parse-3.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:68652610d6c34adc684dbaa77b5d596b25d004912a78e85ec4ae57910bf7086f"},
{file = "docling_parse-3.4.0-cp310-cp310-win_amd64.whl", hash = "sha256:daad07fe93f306d8e2378acb24ef2fa68535ccdb960a1b99d6b36ab8c299fef1"},
{file = "docling_parse-3.4.0-cp311-cp311-macosx_13_0_x86_64.whl", hash = "sha256:6f30c5fd3c04bd3d1a7d06baeae2e5c3adbebc284071a9a52b0150bcd4917a3d"},
{file = "docling_parse-3.4.0-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:2c3664e4c8980dc44e0d026b1b01fbc94f0dac9adf7be835071d4a761977c36d"},
{file = "docling_parse-3.4.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3febf7515453d18df03c275356db2bb5b0618ba9fc033aba05d58318a9846b1a"},
{file = "docling_parse-3.4.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:75aeb038bb7f6400ecde99cf6c4ef35867c528ac21676071a822ed72d0653149"},
{file = "docling_parse-3.4.0-cp311-cp311-win_amd64.whl", hash = "sha256:8d20e3584022542448c21ed0ac868b2457ae35211cea63ed20142e375549e633"},
{file = "docling_parse-3.4.0-cp312-cp312-macosx_13_0_x86_64.whl", hash = "sha256:ddfe2bd730ed08363f25954a0480da021e6e6bdb175276643cc2913a6bbd98e2"},
{file = "docling_parse-3.4.0-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:faf8ba9eaab8c17ea72516be5d440f754fcca27f37488dcf126a0f3ac3a63058"},
{file = "docling_parse-3.4.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9eb5e7e50b3057690d0d4fa651363cafd7735bb952378dd8a4ca6c7d359507db"},
{file = "docling_parse-3.4.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:452334b387e2c699f69acf37a4ea4ae7097d062a2dd1980c573b73051c031158"},
{file = "docling_parse-3.4.0-cp312-cp312-win_amd64.whl", hash = "sha256:1ba00147ccb0a1dc10cdf58645e67f4ee895c6920bc583bc6f25d27cd562bfed"},
{file = "docling_parse-3.4.0-cp313-cp313-macosx_13_0_x86_64.whl", hash = "sha256:2b22a33a2d2f3616a7ac0f4b2f2ba6099f8a5dc6fa328be0f17c9c506455d7c1"},
{file = "docling_parse-3.4.0-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:0dd2440a94d555f98b702e88bfe7cc5a585d9191f4ea93884b02e286e7af3a06"},
{file = "docling_parse-3.4.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5f5828744a0e33136e09e8c61ca0b2c0ead8f76595f2e0955beaac16adce51f5"},
{file = "docling_parse-3.4.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:26fff6e36809d17ff855532f985df3738ada8d86a9fc746049ea6e6524d5e0a2"},
{file = "docling_parse-3.4.0-cp313-cp313-win_amd64.whl", hash = "sha256:13fc442f64171280db98dc4507274ffa0a65bac94eecbcc60c3cbf41f433b556"},
{file = "docling_parse-3.4.0-cp39-cp39-macosx_13_0_x86_64.whl", hash = "sha256:16d570ab655ea5a25d9cd1e27bc4d6905372784907d679cde4cef2fb22df61c7"},
{file = "docling_parse-3.4.0-cp39-cp39-macosx_14_0_arm64.whl", hash = "sha256:05bd405635be2379ef6cb0c7c39dc08edf3ba93788eb0fca7426b2218538bce1"},
{file = "docling_parse-3.4.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f6c92f0353bbae7ca9b39553cc4d03f5fefdab33ecd26809ab710cc752fac03c"},
{file = "docling_parse-3.4.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8e883326ec4121891c48d365d064e5ae30c5b90a2dac44ed61ac02e7da41345d"},
{file = "docling_parse-3.4.0-cp39-cp39-win_amd64.whl", hash = "sha256:b2a0fe1e1d88c3814553137daa597ee34dc310f50fe415e1f8a1c6e611d95e42"},
{file = "docling_parse-3.4.0-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:930f5a5d78404de573c0ba302d313b6647f1e86714766e5a1cdc09af014ca111"},
{file = "docling_parse-3.4.0-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:328fd72f274b939d454e3ff20a73074d99664cb4a51e6ccdaf195a6626691b95"},
{file = "docling_parse-3.4.0.tar.gz", hash = "sha256:36cdd17bcc4a833b5c9af9ae3dc461ed18a975c1b084ccfd19a9d9cde4f66e14"},
{file = "docling_parse-4.0.0-cp310-cp310-macosx_13_0_x86_64.whl", hash = "sha256:6de7fa8ec4919f604c9a02a3fa8ca0e13a3a8e3c0652adc41848616b737925d9"},
{file = "docling_parse-4.0.0-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:82704280ab086a84a30d9ec9def6cd96b733aefc6973546b2101d09eed7a958e"},
{file = "docling_parse-4.0.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f51ec645978d75e7cf232fa7c571ebf164a5bdf418588c663f9b3c062df6ba72"},
{file = "docling_parse-4.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5d5da855f35303f9229198891da550e3c1e1f4025e52ab8c0303d345669ff46f"},
{file = "docling_parse-4.0.0-cp310-cp310-win_amd64.whl", hash = "sha256:ba36cb329aadb306cc25901305d49fe6d2ed9e93e9dc993b4baf13fcc90a98e1"},
{file = "docling_parse-4.0.0-cp311-cp311-macosx_13_0_x86_64.whl", hash = "sha256:9b7afbf09945b4d9e3ddb9c24a13d7b9f987cf32d5c9d68532ceb63fb26697df"},
{file = "docling_parse-4.0.0-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:6daaec89c5045e968785a225b9b5a42b36dfe6b5a4437995e2d34e1595e2c162"},
{file = "docling_parse-4.0.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:93e638ef2ad36e9e4a8ef881073696467e6699bf206e5a416de4abaaf531b0e1"},
{file = "docling_parse-4.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:87246eb0d259202a7f093336f17235cb1fffb67e82b41dbc0e88f9c05b08014e"},
{file = "docling_parse-4.0.0-cp311-cp311-win_amd64.whl", hash = "sha256:0ae44b913b010994c3e36869e5fc9dad252a7dc7434225790928075c8b5a7f6c"},
{file = "docling_parse-4.0.0-cp312-cp312-macosx_13_0_x86_64.whl", hash = "sha256:ed6d8ac29c1014ed7a126d782b6bc963c9a9c09f41224fa90f9a8b45bf3191f9"},
{file = "docling_parse-4.0.0-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:4a2dd46cee8e54f3aa511dbf552ef5f9f422944c54de73888ee55b2c4a6e10b9"},
{file = "docling_parse-4.0.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:722fbd63f7f28e8a49fa2cd92d1571290f6c5295b86c7406b7c20a6c6e8b3975"},
{file = "docling_parse-4.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cc155767b51a23f5bfd5abaabaf8c4a57777aa0277c813e13b9f6c43532964bd"},
{file = "docling_parse-4.0.0-cp312-cp312-win_amd64.whl", hash = "sha256:e45ab31fffe4ae571bd2ecc9e0a9d5665a1486463396924160add84828d2a7e7"},
{file = "docling_parse-4.0.0-cp313-cp313-macosx_13_0_x86_64.whl", hash = "sha256:d93fd3cec032e5b7f6385f7a021e228c52eb381f28fc037224708aeaad487d8b"},
{file = "docling_parse-4.0.0-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:d9f64847cd7e9a7a34a3d5a14f0827022ed3b7f50f39d5126ef003c55d574ba3"},
{file = "docling_parse-4.0.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a6ac283f08680dfde568b5629ab94830cab32795d74086553e755460b6879901"},
{file = "docling_parse-4.0.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:97eca28220dc5075099e01f2cb7a3e9005b9951dee0ca0eb743e298be7284279"},
{file = "docling_parse-4.0.0-cp313-cp313-win_amd64.whl", hash = "sha256:6019288cfe25a97993c2aab453386fc3e366d7761637e682b25915ba2c856cc4"},
{file = "docling_parse-4.0.0-cp39-cp39-macosx_13_0_x86_64.whl", hash = "sha256:168c861233fc2a1e4b7d934aa6f7e1b3f568434fd478f18f0b3bcc09880d504c"},
{file = "docling_parse-4.0.0-cp39-cp39-macosx_14_0_arm64.whl", hash = "sha256:b1cc0b7a214bc9e4e05c65572c4a17c19d0f4f0795fe1fa77a0ad499ab7e4e79"},
{file = "docling_parse-4.0.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:cec16060eba37db3fa2ff0b34d283cf33384caecc73b0d8dbf012e3b3941c21d"},
{file = "docling_parse-4.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6a058c2330d7759d943ae50db9e4ecab60201a54116052f94e6e7a3886886b65"},
{file = "docling_parse-4.0.0-cp39-cp39-win_amd64.whl", hash = "sha256:05af04972fef73f2e10cc46c8f541aaf6713fdcad254502a0012884109c1d468"},
{file = "docling_parse-4.0.0-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:30c0c1b33c0a0aeb6897537f7d8fa09ed5a26f05685b18a2d27c73a789343679"},
{file = "docling_parse-4.0.0-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:2dff48f5fef106539137a4e63ee58b5be0e7a81ac1aedd61a4453c268b8f76d1"},
{file = "docling_parse-4.0.0.tar.gz", hash = "sha256:5be0ba4e0098524f116743e6b709f29fe273e441e61923c3a262e054643c5ee6"},
]

[package.dependencies]
docling-core = ">=2.14.0,<3.0.0"
docling-core = ">=2.23.0,<3.0.0"
pillow = ">=10.0.0,<12.0.0"
pydantic = ">=2.0.0,<3.0.0"
pywin32 = {version = ">=305", markers = "sys_platform == \"win32\""}
@@ -1086,13 +1086,13 @@ devel = ["colorama", "json-spec", "jsonschema", "pylint", "pytest", "pytest-benc

[[package]]
name = "filelock"
version = "3.17.0"
version = "3.18.0"
description = "A platform independent file lock."
optional = false
python-versions = ">=3.9"
files = [
{file = "filelock-3.17.0-py3-none-any.whl", hash = "sha256:533dc2f7ba78dc2f0f531fc6c4940addf7b70a481e269a5a3b93be94ffbe8338"},
{file = "filelock-3.17.0.tar.gz", hash = "sha256:ee4e77401ef576ebb38cd7f13b9b28893194acc20a8e68e18730ba9c0e54660e"},
{file = "filelock-3.18.0-py3-none-any.whl", hash = "sha256:c401f4f8377c4464e6db25fff06205fd89bdd83b65eb0488ed1b160f780e21de"},
{file = "filelock-3.18.0.tar.gz", hash = "sha256:adbc88eabb99d2fec8c9c1b229b171f18afa655400173ddc653d5d01501fb9f2"},
]

[package.extras]
@@ -1500,13 +1500,13 @@ zstd = ["zstandard (>=0.18.0)"]

[[package]]
name = "huggingface-hub"
version = "0.29.1"
version = "0.29.3"
description = "Client library to download and publish models, datasets and other repos on the huggingface.co hub"
optional = false
python-versions = ">=3.8.0"
files = [
{file = "huggingface_hub-0.29.1-py3-none-any.whl", hash = "sha256:352f69caf16566c7b6de84b54a822f6238e17ddd8ae3da4f8f2272aea5b198d5"},
{file = "huggingface_hub-0.29.1.tar.gz", hash = "sha256:9524eae42077b8ff4fc459ceb7a514eca1c1232b775276b009709fe2a084f250"},
{file = "huggingface_hub-0.29.3-py3-none-any.whl", hash = "sha256:0b25710932ac649c08cdbefa6c6ccb8e88eef82927cacdb048efb726429453aa"},
{file = "huggingface_hub-0.29.3.tar.gz", hash = "sha256:64519a25716e0ba382ba2d3fb3ca082e7c7eb4a2fc634d200e8380006e0760e5"},
]

[package.dependencies]
@@ -1548,13 +1548,13 @@ pyreadline3 = {version = "*", markers = "sys_platform == \"win32\" and python_ve

[[package]]
name = "identify"
version = "2.6.8"
version = "2.6.9"
description = "File identification library for Python"
optional = false
python-versions = ">=3.9"
files = [
{file = "identify-2.6.8-py2.py3-none-any.whl", hash = "sha256:83657f0f766a3c8d0eaea16d4ef42494b39b34629a4b3192a9d020d349b3e255"},
{file = "identify-2.6.8.tar.gz", hash = "sha256:61491417ea2c0c5c670484fd8abbb34de34cdae1e5f39a73ee65e48e4bb663fc"},
{file = "identify-2.6.9-py2.py3-none-any.whl", hash = "sha256:c98b4322da415a8e5a70ff6e51fbc2d2932c015532d77e9f8537b4ba7813b150"},
{file = "identify-2.6.9.tar.gz", hash = "sha256:d40dfe3142a1421d8518e3d3985ef5ac42890683e32306ad614a29490abeb6bf"},
]

[package.extras]
@@ -1851,13 +1851,13 @@ trio = ["trio"]

[[package]]
name = "jinja2"
version = "3.1.5"
version = "3.1.6"
description = "A very fast and expressive template engine."
optional = false
python-versions = ">=3.7"
files = [
{file = "jinja2-3.1.5-py3-none-any.whl", hash = "sha256:aba0f4dc9ed8013c424088f68a5c226f7d6097ed89b246d7749c2ec4175c6adb"},
{file = "jinja2-3.1.5.tar.gz", hash = "sha256:8fefff8dc3034e27bb80d67c671eb8a9bc424c0ef4c0826edbff304cceff43bb"},
{file = "jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67"},
{file = "jinja2-3.1.6.tar.gz", hash = "sha256:0137fb05990d35f1275a587e9aee6d56da821fc83491a0fb838183be43f66d6d"},
]

[package.dependencies]
@@ -2666,13 +2666,13 @@ min-versions = ["babel (==2.9.0)", "click (==7.0)", "colorama (==0.4)", "ghp-imp

[[package]]
name = "mkdocs-autorefs"
version = "1.4.0"
version = "1.4.1"
description = "Automatically link across pages in MkDocs."
optional = false
python-versions = ">=3.9"
files = [
{file = "mkdocs_autorefs-1.4.0-py3-none-any.whl", hash = "sha256:bad19f69655878d20194acd0162e29a89c3f7e6365ffe54e72aa3fd1072f240d"},
{file = "mkdocs_autorefs-1.4.0.tar.gz", hash = "sha256:a9c0aa9c90edbce302c09d050a3c4cb7c76f8b7b2c98f84a7a05f53d00392156"},
{file = "mkdocs_autorefs-1.4.1-py3-none-any.whl", hash = "sha256:9793c5ac06a6ebbe52ec0f8439256e66187badf4b5334b5fde0b128ec134df4f"},
{file = "mkdocs_autorefs-1.4.1.tar.gz", hash = "sha256:4b5b6235a4becb2b10425c2fa191737e415b37aa3418919db33e5d774c9db079"},
]

[package.dependencies]
@@ -2733,13 +2733,13 @@ pygments = ">2.12.0"

[[package]]
name = "mkdocs-material"
version = "9.6.7"
version = "9.6.8"
description = "Documentation that simply works"
optional = false
python-versions = ">=3.8"
files = [
{file = "mkdocs_material-9.6.7-py3-none-any.whl", hash = "sha256:8a159e45e80fcaadd9fbeef62cbf928569b93df954d4dc5ba76d46820caf7b47"},
{file = "mkdocs_material-9.6.7.tar.gz", hash = "sha256:3e2c1fceb9410056c2d91f334a00cdea3215c28750e00c691c1e46b2a33309b4"},
{file = "mkdocs_material-9.6.8-py3-none-any.whl", hash = "sha256:0a51532dd8aa80b232546c073fe3ef60dfaef1b1b12196ac7191ee01702d1cf8"},
{file = "mkdocs_material-9.6.8.tar.gz", hash = "sha256:8de31bb7566379802532b248bd56d9c4bc834afc4625884bf5769f9412c6a354"},
]

[package.dependencies]
@@ -3703,14 +3703,14 @@ files = [

[[package]]
name = "nvidia-nvjitlink-cu12"
version = "12.8.61"
version = "12.8.93"
description = "Nvidia JIT LTO Library"
optional = false
python-versions = ">=3"
files = [
{file = "nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl", hash = "sha256:45fd79f2ae20bd67e8bc411055939049873bfd8fac70ff13bd4865e0b9bdab17"},
{file = "nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:9b80ecab31085dda3ce3b41d043be0ec739216c3fc633b8abe212d5a30026df0"},
{file = "nvidia_nvjitlink_cu12-12.8.61-py3-none-win_amd64.whl", hash = "sha256:1166a964d25fdc0eae497574d38824305195a5283324a21ccb0ce0c802cbf41c"},
{file = "nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl", hash = "sha256:81ff63371a7ebd6e6451970684f916be2eab07321b73c9d244dc2b4da7f73b88"},
{file = "nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:adccd7161ace7261e01bb91e44e88da350895c270d23f744f0820c818b7229e7"},
{file = "nvidia_nvjitlink_cu12-12.8.93-py3-none-win_amd64.whl", hash = "sha256:bd93fbeeee850917903583587f4fc3a4eafa022e34572251368238ab5e6bd67f"},
]

[[package]]
@@ -3796,32 +3796,29 @@ sympy = "*"

[[package]]
name = "onnxruntime"
version = "1.20.1"
version = "1.21.0"
description = "ONNX Runtime is a runtime accelerator for Machine Learning models"
optional = true
python-versions = "*"
python-versions = ">=3.10"
files = [
{file = "onnxruntime-1.20.1-cp310-cp310-macosx_13_0_universal2.whl", hash = "sha256:e50ba5ff7fed4f7d9253a6baf801ca2883cc08491f9d32d78a80da57256a5439"},
{file = "onnxruntime-1.20.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7b2908b50101a19e99c4d4e97ebb9905561daf61829403061c1adc1b588bc0de"},
{file = "onnxruntime-1.20.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d82daaec24045a2e87598b8ac2b417b1cce623244e80e663882e9fe1aae86410"},
{file = "onnxruntime-1.20.1-cp310-cp310-win32.whl", hash = "sha256:4c4b251a725a3b8cf2aab284f7d940c26094ecd9d442f07dd81ab5470e99b83f"},
{file = "onnxruntime-1.20.1-cp310-cp310-win_amd64.whl", hash = "sha256:d3b616bb53a77a9463707bb313637223380fc327f5064c9a782e8ec69c22e6a2"},
{file = "onnxruntime-1.20.1-cp311-cp311-macosx_13_0_universal2.whl", hash = "sha256:06bfbf02ca9ab5f28946e0f912a562a5f005301d0c419283dc57b3ed7969bb7b"},
{file = "onnxruntime-1.20.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f6243e34d74423bdd1edf0ae9596dd61023b260f546ee17d701723915f06a9f7"},
{file = "onnxruntime-1.20.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5eec64c0269dcdb8d9a9a53dc4d64f87b9e0c19801d9321246a53b7eb5a7d1bc"},
{file = "onnxruntime-1.20.1-cp311-cp311-win32.whl", hash = "sha256:a19bc6e8c70e2485a1725b3d517a2319603acc14c1f1a017dda0afe6d4665b41"},
{file = "onnxruntime-1.20.1-cp311-cp311-win_amd64.whl", hash = "sha256:8508887eb1c5f9537a4071768723ec7c30c28eb2518a00d0adcd32c89dea3221"},
{file = "onnxruntime-1.20.1-cp312-cp312-macosx_13_0_universal2.whl", hash = "sha256:22b0655e2bf4f2161d52706e31f517a0e54939dc393e92577df51808a7edc8c9"},
{file = "onnxruntime-1.20.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f1f56e898815963d6dc4ee1c35fc6c36506466eff6d16f3cb9848cea4e8c8172"},
{file = "onnxruntime-1.20.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bb71a814f66517a65628c9e4a2bb530a6edd2cd5d87ffa0af0f6f773a027d99e"},
{file = "onnxruntime-1.20.1-cp312-cp312-win32.whl", hash = "sha256:bd386cc9ee5f686ee8a75ba74037750aca55183085bf1941da8efcfe12d5b120"},
{file = "onnxruntime-1.20.1-cp312-cp312-win_amd64.whl", hash = "sha256:19c2d843eb074f385e8bbb753a40df780511061a63f9def1b216bf53860223fb"},
{file = "onnxruntime-1.20.1-cp313-cp313-macosx_13_0_universal2.whl", hash = "sha256:cc01437a32d0042b606f462245c8bbae269e5442797f6213e36ce61d5abdd8cc"},
{file = "onnxruntime-1.20.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fb44b08e017a648924dbe91b82d89b0c105b1adcfe31e90d1dc06b8677ad37be"},
{file = "onnxruntime-1.20.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bda6aebdf7917c1d811f21d41633df00c58aff2bef2f598f69289c1f1dabc4b3"},
{file = "onnxruntime-1.20.1-cp313-cp313-win_amd64.whl", hash = "sha256:d30367df7e70f1d9fc5a6a68106f5961686d39b54d3221f760085524e8d38e16"},
{file = "onnxruntime-1.20.1-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c9158465745423b2b5d97ed25aa7740c7d38d2993ee2e5c3bfacb0c4145c49d8"},
{file = "onnxruntime-1.20.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0df6f2df83d61f46e842dbcde610ede27218947c33e994545a22333491e72a3b"},
{file = "onnxruntime-1.21.0-cp310-cp310-macosx_13_0_universal2.whl", hash = "sha256:95513c9302bc8dd013d84148dcf3168e782a80cdbf1654eddc948a23147ccd3d"},
{file = "onnxruntime-1.21.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:635d4ab13ae0f150dd4c6ff8206fd58f1c6600636ecc796f6f0c42e4c918585b"},
{file = "onnxruntime-1.21.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7d06bfa0dd5512bd164f25a2bf594b2e7c9eabda6fc064b684924f3e81bdab1b"},
{file = "onnxruntime-1.21.0-cp310-cp310-win_amd64.whl", hash = "sha256:b0fc22d219791e0284ee1d9c26724b8ee3fbdea28128ef25d9507ad3b9621f23"},
{file = "onnxruntime-1.21.0-cp311-cp311-macosx_13_0_universal2.whl", hash = "sha256:8e16f8a79df03919810852fb46ffcc916dc87a9e9c6540a58f20c914c575678c"},
{file = "onnxruntime-1.21.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7f9156cf6f8ee133d07a751e6518cf6f84ed37fbf8243156bd4a2c4ee6e073c8"},
{file = "onnxruntime-1.21.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8a5d09815a9e209fa0cb20c2985b34ab4daeba7aea94d0f96b8751eb10403201"},
{file = "onnxruntime-1.21.0-cp311-cp311-win_amd64.whl", hash = "sha256:1d970dff1e2fa4d9c53f2787b3b7d0005596866e6a31997b41169017d1362dd0"},
{file = "onnxruntime-1.21.0-cp312-cp312-macosx_13_0_universal2.whl", hash = "sha256:893d67c68ca9e7a58202fa8d96061ed86a5815b0925b5a97aef27b8ba246a20b"},
{file = "onnxruntime-1.21.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:37b7445c920a96271a8dfa16855e258dc5599235b41c7bbde0d262d55bcc105f"},
{file = "onnxruntime-1.21.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9a04aafb802c1e5573ba4552f8babcb5021b041eb4cfa802c9b7644ca3510eca"},
{file = "onnxruntime-1.21.0-cp312-cp312-win_amd64.whl", hash = "sha256:7f801318476cd7003d636a5b392f7a37c08b6c8d2f829773f3c3887029e03f32"},
{file = "onnxruntime-1.21.0-cp313-cp313-macosx_13_0_universal2.whl", hash = "sha256:85718cbde1c2912d3a03e3b3dc181b1480258a229c32378408cace7c450f7f23"},
{file = "onnxruntime-1.21.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:94dff3a61538f3b7b0ea9a06bc99e1410e90509c76e3a746f039e417802a12ae"},
{file = "onnxruntime-1.21.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c1e704b0eda5f2bbbe84182437315eaec89a450b08854b5a7762c85d04a28a0a"},
{file = "onnxruntime-1.21.0-cp313-cp313-win_amd64.whl", hash = "sha256:19b630c6a8956ef97fb7c94948b17691167aa1aaf07b5f214fa66c3e4136c108"},
{file = "onnxruntime-1.21.0-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3995c4a2d81719623c58697b9510f8de9fa42a1da6b4474052797b0d712324fe"},
{file = "onnxruntime-1.21.0-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:36b18b8f39c0f84e783902112a0dd3c102466897f96d73bb83f6a6bff283a423"},
]

[package.dependencies]
@@ -4438,22 +4435,20 @@ files = [

[[package]]
name = "protobuf"
version = "5.29.3"
version = "6.30.1"
description = ""
optional = false
python-versions = ">=3.8"
python-versions = ">=3.9"
files = [
{file = "protobuf-5.29.3-cp310-abi3-win32.whl", hash = "sha256:3ea51771449e1035f26069c4c7fd51fba990d07bc55ba80701c78f886bf9c888"},
{file = "protobuf-5.29.3-cp310-abi3-win_amd64.whl", hash = "sha256:a4fa6f80816a9a0678429e84973f2f98cbc218cca434abe8db2ad0bffc98503a"},
{file = "protobuf-5.29.3-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:a8434404bbf139aa9e1300dbf989667a83d42ddda9153d8ab76e0d5dcaca484e"},
{file = "protobuf-5.29.3-cp38-abi3-manylinux2014_aarch64.whl", hash = "sha256:daaf63f70f25e8689c072cfad4334ca0ac1d1e05a92fc15c54eb9cf23c3efd84"},
{file = "protobuf-5.29.3-cp38-abi3-manylinux2014_x86_64.whl", hash = "sha256:c027e08a08be10b67c06bf2370b99c811c466398c357e615ca88c91c07f0910f"},
{file = "protobuf-5.29.3-cp38-cp38-win32.whl", hash = "sha256:84a57163a0ccef3f96e4b6a20516cedcf5bb3a95a657131c5c3ac62200d23252"},
{file = "protobuf-5.29.3-cp38-cp38-win_amd64.whl", hash = "sha256:b89c115d877892a512f79a8114564fb435943b59067615894c3b13cd3e1fa107"},
{file = "protobuf-5.29.3-cp39-cp39-win32.whl", hash = "sha256:0eb32bfa5219fc8d4111803e9a690658aa2e6366384fd0851064b963b6d1f2a7"},
{file = "protobuf-5.29.3-cp39-cp39-win_amd64.whl", hash = "sha256:6ce8cc3389a20693bfde6c6562e03474c40851b44975c9b2bf6df7d8c4f864da"},
{file = "protobuf-5.29.3-py3-none-any.whl", hash = "sha256:0a18ed4a24198528f2333802eb075e59dea9d679ab7a6c5efb017a59004d849f"},
{file = "protobuf-5.29.3.tar.gz", hash = "sha256:5da0f41edaf117bde316404bad1a486cb4ededf8e4a54891296f648e8e076620"},
{file = "protobuf-6.30.1-cp310-abi3-win32.whl", hash = "sha256:ba0706f948d0195f5cac504da156d88174e03218d9364ab40d903788c1903d7e"},
{file = "protobuf-6.30.1-cp310-abi3-win_amd64.whl", hash = "sha256:ed484f9ddd47f0f1bf0648806cccdb4fe2fb6b19820f9b79a5adf5dcfd1b8c5f"},
{file = "protobuf-6.30.1-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:aa4f7dfaed0d840b03d08d14bfdb41348feaee06a828a8c455698234135b4075"},
{file = "protobuf-6.30.1-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:47cd320b7db63e8c9ac35f5596ea1c1e61491d8a8eb6d8b45edc44760b53a4f6"},
{file = "protobuf-6.30.1-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:e3083660225fa94748ac2e407f09a899e6a28bf9c0e70c75def8d15706bf85fc"},
{file = "protobuf-6.30.1-cp39-cp39-win32.whl", hash = "sha256:554d7e61cce2aa4c63ca27328f757a9f3867bce8ec213bf09096a8d16bcdcb6a"},
{file = "protobuf-6.30.1-cp39-cp39-win_amd64.whl", hash = "sha256:b510f55ce60f84dc7febc619b47215b900466e3555ab8cb1ba42deb4496d6cc0"},
{file = "protobuf-6.30.1-py3-none-any.whl", hash = "sha256:3c25e51e1359f1f5fa3b298faa6016e650d148f214db2e47671131b9063c53be"},
{file = "protobuf-6.30.1.tar.gz", hash = "sha256:535fb4e44d0236893d5cf1263a0f706f1160b689a7ab962e9da8a9ce4050b780"},
]

[[package]]
@@ -4875,13 +4870,13 @@ extra = ["pygments (>=2.19.1)"]

[[package]]
name = "pymilvus"
version = "2.5.4"
version = "2.5.5"
description = "Python Sdk for Milvus"
optional = false
python-versions = ">=3.8"
files = [
{file = "pymilvus-2.5.4-py3-none-any.whl", hash = "sha256:3f7ddaeae0c8f63554b8e316b73f265d022e05a457d47c366ce47293434a3aea"},
{file = "pymilvus-2.5.4.tar.gz", hash = "sha256:611732428ff669d57ded3d1f823bdeb10febf233d0251cce8498b287e5a10ce8"},
{file = "pymilvus-2.5.5-py3-none-any.whl", hash = "sha256:b91794fbaf72c6d7ed2419b8d4e67369263bdc16f1722f02c97927cfdf3e69da"},
{file = "pymilvus-2.5.5.tar.gz", hash = "sha256:8985f018961853022e03639a9ff323d5c22d0b659e66e288f4d08de11789e1d4"},
]

[package.dependencies]
@@ -4896,7 +4891,7 @@ ujson = ">=2.0.0"
[package.extras]
bulk-writer = ["azure-storage-blob", "minio (>=7.0.0)", "pyarrow (>=12.0.0)", "requests"]
dev = ["black", "grpcio (==1.62.2)", "grpcio-testing (==1.62.2)", "grpcio-tools (==1.62.2)", "pytest (>=5.3.4)", "pytest-cov (>=2.8.1)", "pytest-timeout (>=1.3.4)", "ruff (>0.4.0)"]
model = ["milvus-model (>=0.1.0)"]
model = ["pymilvus.model (>=0.3.0)"]

[[package]]
name = "pyobjc-core"
@@ -5332,29 +5327,27 @@ files = [

[[package]]
name = "pywin32"
version = "308"
version = "309"
description = "Python for Window Extensions"
optional = false
python-versions = "*"
files = [
{file = "pywin32-308-cp310-cp310-win32.whl", hash = "sha256:796ff4426437896550d2981b9c2ac0ffd75238ad9ea2d3bfa67a1abd546d262e"},
{file = "pywin32-308-cp310-cp310-win_amd64.whl", hash = "sha256:4fc888c59b3c0bef905ce7eb7e2106a07712015ea1c8234b703a088d46110e8e"},
{file = "pywin32-308-cp310-cp310-win_arm64.whl", hash = "sha256:a5ab5381813b40f264fa3495b98af850098f814a25a63589a8e9eb12560f450c"},
{file = "pywin32-308-cp311-cp311-win32.whl", hash = "sha256:5d8c8015b24a7d6855b1550d8e660d8daa09983c80e5daf89a273e5c6fb5095a"},
{file = "pywin32-308-cp311-cp311-win_amd64.whl", hash = "sha256:575621b90f0dc2695fec346b2d6302faebd4f0f45c05ea29404cefe35d89442b"},
{file = "pywin32-308-cp311-cp311-win_arm64.whl", hash = "sha256:100a5442b7332070983c4cd03f2e906a5648a5104b8a7f50175f7906efd16bb6"},
{file = "pywin32-308-cp312-cp312-win32.whl", hash = "sha256:587f3e19696f4bf96fde9d8a57cec74a57021ad5f204c9e627e15c33ff568897"},
{file = "pywin32-308-cp312-cp312-win_amd64.whl", hash = "sha256:00b3e11ef09ede56c6a43c71f2d31857cf7c54b0ab6e78ac659497abd2834f47"},
{file = "pywin32-308-cp312-cp312-win_arm64.whl", hash = "sha256:9b4de86c8d909aed15b7011182c8cab38c8850de36e6afb1f0db22b8959e3091"},
{file = "pywin32-308-cp313-cp313-win32.whl", hash = "sha256:1c44539a37a5b7b21d02ab34e6a4d314e0788f1690d65b48e9b0b89f31abbbed"},
{file = "pywin32-308-cp313-cp313-win_amd64.whl", hash = "sha256:fd380990e792eaf6827fcb7e187b2b4b1cede0585e3d0c9e84201ec27b9905e4"},
{file = "pywin32-308-cp313-cp313-win_arm64.whl", hash = "sha256:ef313c46d4c18dfb82a2431e3051ac8f112ccee1a34f29c263c583c568db63cd"},
{file = "pywin32-308-cp37-cp37m-win32.whl", hash = "sha256:1f696ab352a2ddd63bd07430080dd598e6369152ea13a25ebcdd2f503a38f1ff"},
{file = "pywin32-308-cp37-cp37m-win_amd64.whl", hash = "sha256:13dcb914ed4347019fbec6697a01a0aec61019c1046c2b905410d197856326a6"},
{file = "pywin32-308-cp38-cp38-win32.whl", hash = "sha256:5794e764ebcabf4ff08c555b31bd348c9025929371763b2183172ff4708152f0"},
{file = "pywin32-308-cp38-cp38-win_amd64.whl", hash = "sha256:3b92622e29d651c6b783e368ba7d6722b1634b8e70bd376fd7610fe1992e19de"},
{file = "pywin32-308-cp39-cp39-win32.whl", hash = "sha256:7873ca4dc60ab3287919881a7d4f88baee4a6e639aa6962de25a98ba6b193341"},
{file = "pywin32-308-cp39-cp39-win_amd64.whl", hash = "sha256:71b3322d949b4cc20776436a9c9ba0eeedcbc9c650daa536df63f0ff111bb920"},
{file = "pywin32-309-cp310-cp310-win32.whl", hash = "sha256:5b78d98550ca093a6fe7ab6d71733fbc886e2af9d4876d935e7f6e1cd6577ac9"},
{file = "pywin32-309-cp310-cp310-win_amd64.whl", hash = "sha256:728d08046f3d65b90d4c77f71b6fbb551699e2005cc31bbffd1febd6a08aa698"},
{file = "pywin32-309-cp310-cp310-win_arm64.whl", hash = "sha256:c667bcc0a1e6acaca8984eb3e2b6e42696fc035015f99ff8bc6c3db4c09a466a"},
{file = "pywin32-309-cp311-cp311-win32.whl", hash = "sha256:d5df6faa32b868baf9ade7c9b25337fa5eced28eb1ab89082c8dae9c48e4cd51"},
{file = "pywin32-309-cp311-cp311-win_amd64.whl", hash = "sha256:e7ec2cef6df0926f8a89fd64959eba591a1eeaf0258082065f7bdbe2121228db"},
{file = "pywin32-309-cp311-cp311-win_arm64.whl", hash = "sha256:54ee296f6d11db1627216e9b4d4c3231856ed2d9f194c82f26c6cb5650163f4c"},
{file = "pywin32-309-cp312-cp312-win32.whl", hash = "sha256:de9acacced5fa82f557298b1fed5fef7bd49beee04190f68e1e4783fbdc19926"},
{file = "pywin32-309-cp312-cp312-win_amd64.whl", hash = "sha256:6ff9eebb77ffc3d59812c68db33c0a7817e1337e3537859499bd27586330fc9e"},
{file = "pywin32-309-cp312-cp312-win_arm64.whl", hash = "sha256:619f3e0a327b5418d833f44dc87859523635cf339f86071cc65a13c07be3110f"},
{file = "pywin32-309-cp313-cp313-win32.whl", hash = "sha256:008bffd4afd6de8ca46c6486085414cc898263a21a63c7f860d54c9d02b45c8d"},
{file = "pywin32-309-cp313-cp313-win_amd64.whl", hash = "sha256:bd0724f58492db4cbfbeb1fcd606495205aa119370c0ddc4f70e5771a3ab768d"},
{file = "pywin32-309-cp313-cp313-win_arm64.whl", hash = "sha256:8fd9669cfd41863b688a1bc9b1d4d2d76fd4ba2128be50a70b0ea66b8d37953b"},
{file = "pywin32-309-cp38-cp38-win32.whl", hash = "sha256:617b837dc5d9dfa7e156dbfa7d3906c009a2881849a80a9ae7519f3dd8c6cb86"},
{file = "pywin32-309-cp38-cp38-win_amd64.whl", hash = "sha256:0be3071f555480fbfd86a816a1a773880ee655bf186aa2931860dbb44e8424f8"},
{file = "pywin32-309-cp39-cp39-win32.whl", hash = "sha256:72ae9ae3a7a6473223589a1621f9001fe802d59ed227fd6a8503c9af67c1d5f4"},
{file = "pywin32-309-cp39-cp39-win_amd64.whl", hash = "sha256:88bc06d6a9feac70783de64089324568ecbc65866e2ab318eab35da3811fd7ef"},
]

[[package]]
@@ -5446,120 +5439,104 @@ pyyaml = "*"

[[package]]
name = "pyzmq"
version = "26.2.1"
version = "26.3.0"
description = "Python bindings for 0MQ"
optional = false
python-versions = ">=3.7"
python-versions = ">=3.8"
files = [
{file = "pyzmq-26.2.1-cp310-cp310-macosx_10_15_universal2.whl", hash = "sha256:f39d1227e8256d19899d953e6e19ed2ccb689102e6d85e024da5acf410f301eb"},
{file = "pyzmq-26.2.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:a23948554c692df95daed595fdd3b76b420a4939d7a8a28d6d7dea9711878641"},
{file = "pyzmq-26.2.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:95f5728b367a042df146cec4340d75359ec6237beebf4a8f5cf74657c65b9257"},
{file = "pyzmq-26.2.1-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:95f7b01b3f275504011cf4cf21c6b885c8d627ce0867a7e83af1382ebab7b3ff"},
{file = "pyzmq-26.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:80a00370a2ef2159c310e662c7c0f2d030f437f35f478bb8b2f70abd07e26b24"},
{file = "pyzmq-26.2.1-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:8531ed35dfd1dd2af95f5d02afd6545e8650eedbf8c3d244a554cf47d8924459"},
{file = "pyzmq-26.2.1-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:cdb69710e462a38e6039cf17259d328f86383a06c20482cc154327968712273c"},
{file = "pyzmq-26.2.1-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:e7eeaef81530d0b74ad0d29eec9997f1c9230c2f27242b8d17e0ee67662c8f6e"},
{file = "pyzmq-26.2.1-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:361edfa350e3be1f987e592e834594422338d7174364763b7d3de5b0995b16f3"},
{file = "pyzmq-26.2.1-cp310-cp310-win32.whl", hash = "sha256:637536c07d2fb6a354988b2dd1d00d02eb5dd443f4bbee021ba30881af1c28aa"},
{file = "pyzmq-26.2.1-cp310-cp310-win_amd64.whl", hash = "sha256:45fad32448fd214fbe60030aa92f97e64a7140b624290834cc9b27b3a11f9473"},
{file = "pyzmq-26.2.1-cp310-cp310-win_arm64.whl", hash = "sha256:d9da0289d8201c8a29fd158aaa0dfe2f2e14a181fd45e2dc1fbf969a62c1d594"},
{file = "pyzmq-26.2.1-cp311-cp311-macosx_10_15_universal2.whl", hash = "sha256:c059883840e634a21c5b31d9b9a0e2b48f991b94d60a811092bc37992715146a"},
{file = "pyzmq-26.2.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:ed038a921df836d2f538e509a59cb638df3e70ca0fcd70d0bf389dfcdf784d2a"},
{file = "pyzmq-26.2.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9027a7fcf690f1a3635dc9e55e38a0d6602dbbc0548935d08d46d2e7ec91f454"},
{file = "pyzmq-26.2.1-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6d75fcb00a1537f8b0c0bb05322bc7e35966148ffc3e0362f0369e44a4a1de99"},
{file = "pyzmq-26.2.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f0019cc804ac667fb8c8eaecdb66e6d4a68acf2e155d5c7d6381a5645bd93ae4"},
{file = "pyzmq-26.2.1-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:f19dae58b616ac56b96f2e2290f2d18730a898a171f447f491cc059b073ca1fa"},
{file = "pyzmq-26.2.1-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:f5eeeb82feec1fc5cbafa5ee9022e87ffdb3a8c48afa035b356fcd20fc7f533f"},
{file = "pyzmq-26.2.1-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:000760e374d6f9d1a3478a42ed0c98604de68c9e94507e5452951e598ebecfba"},
{file = "pyzmq-26.2.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:817fcd3344d2a0b28622722b98500ae9c8bfee0f825b8450932ff19c0b15bebd"},
{file = "pyzmq-26.2.1-cp311-cp311-win32.whl", hash = "sha256:88812b3b257f80444a986b3596e5ea5c4d4ed4276d2b85c153a6fbc5ca457ae7"},
{file = "pyzmq-26.2.1-cp311-cp311-win_amd64.whl", hash = "sha256:ef29630fde6022471d287c15c0a2484aba188adbfb978702624ba7a54ddfa6c1"},
{file = "pyzmq-26.2.1-cp311-cp311-win_arm64.whl", hash = "sha256:f32718ee37c07932cc336096dc7403525301fd626349b6eff8470fe0f996d8d7"},
{file = "pyzmq-26.2.1-cp312-cp312-macosx_10_15_universal2.whl", hash = "sha256:a6549ecb0041dafa55b5932dcbb6c68293e0bd5980b5b99f5ebb05f9a3b8a8f3"},
{file = "pyzmq-26.2.1-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:0250c94561f388db51fd0213cdccbd0b9ef50fd3c57ce1ac937bf3034d92d72e"},
{file = "pyzmq-26.2.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:36ee4297d9e4b34b5dc1dd7ab5d5ea2cbba8511517ef44104d2915a917a56dc8"},
{file = "pyzmq-26.2.1-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c2a9cb17fd83b7a3a3009901aca828feaf20aa2451a8a487b035455a86549c09"},
{file = "pyzmq-26.2.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:786dd8a81b969c2081b31b17b326d3a499ddd1856e06d6d79ad41011a25148da"},
{file = "pyzmq-26.2.1-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:2d88ba221a07fc2c5581565f1d0fe8038c15711ae79b80d9462e080a1ac30435"},
{file = "pyzmq-26.2.1-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:1c84c1297ff9f1cd2440da4d57237cb74be21fdfe7d01a10810acba04e79371a"},
{file = "pyzmq-26.2.1-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:46d4ebafc27081a7f73a0f151d0c38d4291656aa134344ec1f3d0199ebfbb6d4"},
{file = "pyzmq-26.2.1-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:91e2bfb8e9a29f709d51b208dd5f441dc98eb412c8fe75c24ea464734ccdb48e"},
{file = "pyzmq-26.2.1-cp312-cp312-win32.whl", hash = "sha256:4a98898fdce380c51cc3e38ebc9aa33ae1e078193f4dc641c047f88b8c690c9a"},
{file = "pyzmq-26.2.1-cp312-cp312-win_amd64.whl", hash = "sha256:a0741edbd0adfe5f30bba6c5223b78c131b5aa4a00a223d631e5ef36e26e6d13"},
{file = "pyzmq-26.2.1-cp312-cp312-win_arm64.whl", hash = "sha256:e5e33b1491555843ba98d5209439500556ef55b6ab635f3a01148545498355e5"},
{file = "pyzmq-26.2.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:099b56ef464bc355b14381f13355542e452619abb4c1e57a534b15a106bf8e23"},
{file = "pyzmq-26.2.1-cp313-cp313-macosx_10_15_universal2.whl", hash = "sha256:651726f37fcbce9f8dd2a6dab0f024807929780621890a4dc0c75432636871be"},
{file = "pyzmq-26.2.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:57dd4d91b38fa4348e237a9388b4423b24ce9c1695bbd4ba5a3eada491e09399"},
{file = "pyzmq-26.2.1-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d51a7bfe01a48e1064131f3416a5439872c533d756396be2b39e3977b41430f9"},
{file = "pyzmq-26.2.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c7154d228502e18f30f150b7ce94f0789d6b689f75261b623f0fdc1eec642aab"},
{file = "pyzmq-26.2.1-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:f1f31661a80cc46aba381bed475a9135b213ba23ca7ff6797251af31510920ce"},
{file = "pyzmq-26.2.1-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:290c96f479504439b6129a94cefd67a174b68ace8a8e3f551b2239a64cfa131a"},
{file = "pyzmq-26.2.1-cp313-cp313-musllinux_1_1_i686.whl", hash = "sha256:f2c307fbe86e18ab3c885b7e01de942145f539165c3360e2af0f094dd440acd9"},
{file = "pyzmq-26.2.1-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:b314268e716487bfb86fcd6f84ebbe3e5bec5fac75fdf42bc7d90fdb33f618ad"},
{file = "pyzmq-26.2.1-cp313-cp313-win32.whl", hash = "sha256:edb550616f567cd5603b53bb52a5f842c0171b78852e6fc7e392b02c2a1504bb"},
{file = "pyzmq-26.2.1-cp313-cp313-win_amd64.whl", hash = "sha256:100a826a029c8ef3d77a1d4c97cbd6e867057b5806a7276f2bac1179f893d3bf"},
{file = "pyzmq-26.2.1-cp313-cp313-win_arm64.whl", hash = "sha256:6991ee6c43e0480deb1b45d0c7c2bac124a6540cba7db4c36345e8e092da47ce"},
{file = "pyzmq-26.2.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:25e720dba5b3a3bb2ad0ad5d33440babd1b03438a7a5220511d0c8fa677e102e"},
{file = "pyzmq-26.2.1-cp313-cp313t-macosx_10_15_universal2.whl", hash = "sha256:9ec6abfb701437142ce9544bd6a236addaf803a32628d2260eb3dbd9a60e2891"},
{file = "pyzmq-26.2.1-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2e1eb9d2bfdf5b4e21165b553a81b2c3bd5be06eeddcc4e08e9692156d21f1f6"},
{file = "pyzmq-26.2.1-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:90dc731d8e3e91bcd456aa7407d2eba7ac6f7860e89f3766baabb521f2c1de4a"},
{file = "pyzmq-26.2.1-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0b6a93d684278ad865fc0b9e89fe33f6ea72d36da0e842143891278ff7fd89c3"},
{file = "pyzmq-26.2.1-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:c1bb37849e2294d519117dd99b613c5177934e5c04a5bb05dd573fa42026567e"},
{file = "pyzmq-26.2.1-cp313-cp313t-musllinux_1_1_aarch64.whl", hash = "sha256:632a09c6d8af17b678d84df442e9c3ad8e4949c109e48a72f805b22506c4afa7"},
{file = "pyzmq-26.2.1-cp313-cp313t-musllinux_1_1_i686.whl", hash = "sha256:fc409c18884eaf9ddde516d53af4f2db64a8bc7d81b1a0c274b8aa4e929958e8"},
{file = "pyzmq-26.2.1-cp313-cp313t-musllinux_1_1_x86_64.whl", hash = "sha256:17f88622b848805d3f6427ce1ad5a2aa3cf61f12a97e684dab2979802024d460"},
{file = "pyzmq-26.2.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:3ef584f13820d2629326fe20cc04069c21c5557d84c26e277cfa6235e523b10f"},
{file = "pyzmq-26.2.1-cp37-cp37m-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:160194d1034902937359c26ccfa4e276abffc94937e73add99d9471e9f555dd6"},
{file = "pyzmq-26.2.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:574b285150afdbf0a0424dddf7ef9a0d183988eb8d22feacb7160f7515e032cb"},
{file = "pyzmq-26.2.1-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:44dba28c34ce527cf687156c81f82bf1e51f047838d5964f6840fd87dfecf9fe"},
{file = "pyzmq-26.2.1-cp37-cp37m-musllinux_1_1_aarch64.whl", hash = "sha256:9fbdb90b85c7624c304f72ec7854659a3bd901e1c0ffb2363163779181edeb68"},
{file = "pyzmq-26.2.1-cp37-cp37m-musllinux_1_1_i686.whl", hash = "sha256:a7ad34a2921e8f76716dc7205c9bf46a53817e22b9eec2e8a3e08ee4f4a72468"},
{file = "pyzmq-26.2.1-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:866c12b7c90dd3a86983df7855c6f12f9407c8684db6aa3890fc8027462bda82"},
{file = "pyzmq-26.2.1-cp37-cp37m-win32.whl", hash = "sha256:eeb37f65350d5c5870517f02f8bbb2ac0fbec7b416c0f4875219fef305a89a45"},
{file = "pyzmq-26.2.1-cp37-cp37m-win_amd64.whl", hash = "sha256:4eb3197f694dfb0ee6af29ef14a35f30ae94ff67c02076eef8125e2d98963cd0"},
{file = "pyzmq-26.2.1-cp38-cp38-macosx_10_15_universal2.whl", hash = "sha256:36d4e7307db7c847fe37413f333027d31c11d5e6b3bacbb5022661ac635942ba"},
{file = "pyzmq-26.2.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:1c6ae0e95d0a4b0cfe30f648a18e764352d5415279bdf34424decb33e79935b8"},
{file = "pyzmq-26.2.1-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:5b4fc44f5360784cc02392f14235049665caaf7c0fe0b04d313e763d3338e463"},
{file = "pyzmq-26.2.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:51431f6b2750eb9b9d2b2952d3cc9b15d0215e1b8f37b7a3239744d9b487325d"},
{file = "pyzmq-26.2.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:bdbc78ae2065042de48a65f1421b8af6b76a0386bb487b41955818c3c1ce7bed"},
{file = "pyzmq-26.2.1-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:d14f50d61a89b0925e4d97a0beba6053eb98c426c5815d949a43544f05a0c7ec"},
{file = "pyzmq-26.2.1-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:004837cb958988c75d8042f5dac19a881f3d9b3b75b2f574055e22573745f841"},
{file = "pyzmq-26.2.1-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:0b2007f28ce1b8acebdf4812c1aab997a22e57d6a73b5f318b708ef9bcabbe95"},
{file = "pyzmq-26.2.1-cp38-cp38-win32.whl", hash = "sha256:269c14904da971cb5f013100d1aaedb27c0a246728c341d5d61ddd03f463f2f3"},
{file = "pyzmq-26.2.1-cp38-cp38-win_amd64.whl", hash = "sha256:31fff709fef3b991cfe7189d2cfe0c413a1d0e82800a182cfa0c2e3668cd450f"},
{file = "pyzmq-26.2.1-cp39-cp39-macosx_10_15_universal2.whl", hash = "sha256:a4bffcadfd40660f26d1b3315a6029fd4f8f5bf31a74160b151f5c577b2dc81b"},
{file = "pyzmq-26.2.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:e76ad4729c2f1cf74b6eb1bdd05f6aba6175999340bd51e6caee49a435a13bf5"},
{file = "pyzmq-26.2.1-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:8b0f5bab40a16e708e78a0c6ee2425d27e1a5d8135c7a203b4e977cee37eb4aa"},
{file = "pyzmq-26.2.1-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:e8e47050412f0ad3a9b2287779758073cbf10e460d9f345002d4779e43bb0136"},
{file = "pyzmq-26.2.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7f18ce33f422d119b13c1363ed4cce245b342b2c5cbbb76753eabf6aa6f69c7d"},
{file = "pyzmq-26.2.1-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:ceb0d78b7ef106708a7e2c2914afe68efffc0051dc6a731b0dbacd8b4aee6d68"},
{file = "pyzmq-26.2.1-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:7ebdd96bd637fd426d60e86a29ec14b8c1ab64b8d972f6a020baf08a30d1cf46"},
{file = "pyzmq-26.2.1-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:03719e424150c6395b9513f53a5faadcc1ce4b92abdf68987f55900462ac7eec"},
{file = "pyzmq-26.2.1-cp39-cp39-win32.whl", hash = "sha256:ef5479fac31df4b304e96400fc67ff08231873ee3537544aa08c30f9d22fce38"},
{file = "pyzmq-26.2.1-cp39-cp39-win_amd64.whl", hash = "sha256:f92a002462154c176dac63a8f1f6582ab56eb394ef4914d65a9417f5d9fde218"},
{file = "pyzmq-26.2.1-cp39-cp39-win_arm64.whl", hash = "sha256:1fd4b3efc6f62199886440d5e27dd3ccbcb98dfddf330e7396f1ff421bfbb3c2"},
{file = "pyzmq-26.2.1-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:380816d298aed32b1a97b4973a4865ef3be402a2e760204509b52b6de79d755d"},
{file = "pyzmq-26.2.1-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:97cbb368fd0debdbeb6ba5966aa28e9a1ae3396c7386d15569a6ca4be4572b99"},
{file = "pyzmq-26.2.1-pp310-pypy310_pp73-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:abf7b5942c6b0dafcc2823ddd9154f419147e24f8df5b41ca8ea40a6db90615c"},
{file = "pyzmq-26.2.1-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3fe6e28a8856aea808715f7a4fc11f682b9d29cac5d6262dd8fe4f98edc12d53"},
{file = "pyzmq-26.2.1-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:bd8fdee945b877aa3bffc6a5a8816deb048dab0544f9df3731ecd0e54d8c84c9"},
{file = "pyzmq-26.2.1-pp37-pypy37_pp73-macosx_10_9_x86_64.whl", hash = "sha256:ee7152f32c88e0e1b5b17beb9f0e2b14454235795ef68c0c120b6d3d23d12833"},
{file = "pyzmq-26.2.1-pp37-pypy37_pp73-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:baa1da72aecf6a490b51fba7a51f1ce298a1e0e86d0daef8265c8f8f9848eb77"},
{file = "pyzmq-26.2.1-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:49135bb327fca159262d8fd14aa1f4a919fe071b04ed08db4c7c37d2f0647162"},
{file = "pyzmq-26.2.1-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8bacc1a10c150d58e8a9ee2b2037a70f8d903107e0f0b6e079bf494f2d09c091"},
{file = "pyzmq-26.2.1-pp37-pypy37_pp73-win_amd64.whl", hash = "sha256:09dac387ce62d69bec3f06d51610ca1d660e7849eb45f68e38e7f5cf1f49cbcb"},
{file = "pyzmq-26.2.1-pp38-pypy38_pp73-macosx_10_9_x86_64.whl", hash = "sha256:70b3a46ecd9296e725ccafc17d732bfc3cdab850b54bd913f843a0a54dfb2c04"},
{file = "pyzmq-26.2.1-pp38-pypy38_pp73-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:59660e15c797a3b7a571c39f8e0b62a1f385f98ae277dfe95ca7eaf05b5a0f12"},
{file = "pyzmq-26.2.1-pp38-pypy38_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:0f50db737d688e96ad2a083ad2b453e22865e7e19c7f17d17df416e91ddf67eb"},
{file = "pyzmq-26.2.1-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a003200b6cd64e89b5725ff7e284a93ab24fd54bbac8b4fa46b1ed57be693c27"},
{file = "pyzmq-26.2.1-pp38-pypy38_pp73-win_amd64.whl", hash = "sha256:f9ba5def063243793dec6603ad1392f735255cbc7202a3a484c14f99ec290705"},
{file = "pyzmq-26.2.1-pp39-pypy39_pp73-macosx_10_15_x86_64.whl", hash = "sha256:1238c2448c58b9c8d6565579393148414a42488a5f916b3f322742e561f6ae0d"},
{file = "pyzmq-26.2.1-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8eddb3784aed95d07065bcf94d07e8c04024fdb6b2386f08c197dfe6b3528fda"},
{file = "pyzmq-26.2.1-pp39-pypy39_pp73-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f0f19c2097fffb1d5b07893d75c9ee693e9cbc809235cf3f2267f0ef6b015f24"},
{file = "pyzmq-26.2.1-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0995fd3530f2e89d6b69a2202e340bbada3191014352af978fa795cb7a446331"},
{file = "pyzmq-26.2.1-pp39-pypy39_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:7c6160fe513654e65665332740f63de29ce0d165e053c0c14a161fa60dd0da01"},
{file = "pyzmq-26.2.1-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:8ec8e3aea6146b761d6c57fcf8f81fcb19f187afecc19bf1701a48db9617a217"},
{file = "pyzmq-26.2.1.tar.gz", hash = "sha256:17d72a74e5e9ff3829deb72897a175333d3ef5b5413948cae3cf7ebf0b02ecca"},
{file = "pyzmq-26.3.0-cp310-cp310-macosx_10_15_universal2.whl", hash = "sha256:1586944f4736515af5c6d3a5b150c7e8ca2a2d6e46b23057320584d6f2438f4a"},
{file = "pyzmq-26.3.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:aa7efc695d1fc9f72d91bf9b6c6fe2d7e1b4193836ec530a98faf7d7a7577a58"},
{file = "pyzmq-26.3.0-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:bd84441e4021cec6e4dd040550386cd9c9ea1d9418ea1a8002dbb7b576026b2b"},
{file = "pyzmq-26.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9176856f36c34a8aa5c0b35ddf52a5d5cd8abeece57c2cd904cfddae3fd9acd3"},
{file = "pyzmq-26.3.0-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:49334faa749d55b77f084389a80654bf2e68ab5191c0235066f0140c1b670d64"},
{file = "pyzmq-26.3.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:fd30fc80fe96efb06bea21667c5793bbd65c0dc793187feb39b8f96990680b00"},
{file = "pyzmq-26.3.0-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:b2eddfbbfb473a62c3a251bb737a6d58d91907f6e1d95791431ebe556f47d916"},
{file = "pyzmq-26.3.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:70b3acb9ad729a53d4e751dace35404a024f188aad406013454216aba5485b4e"},
{file = "pyzmq-26.3.0-cp310-cp310-win32.whl", hash = "sha256:c1bd75d692cd7c6d862a98013bfdf06702783b75cffbf5dae06d718fecefe8f2"},
{file = "pyzmq-26.3.0-cp310-cp310-win_amd64.whl", hash = "sha256:d7165bcda0dbf203e5ad04d79955d223d84b2263df4db92f525ba370b03a12ab"},
{file = "pyzmq-26.3.0-cp310-cp310-win_arm64.whl", hash = "sha256:e34a63f71d2ecffb3c643909ad2d488251afeb5ef3635602b3448e609611a7ed"},
{file = "pyzmq-26.3.0-cp311-cp311-macosx_10_15_universal2.whl", hash = "sha256:2833602d9d42c94b9d0d2a44d2b382d3d3a4485be018ba19dddc401a464c617a"},
{file = "pyzmq-26.3.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d8270d104ec7caa0bdac246d31d48d94472033ceab5ba142881704350b28159c"},
{file = "pyzmq-26.3.0-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c208a977843d18d3bd185f323e4eaa912eb4869cb230947dc6edd8a27a4e558a"},
{file = "pyzmq-26.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:eddc2be28a379c218e0d92e4a432805dcb0ca5870156a90b54c03cd9799f9f8a"},
{file = "pyzmq-26.3.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:c0b519fa2159c42272f8a244354a0e110d65175647e5185b04008ec00df9f079"},
{file = "pyzmq-26.3.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:1595533de3a80bf8363372c20bafa963ec4bf9f2b8f539b1d9a5017f430b84c9"},
{file = "pyzmq-26.3.0-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:bbef99eb8d18ba9a40f00e8836b8040cdcf0f2fa649684cf7a66339599919d21"},
{file = "pyzmq-26.3.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:979486d444ca3c469cd1c7f6a619ce48ff08b3b595d451937db543754bfacb65"},
{file = "pyzmq-26.3.0-cp311-cp311-win32.whl", hash = "sha256:4b127cfe10b4c56e4285b69fd4b38ea1d368099ea4273d8fb349163fce3cd598"},
{file = "pyzmq-26.3.0-cp311-cp311-win_amd64.whl", hash = "sha256:cf736cc1298ef15280d9fcf7a25c09b05af016656856dc6fe5626fd8912658dd"},
{file = "pyzmq-26.3.0-cp311-cp311-win_arm64.whl", hash = "sha256:2dc46ec09f5d36f606ac8393303149e69d17121beee13c8dac25e2a2078e31c4"},
{file = "pyzmq-26.3.0-cp312-cp312-macosx_10_15_universal2.whl", hash = "sha256:c80653332c6136da7f4d4e143975e74ac0fa14f851f716d90583bc19e8945cea"},
{file = "pyzmq-26.3.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6e317ee1d4528a03506cb1c282cd9db73660a35b3564096de37de7350e7d87a7"},
{file = "pyzmq-26.3.0-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:943a22ebb3daacb45f76a9bcca9a7b74e7d94608c0c0505da30af900b998ca8d"},
{file = "pyzmq-26.3.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3fc9e71490d989144981ea21ef4fdfaa7b6aa84aff9632d91c736441ce2f6b00"},
{file = "pyzmq-26.3.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:e281a8071a06888575a4eb523c4deeefdcd2f5fe4a2d47e02ac8bf3a5b49f695"},
{file = "pyzmq-26.3.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:be77efd735bb1064605be8dec6e721141c1421ef0b115ef54e493a64e50e9a52"},
{file = "pyzmq-26.3.0-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:7a4ac2ffa34f1212dd586af90f4ba894e424f0cabb3a49cdcff944925640f6ac"},
{file = "pyzmq-26.3.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:ba698c7c252af83b6bba9775035263f0df5f807f0404019916d4b71af8161f66"},
{file = "pyzmq-26.3.0-cp312-cp312-win32.whl", hash = "sha256:214038aaa88e801e54c2ef0cfdb2e6df27eb05f67b477380a452b595c5ecfa37"},
{file = "pyzmq-26.3.0-cp312-cp312-win_amd64.whl", hash = "sha256:bad7fe0372e505442482ca3ccbc0d6f38dae81b1650f57a0aa6bbee18e7df495"},
{file = "pyzmq-26.3.0-cp312-cp312-win_arm64.whl", hash = "sha256:b7b578d604e79e99aa39495becea013fd043fa9f36e4b490efa951f3d847a24d"},
{file = "pyzmq-26.3.0-cp313-cp313-macosx_10_15_universal2.whl", hash = "sha256:fa85953df84beb7b8b73cb3ec3f5d92b62687a09a8e71525c6734e020edf56fd"},
{file = "pyzmq-26.3.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:209d09f0ab6ddbcebe64630d1e6ca940687e736f443c265ae15bc4bfad833597"},
{file = "pyzmq-26.3.0-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d35cc1086f1d4f907df85c6cceb2245cb39a04f69c3f375993363216134d76d4"},
{file = "pyzmq-26.3.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b380e9087078ba91e45fb18cdd0c25275ffaa045cf63c947be0ddae6186bc9d9"},
{file = "pyzmq-26.3.0-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:6d64e74143587efe7c9522bb74d1448128fdf9897cc9b6d8b9927490922fd558"},
{file = "pyzmq-26.3.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:efba4f53ac7752eea6d8ca38a4ddac579e6e742fba78d1e99c12c95cd2acfc64"},
{file = "pyzmq-26.3.0-cp313-cp313-musllinux_1_1_i686.whl", hash = "sha256:9b0137a1c40da3b7989839f9b78a44de642cdd1ce20dcef341de174c8d04aa53"},
{file = "pyzmq-26.3.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:a995404bd3982c089e57b428c74edd5bfc3b0616b3dbcd6a8e270f1ee2110f36"},
{file = "pyzmq-26.3.0-cp313-cp313-win32.whl", hash = "sha256:240b1634b9e530ef6a277d95cbca1a6922f44dfddc5f0a3cd6c722a8de867f14"},
{file = "pyzmq-26.3.0-cp313-cp313-win_amd64.whl", hash = "sha256:fe67291775ea4c2883764ba467eb389c29c308c56b86c1e19e49c9e1ed0cbeca"},
{file = "pyzmq-26.3.0-cp313-cp313-win_arm64.whl", hash = "sha256:73ca9ae9a9011b714cf7650450cd9c8b61a135180b708904f1f0a05004543dce"},
{file = "pyzmq-26.3.0-cp313-cp313t-macosx_10_15_universal2.whl", hash = "sha256:fea7efbd7e49af9d7e5ed6c506dfc7de3d1a628790bd3a35fd0e3c904dc7d464"},
{file = "pyzmq-26.3.0-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c4430c7cba23bb0e2ee203eee7851c1654167d956fc6d4b3a87909ccaf3c5825"},
{file = "pyzmq-26.3.0-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:016d89bee8c7d566fad75516b4e53ec7c81018c062d4c51cd061badf9539be52"},
{file = "pyzmq-26.3.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:04bfe59852d76d56736bfd10ac1d49d421ab8ed11030b4a0332900691507f557"},
{file = "pyzmq-26.3.0-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:1fe05bd0d633a0f672bb28cb8b4743358d196792e1caf04973b7898a0d70b046"},
{file = "pyzmq-26.3.0-cp313-cp313t-musllinux_1_1_aarch64.whl", hash = "sha256:2aa1a9f236d5b835fb8642f27de95f9edcfd276c4bc1b6ffc84f27c6fb2e2981"},
{file = "pyzmq-26.3.0-cp313-cp313t-musllinux_1_1_i686.whl", hash = "sha256:21399b31753bf321043ea60c360ed5052cc7be20739785b1dff1820f819e35b3"},
{file = "pyzmq-26.3.0-cp313-cp313t-musllinux_1_1_x86_64.whl", hash = "sha256:d015efcd96aca8882057e7e6f06224f79eecd22cad193d3e6a0a91ec67590d1f"},
{file = "pyzmq-26.3.0-cp38-cp38-macosx_10_15_universal2.whl", hash = "sha256:18183cc3851b995fdc7e5f03d03b8a4e1b12b0f79dff1ec1da75069af6357a05"},
{file = "pyzmq-26.3.0-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:da87e977f92d930a3683e10ba2b38bcc59adfc25896827e0b9d78b208b7757a6"},
{file = "pyzmq-26.3.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:cf6db401f4957afbf372a4730c6d5b2a234393af723983cbf4bcd13d54c71e1a"},
{file = "pyzmq-26.3.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:03caa2ffd64252122139d50ec92987f89616b9b92c9ba72920b40e92709d5e26"},
{file = "pyzmq-26.3.0-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:fbf206e5329e20937fa19bd41cf3af06d5967f8f7e86b59d783b26b40ced755c"},
{file = "pyzmq-26.3.0-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:6fb539a6382a048308b409d8c66d79bf636eda1b24f70c78f2a1fd16e92b037b"},
{file = "pyzmq-26.3.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:7897b8c8bbbb2bd8cad887bffcb07aede71ef1e45383bd4d6ac049bf0af312a4"},
{file = "pyzmq-26.3.0-cp38-cp38-win32.whl", hash = "sha256:91dead2daca698ae52ce70ee2adbb94ddd9b5f96877565fd40aa4efd18ecc6a3"},
{file = "pyzmq-26.3.0-cp38-cp38-win_amd64.whl", hash = "sha256:8c088e009a6d6b9f563336adb906e3a8d3fd64db129acc8d8fd0e9fe22b2dac8"},
{file = "pyzmq-26.3.0-cp39-cp39-macosx_10_15_universal2.whl", hash = "sha256:2eaed0d911fb3280981d5495978152fab6afd9fe217fd16f411523665089cef1"},
{file = "pyzmq-26.3.0-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:7998b60ef1c105846fb3bfca494769fde3bba6160902e7cd27a8df8257890ee9"},
{file = "pyzmq-26.3.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:96c0006a8d1d00e46cb44c8e8d7316d4a232f3d8f2ed43179d4578dbcb0829b6"},
{file = "pyzmq-26.3.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5e17cc198dc50a25a0f245e6b1e56f692df2acec3ccae82d1f60c34bfb72bbec"},
{file = "pyzmq-26.3.0-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:92a30840f4f2a31f7049d0a7de5fc69dd03b19bd5d8e7fed8d0bde49ce49b589"},
{file = "pyzmq-26.3.0-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:f52eba83272a26b444f4b8fc79f2e2c83f91d706d693836c9f7ccb16e6713c31"},
{file = "pyzmq-26.3.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:952085a09ff32115794629ba47f8940896d7842afdef1283332109d38222479d"},
{file = "pyzmq-26.3.0-cp39-cp39-win32.whl", hash = "sha256:0240289e33e3fbae44a5db73e54e955399179332a6b1d47c764a4983ec1524c3"},
{file = "pyzmq-26.3.0-cp39-cp39-win_amd64.whl", hash = "sha256:b2db7c82f08b8ce44c0b9d1153ce63907491972a7581e8b6adea71817f119df8"},
{file = "pyzmq-26.3.0-cp39-cp39-win_arm64.whl", hash = "sha256:2d3459b6311463c96abcb97808ee0a1abb0d932833edb6aa81c30d622fd4a12d"},
|
||||
{file = "pyzmq-26.3.0-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:ad03f4252d9041b0635c37528dfa3f44b39f46024ae28c8567f7423676ee409b"},
|
||||
{file = "pyzmq-26.3.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0f3dfb68cf7bf4cfdf34283a75848e077c5defa4907506327282afe92780084d"},
|
||||
{file = "pyzmq-26.3.0-pp310-pypy310_pp73-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:356ec0e39c5a9cda872b65aca1fd8a5d296ffdadf8e2442b70ff32e73ef597b1"},
|
||||
{file = "pyzmq-26.3.0-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:749d671b0eec8e738bbf0b361168369d8c682b94fcd458c20741dc4d69ef5278"},
|
||||
{file = "pyzmq-26.3.0-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:f950f17ae608e0786298340163cac25a4c5543ef25362dd5ddb6dcb10b547be9"},
|
||||
{file = "pyzmq-26.3.0-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:b4fc9903a73c25be9d5fe45c87faababcf3879445efa16140146b08fccfac017"},
|
||||
{file = "pyzmq-26.3.0-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c15b69af22030960ac63567e98ad8221cddf5d720d9cf03d85021dfd452324ef"},
|
||||
{file = "pyzmq-26.3.0-pp311-pypy311_pp73-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:2cf9ab0dff4dbaa2e893eb608373c97eb908e53b7d9793ad00ccbd082c0ee12f"},
|
||||
{file = "pyzmq-26.3.0-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3ec332675f6a138db57aad93ae6387953763f85419bdbd18e914cb279ee1c451"},
|
||||
{file = "pyzmq-26.3.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:eb96568a22fe070590942cd4780950e2172e00fb033a8b76e47692583b1bd97c"},
|
||||
{file = "pyzmq-26.3.0-pp38-pypy38_pp73-macosx_10_15_x86_64.whl", hash = "sha256:009a38241c76184cb004c869e82a99f0aee32eda412c1eb44df5820324a01d25"},
|
||||
{file = "pyzmq-26.3.0-pp38-pypy38_pp73-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:4c22a12713707467abedc6d75529dd365180c4c2a1511268972c6e1d472bd63e"},
|
||||
{file = "pyzmq-26.3.0-pp38-pypy38_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:1614fcd116275d24f2346ffca4047a741c546ad9d561cbf7813f11226ca4ed2c"},
|
||||
{file = "pyzmq-26.3.0-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4e2cafe7e9c7fed690e8ecf65af119f9c482923b5075a78f6f7629c63e1b4b1d"},
|
||||
{file = "pyzmq-26.3.0-pp38-pypy38_pp73-win_amd64.whl", hash = "sha256:14e0b81753424bd374075df6cc30b87f2c99e5f022501d97eff66544ca578941"},
|
||||
{file = "pyzmq-26.3.0-pp39-pypy39_pp73-macosx_10_15_x86_64.whl", hash = "sha256:21c6ddb98557a77cfe3366af0c5600fb222a1b2de5f90d9cd052b324e0c295e8"},
|
||||
{file = "pyzmq-26.3.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1fc81d5d60c9d40e692de14b8d884d43cf67562402b931681f0ccb3ce6b19875"},
|
||||
{file = "pyzmq-26.3.0-pp39-pypy39_pp73-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:52b064fafef772d0f5dbf52d4c39f092be7bc62d9a602fe6e82082e001326de3"},
|
||||
{file = "pyzmq-26.3.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b72206eb041f780451c61e1e89dbc3705f3d66aaaa14ee320d4f55864b13358a"},
|
||||
{file = "pyzmq-26.3.0-pp39-pypy39_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:8ab78dc21c7b1e13053086bcf0b4246440b43b5409904b73bfd1156654ece8a1"},
|
||||
{file = "pyzmq-26.3.0-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:0b42403ad7d1194dca9574cd3c56691c345f4601fa2d0a33434f35142baec7ac"},
|
||||
{file = "pyzmq-26.3.0.tar.gz", hash = "sha256:f1cd68b8236faab78138a8fc703f7ca0ad431b17a3fcac696358600d4e6243b3"},
|
||||
]
|
||||
|
||||
[package.dependencies]
|
||||
@@ -5906,21 +5883,21 @@ files = [

[[package]]
name = "rtree"
version = "1.3.0"
version = "1.4.0"
description = "R-Tree spatial index for Python GIS"
optional = false
python-versions = ">=3.8"
python-versions = ">=3.9"
files = [
{file = "Rtree-1.3.0-py3-none-macosx_10_9_x86_64.whl", hash = "sha256:80879d9db282a2273ca3a0d896c84583940e9777477727a277624ebfd424c517"},
{file = "Rtree-1.3.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:4328e9e421797c347e6eb08efbbade962fe3664ebd60c1dffe82c40911b1e125"},
{file = "Rtree-1.3.0-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:037130d3ce1fc029de81941ec416ba5546f66228380ba19bb41f2ea1294e8423"},
{file = "Rtree-1.3.0-py3-none-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:864a05d0c3b7ce6c5e34378b7ab630057603b79179368bc50624258bdf2ff631"},
{file = "Rtree-1.3.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:ec2ed6d1635753dab966e68f592a9c4896f3f4ec6ad2b09b776d592eacd883a9"},
{file = "Rtree-1.3.0-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:b4485fb3e5c5e85b94a95f0a930a3848e040d2699cfb012940ba5b0130f1e09a"},
{file = "Rtree-1.3.0-py3-none-musllinux_1_2_i686.whl", hash = "sha256:7e2e9211f4fb404c06a08fd2cbebb03234214f73c51913bb371c3d9954e99cc9"},
{file = "Rtree-1.3.0-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:c021f4772b25cc24915da8073e553ded6fa8d0b317caa4202255ed26b2344c1c"},
{file = "Rtree-1.3.0-py3-none-win_amd64.whl", hash = "sha256:97f835801d24c10bbf02381abe5e327345c8296ec711dde7658792376abafc66"},
{file = "rtree-1.3.0.tar.gz", hash = "sha256:b36e9dd2dc60ffe3d02e367242d2c26f7281b00e1aaf0c39590442edaaadd916"},
{file = "rtree-1.4.0-py3-none-macosx_10_9_x86_64.whl", hash = "sha256:4d1bebc418101480aabf41767e772dd2155d3b27b1376cccbd93e4509485e091"},
{file = "rtree-1.4.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:997f8c38d5dffa3949ea8adb4c8b291ea5cd4ef5ee69455d642dd171baf9991d"},
{file = "rtree-1.4.0-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:0133d9c54ab3ffe874ba6d411dbe0254765c5e68d92da5b91362c370f16fd997"},
{file = "rtree-1.4.0-py3-none-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:d3b7bf1fe6463139377995ebe22a01a7005d134707f43672a3c09305e12f5f43"},
{file = "rtree-1.4.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:27e4a6d617d63dcb82fcd4c2856134b8a3741bd1af3b1a0d98e886054f394da5"},
{file = "rtree-1.4.0-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5258e826064eab82439760201e9421ce6d4340789d6d080c1b49367ddd03f61f"},
{file = "rtree-1.4.0-py3-none-musllinux_1_2_i686.whl", hash = "sha256:20d5b3f9cf8bbbcc9fec42ab837c603c5dd86103ef29134300c8da2495c1248b"},
{file = "rtree-1.4.0-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:a67bee1233370a4c72c0969a96d2a1df1ba404ddd9f146849c53ab420eab361b"},
{file = "rtree-1.4.0-py3-none-win_amd64.whl", hash = "sha256:ba83efc7b7563905b1bfdfc14490c4bfb59e92e5e6156bdeb6ec5df5117252f4"},
{file = "rtree-1.4.0.tar.gz", hash = "sha256:9d97c7c5dcf25f6c0599c76d9933368c6a8d7238f2c1d00e76f1a69369ca82a0"},
]

[[package]]
@@ -6241,13 +6218,13 @@ train = ["accelerate (>=0.20.3)", "datasets"]

[[package]]
name = "setuptools"
version = "75.8.2"
version = "76.0.0"
description = "Easily download, build, install, upgrade, and uninstall Python packages"
optional = false
python-versions = ">=3.9"
files = [
{file = "setuptools-75.8.2-py3-none-any.whl", hash = "sha256:558e47c15f1811c1fa7adbd0096669bf76c1d3f433f58324df69f3f5ecac4e8f"},
{file = "setuptools-75.8.2.tar.gz", hash = "sha256:4880473a969e5f23f2a2be3646b2dfd84af9028716d398e46192f84bc36900d2"},
{file = "setuptools-76.0.0-py3-none-any.whl", hash = "sha256:199466a166ff664970d0ee145839f5582cb9bca7a0a3a2e795b6a9cb2308e9c6"},
{file = "setuptools-76.0.0.tar.gz", hash = "sha256:43b4ee60e10b0d0ee98ad11918e114c70701bc6051662a9a675a0496c1a158f4"},
]

[package.extras]
@@ -6491,13 +6468,13 @@ files = [

[[package]]
name = "threadpoolctl"
version = "3.5.0"
version = "3.6.0"
description = "threadpoolctl"
optional = false
python-versions = ">=3.8"
python-versions = ">=3.9"
files = [
{file = "threadpoolctl-3.5.0-py3-none-any.whl", hash = "sha256:56c1e26c150397e58c4926da8eeee87533b1e32bef131bd4bf6a2f45f3185467"},
{file = "threadpoolctl-3.5.0.tar.gz", hash = "sha256:082433502dd922bf738de0d8bcc4fdcbf0979ff44c42bd40f5af8a282f6fa107"},
{file = "threadpoolctl-3.6.0-py3-none-any.whl", hash = "sha256:43a0b8fd5a2928500110039e43a5eed8480b918967083ea48dc3ab9f13c4a7fb"},
{file = "threadpoolctl-3.6.0.tar.gz", hash = "sha256:8ab8b4aa3491d812b623328249fab5302a68d2d71745c8a4c719a2fcaba9f44e"},
]

[[package]]
@@ -6670,26 +6647,26 @@ testing = ["black (==22.3)", "datasets", "numpy", "pytest", "requests", "ruff"]

[[package]]
name = "tokenizers"
version = "0.21.0"
version = "0.21.1"
description = ""
optional = false
python-versions = ">=3.7"
python-versions = ">=3.9"
files = [
{file = "tokenizers-0.21.0-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:3c4c93eae637e7d2aaae3d376f06085164e1660f89304c0ab2b1d08a406636b2"},
{file = "tokenizers-0.21.0-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:f53ea537c925422a2e0e92a24cce96f6bc5046bbef24a1652a5edc8ba975f62e"},
{file = "tokenizers-0.21.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6b177fb54c4702ef611de0c069d9169f0004233890e0c4c5bd5508ae05abf193"},
{file = "tokenizers-0.21.0-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:6b43779a269f4629bebb114e19c3fca0223296ae9fea8bb9a7a6c6fb0657ff8e"},
{file = "tokenizers-0.21.0-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:9aeb255802be90acfd363626753fda0064a8df06031012fe7d52fd9a905eb00e"},
{file = "tokenizers-0.21.0-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:d8b09dbeb7a8d73ee204a70f94fc06ea0f17dcf0844f16102b9f414f0b7463ba"},
{file = "tokenizers-0.21.0-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:400832c0904f77ce87c40f1a8a27493071282f785724ae62144324f171377273"},
{file = "tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e84ca973b3a96894d1707e189c14a774b701596d579ffc7e69debfc036a61a04"},
{file = "tokenizers-0.21.0-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:eb7202d231b273c34ec67767378cd04c767e967fda12d4a9e36208a34e2f137e"},
{file = "tokenizers-0.21.0-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:089d56db6782a73a27fd8abf3ba21779f5b85d4a9f35e3b493c7bbcbbf0d539b"},
{file = "tokenizers-0.21.0-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:c87ca3dc48b9b1222d984b6b7490355a6fdb411a2d810f6f05977258400ddb74"},
{file = "tokenizers-0.21.0-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:4145505a973116f91bc3ac45988a92e618a6f83eb458f49ea0790df94ee243ff"},
{file = "tokenizers-0.21.0-cp39-abi3-win32.whl", hash = "sha256:eb1702c2f27d25d9dd5b389cc1f2f51813e99f8ca30d9e25348db6585a97e24a"},
{file = "tokenizers-0.21.0-cp39-abi3-win_amd64.whl", hash = "sha256:87841da5a25a3a5f70c102de371db120f41873b854ba65e52bccd57df5a3780c"},
{file = "tokenizers-0.21.0.tar.gz", hash = "sha256:ee0894bf311b75b0c03079f33859ae4b2334d675d4e93f5a4132e1eae2834fe4"},
{file = "tokenizers-0.21.1-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:e78e413e9e668ad790a29456e677d9d3aa50a9ad311a40905d6861ba7692cf41"},
{file = "tokenizers-0.21.1-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:cd51cd0a91ecc801633829fcd1fda9cf8682ed3477c6243b9a095539de4aecf3"},
{file = "tokenizers-0.21.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:28da6b72d4fb14ee200a1bd386ff74ade8992d7f725f2bde2c495a9a98cf4d9f"},
{file = "tokenizers-0.21.1-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:34d8cfde551c9916cb92014e040806122295a6800914bab5865deb85623931cf"},
{file = "tokenizers-0.21.1-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:aaa852d23e125b73d283c98f007e06d4595732104b65402f46e8ef24b588d9f8"},
{file = "tokenizers-0.21.1-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:a21a15d5c8e603331b8a59548bbe113564136dc0f5ad8306dd5033459a226da0"},
{file = "tokenizers-0.21.1-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:2fdbd4c067c60a0ac7eca14b6bd18a5bebace54eb757c706b47ea93204f7a37c"},
{file = "tokenizers-0.21.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2dd9a0061e403546f7377df940e866c3e678d7d4e9643d0461ea442b4f89e61a"},
{file = "tokenizers-0.21.1-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:db9484aeb2e200c43b915a1a0150ea885e35f357a5a8fabf7373af333dcc8dbf"},
{file = "tokenizers-0.21.1-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:ed248ab5279e601a30a4d67bdb897ecbe955a50f1e7bb62bd99f07dd11c2f5b6"},
{file = "tokenizers-0.21.1-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:9ac78b12e541d4ce67b4dfd970e44c060a2147b9b2a21f509566d556a509c67d"},
{file = "tokenizers-0.21.1-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:e5a69c1a4496b81a5ee5d2c1f3f7fbdf95e90a0196101b0ee89ed9956b8a168f"},
{file = "tokenizers-0.21.1-cp39-abi3-win32.whl", hash = "sha256:1039a3a5734944e09de1d48761ade94e00d0fa760c0e0551151d4dd851ba63e3"},
{file = "tokenizers-0.21.1-cp39-abi3-win_amd64.whl", hash = "sha256:0f0dcbcc9f6e13e675a66d7a5f2f225a736745ce484c1a4e07476a89ccdad382"},
{file = "tokenizers-0.21.1.tar.gz", hash = "sha256:a1bb04dc5b448985f86ecd4b05407f5a8d97cb2c0532199b2a302a604a0165ab"},
]

[package.dependencies]
@@ -7223,13 +7200,13 @@ typing-extensions = ">=3.7.4.3"

[[package]]
name = "types-openpyxl"
version = "3.1.5.20241225"
version = "3.1.5.20250306"
description = "Typing stubs for openpyxl"
optional = false
python-versions = ">=3.8"
python-versions = ">=3.9"
files = [
{file = "types_openpyxl-3.1.5.20241225-py3-none-any.whl", hash = "sha256:903d92f58f42135b0614d609868c619aee12e1c7b65ccf8472dfd2706bcc6f47"},
{file = "types_openpyxl-3.1.5.20241225.tar.gz", hash = "sha256:3c076f4c6f114e1859b6857ffd486e96c938c0434451c60dc54c2bcb62750d78"},
{file = "types_openpyxl-3.1.5.20250306-py3-none-any.whl", hash = "sha256:f7733dac1dcb07c89ff5ffde8452ee8d272be638defed855f4c48b2990ce5aa7"},
{file = "types_openpyxl-3.1.5.20250306.tar.gz", hash = "sha256:aa7ad2425e8020ff46a31633becfe1f3c64114498d964c536199f654b464e6bc"},
]

[[package]]
@@ -7245,13 +7222,13 @@ files = [

[[package]]
name = "types-requests"
version = "2.32.0.20250301"
version = "2.32.0.20250306"
description = "Typing stubs for requests"
optional = false
python-versions = ">=3.9"
files = [
{file = "types_requests-2.32.0.20250301-py3-none-any.whl", hash = "sha256:0003e0124e2cbefefb88222ff822b48616af40c74df83350f599a650c8de483b"},
{file = "types_requests-2.32.0.20250301.tar.gz", hash = "sha256:3d909dc4eaab159c0d964ebe8bfa326a7afb4578d8706408d417e17d61b0c500"},
{file = "types_requests-2.32.0.20250306-py3-none-any.whl", hash = "sha256:25f2cbb5c8710b2022f8bbee7b2b66f319ef14aeea2f35d80f18c9dbf3b60a0b"},
{file = "types_requests-2.32.0.20250306.tar.gz", hash = "sha256:0962352694ec5b2f95fda877ee60a159abdf84a0fc6fdace599f20acb41a03d1"},
]

[package.dependencies]
@@ -7399,13 +7376,13 @@ zstd = ["zstandard (>=0.18.0)"]

[[package]]
name = "virtualenv"
version = "20.29.2"
version = "20.29.3"
description = "Virtual Python Environment builder"
optional = false
python-versions = ">=3.8"
files = [
{file = "virtualenv-20.29.2-py3-none-any.whl", hash = "sha256:febddfc3d1ea571bdb1dc0f98d7b45d24def7428214d4fb73cc486c9568cce6a"},
{file = "virtualenv-20.29.2.tar.gz", hash = "sha256:fdaabebf6d03b5ba83ae0a02cfe96f48a716f4fae556461d180825866f75b728"},
{file = "virtualenv-20.29.3-py3-none-any.whl", hash = "sha256:3e3d00f5807e83b234dfb6122bf37cfadf4be216c53a49ac059d02414f819170"},
{file = "virtualenv-20.29.3.tar.gz", hash = "sha256:95e39403fcf3940ac45bc717597dba16110b74506131845d9b687d5e73d947ac"},
]

[package.dependencies]
@@ -7861,4 +7838,4 @@ vlm = ["accelerate", "transformers", "transformers"]
[metadata]
lock-version = "2.0"
python-versions = "^3.9"
content-hash = "c37ae7d39cb2af7031248c2f0308c91160facafd948e982899245e5d8369bbbb"
content-hash = "16324c95a8aae1a710c4151e509c59e9a97d8bb97d4c726861ab3215fbea0a9d"
@@ -1,6 +1,6 @@
[tool.poetry]
name = "docling"
version = "2.26.0" # DO NOT EDIT, updated automatically
version = "2.27.0" # DO NOT EDIT, updated automatically
description = "SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications."
authors = [
"Christoph Auer <cau@zurich.ibm.com>",

@@ -46,9 +46,9 @@ packages = [{ include = "docling" }]
######################
python = "^3.9"
pydantic = "^2.0.0"
docling-core = {extras = ["chunking"], version = "^2.22.0"}
docling-core = {extras = ["chunking"], version = "^2.23.1"}
docling-ibm-models = "^3.4.0"
docling-parse = "^3.3.0"
docling-parse = "^4.0.0"
filetype = "^1.2.0"
pypdfium2 = "^4.30.0"
pydantic-settings = "^2.3.0"
@@ -88,6 +88,7 @@ accelerate = [
]
pillow = ">=10.0.0,<12.0.0"
tqdm = "^4.65.0"
pluggy = "^1.0.0"
pylatexenc = "^2.10"

[tool.poetry.group.dev.dependencies]

@@ -156,6 +157,9 @@ rapidocr = ["rapidocr-onnxruntime", "onnxruntime"]
docling = "docling.cli.main:app"
docling-tools = "docling.cli.tools:app"

[tool.poetry.plugins."docling"]
"docling_defaults" = "docling.models.plugins.defaults"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
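The plugins table added above registers a standard Python entry point in the `docling` group, which the newly added `pluggy` dependency can build on. As a rough illustration of how such a registration is typically discovered at runtime — this loader is a sketch under that assumption, not the project's actual plugin mechanism:

```python
# Illustrative sketch: discover entry points in the "docling" group, e.g.
# docling_defaults -> docling.models.plugins.defaults from the pyproject above.
from importlib.metadata import entry_points

def load_docling_plugins():
    plugins = {}
    # The group= selection keyword requires Python 3.10+; on 3.9 the
    # importlib_metadata backport offers the same API.
    for ep in entry_points(group="docling"):
        plugins[ep.name] = ep.load()  # imports and returns the target object
    return plugins
```

Registering the defaults through an entry point keeps built-in models discoverable by the same mechanism as third-party plugins.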
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long

@@ -140,13 +140,13 @@ tention encoding is then multiplied to the encoded image to produce a feature fo

The output features for each table cell are then fed into the feed-forward network (FFN). The FFN consists of a Multi-Layer Perceptron (3 layers with ReLU activation function) that predicts the normalized coordinates for the bounding box of each table cell. Finally, the predicted bounding boxes are classified based on whether they are empty or not using a linear layer.

Loss Functions. We formulate a multi-task loss Eq. 2 to train our network. The Cross-Entropy loss (denoted as l$_{s}$ ) is used to train the Structure Decoder which predicts the structure tokens. As for the Cell BBox Decoder it is trained with a combination of losses denoted as l$_{box}$ . l$_{box}$ consists of the generally used l$_{1}$ loss for object detection and the IoU loss ( l$_{iou}$ ) to be scale invariant as explained in [25]. In comparison to DETR, we do not use the Hungarian algorithm [15] to match the predicted bounding boxes with the ground-truth boxes, as we have already achieved a one-toone match through two steps: 1) Our token input sequence is naturally ordered, therefore the hidden states of the table data cells are also in order when they are provided as input to the Cell BBox Decoder , and 2) Our bounding boxes generation mechanism (see Sec. 3) ensures a one-to-one mapping between the cell content and its bounding box for all post-processed datasets.
Loss Functions. We formulate a multi-task loss Eq. 2 to train our network. The Cross-Entropy loss (denoted as l$\_{s}$ ) is used to train the Structure Decoder which predicts the structure tokens. As for the Cell BBox Decoder it is trained with a combination of losses denoted as l$\_{box}$ . l$\_{box}$ consists of the generally used l$\_{1}$ loss for object detection and the IoU loss ( l$\_{iou}$ ) to be scale invariant as explained in [25]. In comparison to DETR, we do not use the Hungarian algorithm [15] to match the predicted bounding boxes with the ground-truth boxes, as we have already achieved a one-toone match through two steps: 1) Our token input sequence is naturally ordered, therefore the hidden states of the table data cells are also in order when they are provided as input to the Cell BBox Decoder , and 2) Our bounding boxes generation mechanism (see Sec. 3) ensures a one-to-one mapping between the cell content and its bounding box for all post-processed datasets.

The loss used to train the TableFormer can be defined as following:

<!-- formula-not-decoded -->

where λ ∈ [0, 1], and λ$_{iou}$, λ$_{l}$$\_{1}$ ∈$\_{R}$ are hyper-parameters.
where λ ∈ [0, 1], and λ$\_{iou}$, λ$\_{l}$$\_{1}$ ∈$\_{R}$ are hyper-parameters.
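The equation itself is marked `<!-- formula-not-decoded -->` in this ground truth. A plausible reconstruction from the surrounding description — an assumption added for readability, not the verbatim Eq. 2 of the paper — is:

```latex
l = \lambda\, l_{s} + (1 - \lambda)\, l_{box},
\qquad
l_{box} = \lambda_{iou}\, l_{iou} + \lambda_{l_{1}}\, l_{1}
```

with λ weighting the structure loss against the bounding-box loss, matching the hyper-parameters named above.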

## 5. Experimental Results

@@ -176,7 +176,7 @@ The Tree-Edit-Distance-Based Similarity (TEDS) metric was introduced in [37]. It

<!-- formula-not-decoded -->

where T$_{a}$ and T$_{b}$ represent tables in tree structure HTML format. EditDist denotes the tree-edit distance, and | T | represents the number of nodes in T .
where T$\_{a}$ and T$\_{b}$ represent tables in tree structure HTML format. EditDist denotes the tree-edit distance, and | T | represents the number of nodes in T .
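The TEDS formula is likewise not decoded in this ground truth; its standard definition from [37], consistent with the description above, is:

```latex
\mathrm{TEDS}(T_{a}, T_{b}) = 1 - \frac{\mathrm{EditDist}(T_{a}, T_{b})}{\max(|T_{a}|, |T_{b}|)}
```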

## 5.4. Quantitative Analysis

@@ -372,7 +372,7 @@ Here is a step-by-step description of the prediction postprocessing:

<!-- formula-not-decoded -->

where c is one of { left, centroid, right } and x$_{c}$ is the xcoordinate for the corresponding point.
where c is one of { left, centroid, right } and x$\_{c}$ is the xcoordinate for the corresponding point.

- 5. Use the alignment computed in step 4, to compute the median x -coordinate for all table columns and the me-
- 6. Snap all cells with bad IOU to their corresponding median x -coordinates and cell sizes.
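Steps 5 and 6 above amount to a median-snapping pass over the table columns. A minimal sketch of that idea — the cell representation and all names here are hypothetical, not Docling's actual implementation:

```python
# Hypothetical sketch of steps 5-6: compute median x-coordinates per table
# column, then snap cells with a bad alignment score (IOU) onto those medians.
from statistics import median

def snap_cells_to_column_medians(cells, iou_threshold=0.5):
    # cells: dicts with "col" (column index), "x0"/"x1" (left/right box edges)
    # and "iou" (overlap between the predicted box and its aligned column).
    by_col = {}
    for cell in cells:
        by_col.setdefault(cell["col"], []).append(cell)
    # Step 5: median left and right edges for every table column.
    medians = {
        col: (median(c["x0"] for c in group), median(c["x1"] for c in group))
        for col, group in by_col.items()
    }
    # Step 6: snap badly aligned cells to the median x-coordinates and size.
    for cell in cells:
        if cell["iou"] < iou_threshold:
            cell["x0"], cell["x1"] = medians[cell["col"]]
    return cells
```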
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-comma-in-cell",
"origin": {
"mimetype": "text/csv",

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-comma",
"origin": {
"mimetype": "text/csv",

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-inconsistent-header",
"origin": {
"mimetype": "text/csv",

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-pipe",
"origin": {
"mimetype": "text/csv",

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-semicolon",
"origin": {
"mimetype": "text/csv",

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-tab",
"origin": {
"mimetype": "text/csv",

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-too-few-columns",
"origin": {
"mimetype": "text/csv",

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "csv-too-many-columns",
"origin": {
"mimetype": "text/csv",

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.2.0",
"version": "1.3.0",
"name": "equations",
"origin": {
"mimetype": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "example_01",
"origin": {
"mimetype": "text/html",

@@ -2,7 +2,7 @@

This is the first paragraph of the introduction.

## Background
### Background

Some background information here.

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "example_02",
"origin": {
"mimetype": "text/html",

@@ -2,7 +2,7 @@

This is the first paragraph of the introduction.

## Background
### Background

Some background information here.

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "example_03",
"origin": {
"mimetype": "text/html",

@@ -1,10 +1,10 @@
# Example Document

## Introduction
### Introduction

This is the first paragraph of the introduction.

## Background
### Background

Some background information here.

@@ -16,9 +16,9 @@ Some background information here.
1. First item in ordered list
1. Nested ordered item 1
2. Nested ordered item 2
3. Second item in ordered list
2. Second item in ordered list

## Data Table
### Data Table

| Header 1 | Header 2 | Header 3 |
|--------------|--------------|--------------|

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "example_04",
"origin": {
"mimetype": "text/html",

@@ -1,7 +1,7 @@
# Data Table with Rowspan and Colspan

| Header 1 | Header 2 & 3 (colspan) | Header 2 & 3 (colspan) |
| Header 1 | Header 2 & 3 (colspan) | Header 2 & 3 (colspan) |
|----------------------------|----------------------------|----------------------------|
| Row 1 & 2, Col 1 (rowspan) | Row 1, Col 2 | Row 1, Col 3 |
| Row 1 & 2, Col 1 (rowspan) | Row 2, Col 2 & 3 (colspan) | Row 2, Col 2 & 3 (colspan) |
| Row 1 & 2, Col 1 (rowspan) | Row 1, Col 2 | Row 1, Col 3 |
| Row 1 & 2, Col 1 (rowspan) | Row 2, Col 2 & 3 (colspan) | Row 2, Col 2 & 3 (colspan) |
| Row 3, Col 1 | Row 3, Col 2 | Row 3, Col 3 |

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "example_05",
"origin": {
"mimetype": "text/html",

@@ -1,7 +1,7 @@
# Omitted html and body tags

| Header 1 | Header 2 & 3 (colspan) | Header 2 & 3 (colspan) |
| Header 1 | Header 2 & 3 (colspan) | Header 2 & 3 (colspan) |
|----------------------------|----------------------------|----------------------------|
| Row 1 & 2, Col 1 (rowspan) | Row 1, Col 2 | Row 1, Col 3 |
| Row 1 & 2, Col 1 (rowspan) | Row 2, Col 2 & 3 (colspan) | Row 2, Col 2 & 3 (colspan) |
| Row 1 & 2, Col 1 (rowspan) | Row 1, Col 2 | Row 1, Col 3 |
| Row 1 & 2, Col 1 (rowspan) | Row 2, Col 2 & 3 (colspan) | Row 2, Col 2 & 3 (colspan) |
| Row 3, Col 1 | Row 3, Col 2 | Row 3, Col 3 |

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "example_06",
"origin": {
"mimetype": "text/html",

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.2.0",
"version": "1.3.0",
"name": "example_07",
"origin": {
"mimetype": "text/html",

@@ -1,6 +1,6 @@
{
"schema_name": "DoclingDocument",
"version": "1.1.0",
"version": "1.3.0",
"name": "ipa20180000016.xml",
"origin": {
"mimetype": "application/xml",

@@ -1,20 +1,20 @@
# LIGHT EMITTING DEVICE AND PLANT CULTIVATION METHOD

## ABSTRACT
### ABSTRACT

Provided is a light emitting device that includes a light emitting element having a light emission peak wavelength ranging from 380 nm to 490 nm, and a fluorescent material excited by light from the light emitting element and emitting light having at a light emission peak wavelength ranging from 580 nm or more to less than 680 nm. The light emitting device emits light having a ratio R/B of a photon flux density R to a photon flux density B ranging from 2.0 to 4.0 and a ratio R/FR of the photon flux density R to a photon flux density FR ranging from 0.7 to 13.0, the photon flux density R being in a wavelength range of 620 nm or more and less than 700 nm, the photon flux density B being in a wavelength range of 380 nm or more and 490 nm or less, and the photon flux density FR being in a wavelength range of 700 nm or more and 780 nm or less.

## CROSS-REFERENCE TO RELATED APPLICATION
### CROSS-REFERENCE TO RELATED APPLICATION

The application claims benefit of Japanese Patent Application No. 2016-128835 filed on Jun. 29, 2016, the entire disclosure of which is hereby incorporated by reference in its entirety.

## BACKGROUND
### BACKGROUND

## Technical Field
### Technical Field

The present disclosure relates to a light emitting device and a plant cultivation method.

## Description of Related Art
### Description of Related Art

With environmental changes due to climate change and other artificial disruptions, plant factories are expected to increase production efficiency of vegetables and be capable of adjusting production in order to make it possible to stably supply vegetables. Plant factories that are capable of artificial management can stably supply clean and safe vegetables to markets, and therefore are expected to be the next-generation industries.

@@ -26,7 +26,7 @@ In plant factories, the light source used in place of sunlight affect a growth p

For example, Japanese Unexamined Patent Publication No. 2009-125007 discloses a plant growth method. In this method, the plants is irradiated with light emitted from a first LED light emitting element and/or a second LED light emitting element at predetermined timings using a lighting apparatus including the first LED light emitting element emitting light having a wavelength region of 625 to 690 nm and the second LED light emitting element emitting light having a wavelength region of 420 to 490 nm in order to emit lights having sufficient intensities and different wavelengths from each other.

## SUMMARY
### SUMMARY

However, even though plants are merely irradiated with lights having different wavelengths as in the plant growth method disclosed in Japanese Unexamined Patent Publication No. 2009-125007, the effect of promoting plant growth is not sufficient. Further improvement is required in promotion of plant growth.

@@ -40,7 +40,7 @@ A second embodiment of the present disclosure is a plant cultivation method incl

According to embodiments of the present disclosure, a light emitting device capable of promoting growth of plants and a plant cultivation method can be provided.

## BRIEF DESCRIPTION OF THE DRAWINGS
### BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic cross sectional view of a light emitting device according to an embodiment of the present disclosure.

@@ -50,11 +50,11 @@ FIG. 3 is a graph showing fresh weight (edible part) at the harvest time of each

FIG. 4 is a graph showing nitrate nitrogen content in each plant grown by irradiating the plant with light from exemplary light emitting devices according to embodiments of the present disclosure and a comparative light emitting device.

## DETAILED DESCRIPTION
### DETAILED DESCRIPTION

A light emitting device and a plant cultivation method according to the present invention will be described below based on an embodiment. However, the embodiment described below only exemplifies the technical concept of the present invention, and the present invention is not limited to the light emitting device and plant cultivation method described below. In the present specification, the relationship between the color name and the chromaticity coordinate, the relationship between the wavelength range of light and the color name of monochromatic light follows JIS Z8110.

### Light Emitting Device
#### Light Emitting Device

An embodiment of the present disclosure is a light emitting device including a light emitting element having a light emission peak wavelength in a range of 380 nm or more and 490 nm or less (hereinafter sometimes referred to as a "region of from near ultraviolet to blue color"), and a first fluorescent material emitting light having at least one light emission peak wavelength in a range of 580 nm or more and less than 680 nm by being excited by light from the light emitting element. The light emitting device emits light having a ratio R/B of a photon flux density R to a photon flux density B within a range of 2.0 or more and 4.0 or less, and a ratio R/FR of the photon flux density R to a photon flux density FR within a range of 0.7 or more and 13.0 or less, where the photon flux density R is the number of light quanta (μmol·m⁻²·g⁻¹) incident per unit time and unit area in a wavelength range of 620 nm or more and less than 700 nm, the photon flux density B is the number of light quanta (μmol·m⁻²·g⁻¹) incident per unit time and unit area in a wavelength range of 380 nm or more and 490 nm or less, and the photon flux density FR is the number of light quanta (μmol·m⁻²·g⁻¹) incident per unit time and unit area in a wavelength range of 700 nm or more and 780 nm or less.
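Stated compactly, the operating window this paragraph claims for the device, with R, B and FR as just defined, is:

```latex
2.0 \le \frac{R}{B} \le 4.0,
\qquad
0.7 \le \frac{R}{FR} \le 13.0
```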

@@ -84,7 +84,7 @@ For the above reasons, nitrogen is one of nutrients necessary for growth of plan

It is preferred that the light emitting device 100 further include the second fluorescent material 72 having at least one light emission peak wavelength in a range of 680 nm or more and 800 nm or less by being excited by light from the light emitting element 10, wherein the R/FR ratio is within a range of 0.7 or more and 5.0 or less. The R/FR ratio is more preferably within a range of 0.7 or more and 2.0 or less.

### Light Emitting Element
#### Light Emitting Element

The light emitting element 10 is used as an excitation light source, and is a light emitting element emitting light having a light emission peak wavelength in a range of 380 nm or more and 490 nm or less. As a result, a stable light emitting device having high efficiency, high linearity of output to input and strong mechanical impacts can be obtained.

@@ -92,7 +92,7 @@ The range of the light emission peak wavelength of the light emitting element 10

The half value width of emission spectrum of the light emitting element 10 can be, for example, 30 nm or less.

### Fluorescent Member
#### Fluorescent Member

The fluorescent member 50 used in the light emitting device 100 preferably includes the first fluorescent material 71 and a sealing material, and more preferably further includes the second fluorescent material 72. A thermoplastic resin and a thermosetting resin can be used as the sealing material. The fluorescent member 50 may contain other components such as a filler, a light stabilizer and a colorant, in addition to the fluorescent material and the sealing material. Examples of the filler include silica, barium titanate, titanium oxide and aluminum oxide.

@@ -100,7 +100,7 @@ The content of other components other than the fluorescent material 70 and the s

The total content of the fluorescent material 70 in the fluorescent member 50 can be, for example, 5 parts by mass or more and 300 parts by mass or less, per 100 parts by mass of the sealing material. The total content is preferably 10 parts by mass or more and 250 parts by mass or less, more preferably 15 parts by mass or more and 230 parts by mass or less, and still more preferably 15 parts by mass or more and 200 parts by mass or less. When the total content of the fluorescent material 70 in the fluorescent member 50 is within the above range, the light emitted from the light emitting element 10 can be efficiently subjected to wavelength conversion in the fluorescent material 70.

### First Fluorescent Material
#### First Fluorescent Material

The first fluorescent material 71 is a fluorescent material that is excited by light from the light emitting element 10 and emits light having at least one light emission peak wavelength in a range of 580 nm or more and less than 680 nm. Examples of the first fluorescent material 71 include an Mn⁴⁺-activated fluorogermanate fluorescent material, an Eu²⁺-activated nitride fluorescent material, an Eu²⁺-activated alkaline earth sulfide fluorescent material and an Mn⁴⁺-activated halide fluorescent material. The first fluorescent material 71 may use one selected from those fluorescent materials and may use a combination of two or more thereof. The first fluorescent material preferably contains an Eu²⁺-activated nitride fluorescent material and an Mn⁴⁺-activated fluorogermanate fluorescent material.

@@ -138,7 +138,7 @@ The first fluorescent material 71 preferably contains at least two fluorescent m

In the case where the first fluorescent material 71 contains at least two fluorescent materials and two fluorescent materials are a MGF fluorescent material and a CASN fluorescent material, where a compounding ratio thereof (MGF fluorescent material:CASN fluorescent material) is preferably in a range of 50:50 or more and 99:1 or less, more preferably in a range of 60:40 or more and 97:3 or less, and still more preferably in a range of 70:30 or more and 96:4 or less, in mass ratio. In the case where the first fluorescent material contains two fluorescent materials, when those fluorescent materials are a MGF fluorescent material and a CASN fluorescent material and the mass ratio thereof is within the aforementioned range, the light emitted from the light emitting element 10 can be efficiently subjected to wavelength conversion in the first fluorescent material 71. In addition, the R/B ratio can be adjusted to within a range of 2.0 or more and 4.0 or less, and the R/FR ratio is easy to be adjusted to within a range of 0.7 or more and 13.0 or less.

### Second Fluorescent Material
#### Second Fluorescent Material

The second fluorescent material 72 is a fluorescent material that is excited by the light from the light emitting element 10 and emits light having at least one light emission peak wavelength in a range of 680 nm or more and 800 nm or less.

@@ -160,25 +160,25 @@ In the second fluorescent material 72, the value of the parameter y is preferabl

The parameter x is an activation amount of Ce and the value of the parameter x is in a range of exceeding 0.0002 and less than 0.50 (0.0002<x<0.50), and the parameter y is an activation amount of Cr. When the value of the parameter y is in a range of exceeding 0.0001 and less than 0.05 (0.0001<y<0.05), the activation amount of Ce and the activation amount of Cr that are light emission centers contained in the crystal structure of the fluorescent material are within optimum ranges, the decrease of light emission intensity due to the decrease of light emission center can be suppressed, the decrease of light emission intensity due to concentration quenching caused by the increase of the activation amount can be suppressed, and light emission intensity can be enhanced.

### Production Method of Second Fluorescent Material
#### Production Method of Second Fluorescent Material

A method for producing the second fluorescent material 72 includes the following method.

A compound containing at least one rare earth element Ln selected from the group consisting of rare earth elements excluding Ce, a compound containing at least one element M selected from the group consisting of Al, Ga, and In, a compound containing Ce and a compound containing Cr are mixed such that, when the total molar composition ratio of the M is taken as 5 as the standard, in the case where the total molar composition ratio of Ln, Ce, and Nd is 3, the molar ratio of Ce is a product of 3 and a value of a parameter x, and the molar ratio of Cr is a product of 3 and a value of a parameter y, the value of the parameter x is in a range of exceeding 0.0002 and less than 0.50 and the value of the parameter y is in a range of exceeding 0.0001 and less than 0.05, thereby obtaining a raw material mixture, the raw material mixture is heat-treated, followed by classification and the like, thereby obtaining the second fluorescent material.
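As a worked instance of this mixing rule, using the composition reported later in the Examples, (Y₀.₉₇₇Ce₀.₀₀₉Cr₀.₀₁₄)₃Al₅O₁₂: with x = 0.009 and y = 0.014, and the molar ratio of M (here Al) fixed at 5,

```latex
n_{Ce} = 3x = 3 \times 0.009 = 0.027,
\quad
n_{Cr} = 3y = 3 \times 0.014 = 0.042,
\quad
n_{Y} = 3 - 0.027 - 0.042 = 2.931
```

so the raw materials are weighed to give Y : Ce : Cr : Al ≈ 2.931 : 0.027 : 0.042 : 5.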
### Compound Containing Rare Earth Element Ln
|
||||
#### Compound Containing Rare Earth Element Ln
|
||||
|
||||
Examples of the compound containing rare earth element Ln include oxides, hydroxides, nitrides, oxynitrides, fluorides, and chlorides, that contain at least one rare earth element Ln selected from the group consisting of rare earth elements excluding Ce. Those compounds may be hydrates. At least a part of the compounds containing rare earth element may use a metal simple substance or an alloy containing rare earth element. The compound containing rare earth element is preferably a compound containing at least one rare earth element Ln selected from the group consisting of Y, Gd, Lu, La, Tb, and Pr. The compound containing rare earth element may be used alone or may be used as a combination of at least two compounds containing rare earth element.
|
||||
|
||||
The compound containing rare earth element is preferably an oxide that does not contain elements other than the target composition, as compared with other materials. Examples of the oxide specifically include Y₂O₃, Gd₂O₃, Lu₂O₃, La₂O₃, Tb₄O₇ and Pr₆O₁₁.
|
||||
|
||||
### Compound Containing M
|
||||
#### Compound Containing M
|
||||
|
||||
Examples of the compound containing at least one element M selected from the group consisting of Al, Ga, and In include oxides, hydroxides, nitrides, oxynitrides, fluorides, and chlorides, that contain Al, Ga, or In. Those compounds may be hydrates. Furthermore, Al metal simple substance, Ga metal simple substance, In metal simple substance, Al alloy, Ga alloy or In alloy may be used, and metal simple substance or an alloy may be used in place of at least a part of the compound. The compound containing Al, Ga, or In may be used alone or may be used as a combination of two or more thereof. The compound containing at least one element selected from the group consisting of Al, Ga, and In is preferably an oxide. The reason for this is that an oxide that does not contain elements other than the target composition, as compared with other materials, and a fluorescent material having a target composition are easy to be obtained. When a compound containing elements other than the target composition has been used, residual impurity elements are sometimes present in the fluorescent material obtained. The residual impurity element becomes a killer factor in light emission, leading to the possibility of remarkable decrease of light emission intensity.
|
||||
|
||||
Examples of the compound containing Al, Ga, or In specifically include Al₂O₃, Ga₂O₃, and In₂O₃.
|
||||
|
||||
### Compound Containing Ce and Compound Containing Cr
|
||||
#### Compound Containing Ce and Compound Containing Cr
|
||||
|
||||
Examples of the compound containing Ce or the compound containing Cr include oxides, hydroxides, nitrides, fluorides, and chlorides, that contain cerium (Ce) or chromium (Cr). Those compounds may be hydrates. Ce metal simple substance, Ce alloy, Cr metal simple substance, or Cr alloy may be used, and a metal simple substance or an alloy may be used in place of a part of the compound. The compound containing Ce or the compound containing Cr may be used alone or may be used as a combination of two or more thereof. The compound containing Ce or the compound containing Cr is preferably an oxide. The reason for this is that an oxide that does not contain elements other than the target composition, as compared with other materials, and a fluorescent material having a target composition are easy to be obtained. When a compound containing elements other than the target composition has been used, residual impurity elements are sometimes present in the fluorescent material obtained. The residual impurity element becomes a killer factor in light emission, leading to the possibility of remarkable decrease of light emission intensity.
|
||||
|
||||
@ -204,7 +204,7 @@ The atmosphere for heat-treating the raw material mixture is an inert atmosphere
|
||||
|
||||
The fluorescent material obtained may be subjected to post-treatment steps such as a solid-liquid separation by a method such as cleaning or filtration, drying by a method such as vacuum drying, and classification by dry sieving. After those post-treatment steps, a fluorescent material having a desired average particle diameter is obtained.
|
||||
|
||||
### Other Fluorescent Materials
|
||||
#### Other Fluorescent Materials
|
||||
|
||||
The light emitting device 100 may contain other kinds of fluorescent materials, in addition to the first fluorescent material 71.
|
||||
|
||||
@ -250,21 +250,21 @@ GdAlO₃:Cr (x)
|
||||
|
||||
The light emitting device 100 can be utilized as a light emitting device for plant cultivation that can activate photosynthesis of plants and promote growth of plants so as to have favorable form and weight.
|
||||
|
||||
### Plant Cultivation Method
|
||||
#### Plant Cultivation Method
|
||||
|
||||
The plant cultivation method of one embodiment of the present disclosure is a method for cultivating plants, including irradiating plants with light emitted from the light emitting device 100. In the plant cultivation method, plants can be irradiated with light from the light emitting device 100 in plant factories that are completely isolated from external environment and make it possible for artificial control. The kind of plants is not particularly limited. However, the light emitting device 100 of one embodiment of the present disclosure can activate photosynthesis of plants and promote growth of plants such that a stem, a leaf, a root, a fruit have favorable form and weight, and therefore is preferably applied to cultivation of vegetables, flowers that contain much chlorophyll performing photosynthesis. Examples of the vegetables include lettuces such as garden lettuce, curl lettuce, Lamb's lettuce, Romaine lettuce, endive, Lollo Rosso, Rucola lettuce, and frill lettuce; Asteraceae vegetables such as “shungiku” (chrysanthemum coronarium); morning glory vegetables such as spinach; Rosaceae vegetables such as strawberry; and flowers such as chrysanthemum, gerbera, rose, and tulip.
|
||||
|
||||
## EXAMPLES
|
||||
### EXAMPLES
|
||||
|
||||
The present invention is further specifically described below by Examples and Comparative Examples.
|
||||
|
||||
## Examples 1 to 5
|
||||
### Examples 1 to 5
|
||||
|
||||
### First Fluorescent Material
|
||||
#### First Fluorescent Material
|
||||
|
||||
Two fluorescent materials of fluorogarmanate fluorescent material that is activated by Mn⁴⁺, having a light emission peak at 660 nm and fluorescent material containing silicon nitride that are activated by Eu²⁺, having a light emission peak at 660 nm were used as the first fluorescent material 71. In the first fluorescent material 71, a mass ratio of a MGF fluorescent material to a CASN fluorescent material (MGF:CASN) was 95:5.
|
||||
|
||||
### Second Fluorescent Material
|
||||
#### Second Fluorescent Material
|
||||
|
||||
Fluorescent material that is obtained by the following production method was used as the second fluorescent material 72.
|
||||
|
||||
@ -272,7 +272,7 @@ Fluorescent material that is obtained by the following production method was use
|
||||
|
||||
The raw material mixture obtained was placed in an alumina crucible, and a lid was put on the alumina crucible. The raw material mixture was heat-treated at 1,500° C. for 10 hours in a reducing atmosphere of H₂: 3 vol % and N₂: 97 vol %. Thus, a calcined product was obtained. The calcined product was passed through a dry sieve to obtain a second fluorescent material. The second fluorescent material obtained was subjected to composition analysis by ICP-AES emission spectrometry using an inductively coupled plasma emission analyzer (manufactured by Perkin Elmer). The composition of the second fluorescent material obtained was (Y₀.₉₇₇Ce₀.₀₀₉Cr₀.₀₁₄)₃Al₅O₁₂ (hereinafter referred to as “YAG: Ce, Cr”).
|
||||
|
||||
### Light Emitting Device
|
||||
#### Light Emitting Device
|
||||
|
||||
Nitride semiconductor having a light emission peak wavelength of 450 nm was used as the light emitting element 10 in the light emitting device 100.
|
||||
|
||||
@ -280,17 +280,17 @@ Silicone resin was used as a sealing material constituting the fluorescent membe
|
||||
|
||||
The resin composition was poured on the light emitting element 10 of a depressed portion of the molded article 40 to fill the depressed portion, and heated at 150° C. for 4 hours to cure the resin composition, thereby forming the fluorescent member 50. Thus, the light emitting device 100 as shown in FIG. 1 was produced in each of Examples 1 to 5.
|
||||
|
||||
## Comparative Example 1
|
||||
### Comparative Example 1
|
||||
|
||||
A light emitting device X including a semiconductor light emitting element having a light emission peak wavelength of 450 nm and a light emitting device Y including a semiconductor light emitting element having a light emission peak length of 660 nm were used, and the R/B ratio was adjusted to 2.5.
|
||||
|
||||
### Evaluation
|
||||
#### Evaluation
|
||||
|
||||
### Photon Flux Density
|
||||
#### Photon Flux Density
|
||||
|
||||
Photon flux densities of lights emitted from the light emitting device 100 used in Examples 1 to 5 and the light emitting devices X and Y used in Comparative Example 1 were measured using a photon measuring device (LI-250A, manufactured by Li-COR). The photon flux density B, the photon flux density R, and the photon flux density FR of lights emitted from the light emitting devices used in each of the Examples and Comparative Example; the R/B ratio; and the R/FR ratio are shown in Table 1. FIG. 2 shows spectra showing the relationship between a wavelength and a relative photon flux density, in the light emitting devices used in each Example and Comparative Example.
|
||||
|
||||
### Plant Cultivation Test
|
||||
#### Plant Cultivation Test
|
||||
|
||||
The plant cultivation method includes a method of conducting by “growth period by RGB light source (hereinafter referred to as a first growth period)” and “growth period by light source for plant growth (hereinafter referred to as a second growth period)” using a light emitting device according to an embodiment of the present disclosure as a light source.
@ -306,21 +306,21 @@ The cultivation test was specifically conducted by the following method.
Romaine lettuce (green romaine, produced by Nakahara Seed Co., Ltd.) was used as the cultivation plant.
-### First Growth Period
+#### First Growth Period
Urethane sponges (salad urethane, manufactured by M Hydroponic Research Co., Ltd.) having Romaine lettuce seeded therein were placed side by side on a plastic tray and irradiated with light from an RGB-LED light source (manufactured by Shibasaki Inc.) to cultivate the plants. The plants were cultivated for 16 days under the following conditions: room temperature, 22 to 23° C.; humidity, 50 to 60%; photon flux density from the light emitting device, 100 μmol·m⁻²·s⁻¹; and daytime hours, 16 hours/day. Only water was given until germination; after germination (about 4 days later), a solution obtained by mixing Otsuka House #1 (manufactured by Otsuka Chemical Co., Ltd.) and Otsuka House #2 (manufactured by Otsuka Chemical Co., Ltd.) in a mass ratio of 3:2 and dissolving the mixture in water was used as the nutrient solution (Otsuka Formulation A). Conductivity of the nutrient solution was 1.5 mS·cm⁻¹.
-### Second Growth Period
+#### Second Growth Period
After the first growth period, the plants were irradiated with light from the light emitting devices of Examples 1 to 5 and Comparative Example 1, and were subjected to hydroponics.
The plants were cultivated for 19 days under the following conditions: room temperature, 22 to 24° C.; humidity, 60 to 70%; CO₂ concentration, 600 to 700 ppm; photon flux density from the light emitting device, 125 μmol·m⁻²·s⁻¹; and daytime hours, 16 hours/day. Otsuka Formulation A was used as the nutrient solution. Conductivity of the nutrient solution was 1.5 mS·cm⁻¹. The R/B and R/FR ratios of the light irradiating the plants from each light emitting device in the second growth period are shown in Table 1.
-### Measurement of Fresh Weight (Edible Part)
+#### Measurement of Fresh Weight (Edible Part)
The plants after cultivation were harvested, and the wet weights of the terrestrial part and the root were measured. The wet weight of the terrestrial part of each of six cultivated plants grown hydroponically under light from the light emitting devices of Examples 1 to 5 and Comparative Example 1 was measured as the fresh weight (edible part) (g). The results obtained are shown in Table 1 and FIG. 3.
-### Measurement of Nitrate Nitrogen Content
+#### Measurement of Nitrate Nitrogen Content
The edible part (about 20 g) of each cultivated plant, from which the base (about 5 cm) had been removed, was frozen with liquid nitrogen and crushed with a juice mixer (laboratory mixer LM-PLUS, manufactured by Osaka Chemical Co., Ltd.) for 1 minute. The resulting liquid was filtered through Miracloth (manufactured by Millipore), and the filtrate was centrifuged at 4° C. and 15,000 rpm for 5 minutes. The nitrate nitrogen content (mg/100 g) of each cultivated plant was determined from the supernatant using a portable reflection photometer system (product name: RQflex system, manufactured by Merck) and a test paper (product name: Reflectoquant (registered trademark), manufactured by Kanto Chemical Co., Inc.). The results are shown in Table 1 and FIG. 4.
@ -355,7 +355,7 @@ In addition, in the foregoing Detailed Description, various features may be grou
The above disclosed subject matter shall be considered illustrative, and not restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other embodiments which fall within the true spirit and scope of the present disclosure. Thus, to the maximum extent allowed by law, the scope of the present disclosure may be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing detailed description.
-## CLAIMS
+### CLAIMS
1. A light emitting device comprising: a light emitting element having a light emission peak wavelength in a range of 380 nm or more and 490 nm or less; and a fluorescent material that is excited by light from the light emitting element and emits light having at least one light emission peak wavelength in a range of 580 nm or more and less than 680 nm, wherein the light emitting device emits light having a ratio R/B of a photon flux density R to a photon flux density B within a range of 2.0 or more and 4.0 or less, and a ratio R/FR of the photon flux density R to a photon flux density FR within a range of 0.7 or more and 13.0 or less, wherein the photon flux density R is in a wavelength range of 620 nm or more and less than 700 nm, the photon flux density B is in a wavelength range of 380 nm or more and 490 nm or less, and the photon flux density FR is in a wavelength range of 700 nm or more and 780 nm or less.
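As a worked check of the ratio windows recited in claim 1 (an illustrative example, not taken from the claim itself): a device emitting B = 50, R = 125, and FR = 25 μmol·m⁻²·s⁻¹ gives R/B = 2.5 and R/FR = 5.0, both within the claimed ranges. A minimal predicate, with hypothetical names:

```python
def within_claim_1(B: float, R: float, FR: float) -> bool:
    """Check the photon flux density ratio windows of claim 1
    (function and argument names are illustrative, not from the patent)."""
    return 2.0 <= R / B <= 4.0 and 0.7 <= R / FR <= 13.0

# Worked example: R/B = 125/50 = 2.5, R/FR = 125/25 = 5.0 -> both in range.
assert within_claim_1(B=50.0, R=125.0, FR=25.0)
```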