Increase focus on confidence grades, scores are informational only

Signed-off-by: Fabiano Franz <contact@fabianofranz.com>
2025-07-25 19:44:34 +00:00 · 2025-07-15 17:39:24 -03:00
commit 9d29552194
parent 5ad5fc36ee
1 changed file with 22 additions and 24 deletions
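As a rough illustration of the grade-first guidance this change introduces, the sketch below flags a document for manual review using only the document-level `mean_grade` and `low_grade` values and never reads a numerical score. The `Grade` enum and the `needs_manual_review` helper are hypothetical stand-ins written for this example, not part of Docling; with Docling itself, the two inputs would presumably come from the `confidence` field of the `ConversionResult`.

```python
from enum import Enum


class Grade(str, Enum):
    """Hypothetical stand-in for the four grade values named in the docs."""
    POOR = "poor"
    FAIR = "fair"
    GOOD = "good"
    EXCELLENT = "excellent"


# Worst-to-best ordering, so grades can be compared without numeric scores.
_RANK = {g: i for i, g in enumerate(
    [Grade.POOR, Grade.FAIR, Grade.GOOD, Grade.EXCELLENT]
)}


def needs_manual_review(mean_grade, low_grade, minimum=Grade.GOOD):
    """Return True if either document-level grade falls below `minimum`.

    Accepts Grade members or their plain string values (e.g. "fair").
    """
    worst = min(Grade(mean_grade), Grade(low_grade), key=_RANK.__getitem__)
    return _RANK[worst] < _RANK[Grade(minimum)]


if __name__ == "__main__":
    # Assumed usage: pass the grade values read from a confidence report.
    print(needs_manual_review("excellent", "good"))
    print(needs_manual_review("good", "fair"))
```

Comparing grades through an explicit ranking keeps the check stable even if the way the underlying scores are computed or weighted changes later, which is the point the updated note makes.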
--- a/docs/concepts/confidence_scores.md
+++ b/docs/concepts/confidence_scores.md
@@ -1,6 +1,6 @@
 ## Introduction

-**Confidence scores** were introduced in [v2.34.0](https://github.com/docling-project/docling/releases/tag/v2.34.0) to help users understand how well a conversion performed and guide decisions about post-processing workflows. They are available in the [`confidence`](../../reference/document_converter/#docling.document_converter.ConversionResult.confidence) field of the [`ConversionResult`](../../reference/document_converter/#docling.document_converter.ConversionResult) object returned by the document converter.
+**Confidence grades** were introduced in [v2.34.0](https://github.com/docling-project/docling/releases/tag/v2.34.0) to help users understand how well a conversion performed and guide decisions about post-processing workflows. They are available in the [`confidence`](../../reference/document_converter/#docling.document_converter.ConversionResult.confidence) field of the [`ConversionResult`](../../reference/document_converter/#docling.document_converter.ConversionResult) object returned by the document converter.

 ## Purpose

@@ -8,12 +8,16 @@ Complex layouts, poor scan quality, or challenging formatting can lead to subopt

 Confidence scores provide a quantitative assessment of document conversion quality. Each confidence report includes a **numerical score** (0.0 to 1.0) measuring conversion accuracy, and a **quality grade** (poor, fair, good, excellent) for quick interpretation.

-Use cases for confidence scores include:
+!!! note "Focus on quality grades!"
+
+    Users can and should safely focus on the document-level grade fields — `mean_grade` and `low_grade` — to assess overall conversion quality. Numerical scores are used internally and are for informational purposes only; their computation and weighting may change in the future.
+
+Use cases for confidence grades include:

 - Identify documents requiring manual review after the conversion
 - Adjust conversion pipelines to the most appropriate for each document type
 - Set confidence thresholds for unattended batch conversions
-- Catch potential conversion issues early in your workflow
+- Catch potential conversion issues early in your workflow.

 ## Concepts

@@ -21,41 +25,35 @@ Use cases for confidence scores include:

 A confidence report contains *scores* and *grades*:

-- **Scores**: Numerical values between 0.0 and 1.0, where higher values indicate better conversion quality
-- **Grades**: Categorical quality assessments based on score thresholds:
-  - `POOR`: Score < 0.5
-  - `FAIR`: Score < 0.8  
-  - `GOOD`: Score < 0.9
-  - `EXCELLENT`: Score ≥ 0.9
+- **Scores**: Numerical values between 0.0 and 1.0, where higher values indicate better conversion quality, for internal use only
+- **Grades**: Categorical quality assessments based on score thresholds, used to assess the overall conversion confidence:
+  - `POOR`
+  - `FAIR`
+  - `GOOD`
+  - `EXCELLENT`

-### Types of scores
+### Types of confidence calculated

-Each confidence report includes four component scores:
+Each confidence report includes four component scores and grades:

 - **`layout_score`**: Overall quality of text content extraction
 - **`ocr_score`**: Quality of OCR-extracted content
 - **`parse_score`**: 10th percentile score of text cells (emphasizes problem areas)
 - **`table_score`**: Table extraction quality *(not yet implemented)*

-### Summary scores
+### Summary grades

-Two aggregate scores provide overall document quality assessment:
+Two aggregate grades provide overall document quality assessment:

-- **`mean_score`**: Average of the four component scores
-- **`low_score`**: 5th percentile score (highlights worst-performing areas)
-
-Both summary scores include corresponding `mean_grade` and `low_grade` fields for quick quality assessment.
+- **`mean_grade`**: Average of the four component scores
+- **`low_grade`**: 5th percentile score (highlights worst-performing areas)

 ### Page-level vs document-level

-Confidence scores are calculated at two levels:
+Confidence grades are calculated at two levels:

-- **Page-level**: Individual scores for each page, stored in the `pages` field
-- **Document-level**: Overall scores for the entire document, calculated as averages of page-level scores and stored in fields equally named in the root [`ConfidenceReport`](h../../reference/document_converter/#docling.document_converter.ConversionResult.confidence)
-
-!!! note "Document-level scores"
-
-    For most use cases, users should safely focus on the document-level `mean_score` / `mean_grade` and `low_score` / `low_grade` fields to assess overall conversion quality.
+- **Page-level**: Individual scores and grades for each page, stored in the `pages` field
+- **Document-level**: Overall scores and grades for the entire document, calculated as averages of the page-level grades and stored in fields equally named in the root [`ConfidenceReport`](h../../reference/document_converter/#docling.document_converter.ConversionResult.confidence)

 ### Example