style: show converted page count in PaginatedPipeline debug statement (#2124)

* Show converted page count in PaginatedPipeline debug statement

* DCO Remediation Commit for Raphael Norman-Tenazas <tenazasr@gmail.com>

I, Raphael Norman-Tenazas <tenazasr@gmail.com>, hereby add my Signed-off-by to this commit: b7930bf56d

Signed-off-by: Raphael Norman-Tenazas <tenazasr@gmail.com>

* Show total progress instead of batch size

Signed-off-by: Raphael Norman-Tenazas <tenazasr@gmail.com>

---------

Signed-off-by: Raphael Norman-Tenazas <tenazasr@gmail.com>
Raphael Norman-Tenazas
2025-08-23 06:13:20 -04:00
committed by GitHub
parent b04e205d1e
commit 6736e66bb4


@@ -146,6 +146,7 @@ class PaginatedPipeline(BasePipeline): # TODO this is a bad name.
 conv_res.pages.append(Page(page_no=i))
 try:
+total_pages_processed = 0
 # Iterate batches of pages (page_batch_size) in the doc
 for page_batch in chunkify(
 conv_res.pages, settings.perf.page_batch_size
@@ -186,9 +187,9 @@ class PaginatedPipeline(BasePipeline): # TODO this is a bad name.
 )
 conv_res.status = ConversionStatus.PARTIAL_SUCCESS
 break
+total_pages_processed += len(page_batch)
 _log.debug(
-f"Finished converting page batch time={end_batch_time:.3f}"
+f"Finished converting pages {total_pages_processed}/{len(conv_res.pages)} time={end_batch_time:.3f}"
 )
 except Exception as e:
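
For illustration only, a minimal standalone sketch of the cumulative-progress pattern this change introduces. It is not the pipeline's actual code: the `chunkify` below is a hypothetical stand-in for the helper used in the diff, `progress_messages` is an invented name, and the `time=` field is left out to keep the output deterministic.

```python
import logging
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")
_log = logging.getLogger(__name__)


def chunkify(items: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Hypothetical stand-in: yield successive lists of at most chunk_size items."""
    iterator = iter(items)
    while batch := list(islice(iterator, chunk_size)):
        yield batch


def progress_messages(pages: List[int], page_batch_size: int) -> List[str]:
    """Log and return one progress message per batch, counting pages cumulatively."""
    messages = []
    total_pages_processed = 0  # running total across all batches
    for page_batch in chunkify(pages, page_batch_size):
        # Per-batch conversion work would happen here.
        total_pages_processed += len(page_batch)
        message = f"Finished converting pages {total_pages_processed}/{len(pages)}"
        _log.debug(message)
        messages.append(message)
    return messages
```

With five pages and a batch size of two, the final batch holds a single page, so the running total reports 2/5, 4/5, then 5/5 rather than repeating the batch size.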