feat: Use threading in the standard pipeline and move old behavior to legacy (#2452)

* rename standard to legacy

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* remove old standard pipeline

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* move threaded to standard

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add backwards compatible threaded pipeline

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* Updates for threaded pipeline to lower memory requirements

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* updating deps seem to remove the corrupted double-linked list error

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update pinning

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* use main lock

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add more threadsafe blocks

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename batch_timeout_seconds

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
This commit is contained in:
Michele Dolfi
2025-10-31 14:42:11 +01:00
committed by GitHub
parent 01577e92d1
commit 268d027c8f
7 changed files with 851 additions and 752 deletions

View File

@@ -42,7 +42,7 @@ def test_threaded_pipeline_multiple_documents():
layout_batch_size=1,
table_batch_size=1,
ocr_batch_size=1,
batch_timeout_seconds=1.0,
batch_polling_interval_seconds=1.0,
do_table_structure=do_ts,
do_ocr=do_ocr,
),