* Experimental code for repetition detection, VLLM Streaming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update VLLM Streaming
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Update VLLM inference code, CLI and VLM specs
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix generation and decoder args for HF model
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix vllm device args
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Cleanup
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Bugfixes
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Remove streaming VLLM for the moment
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add repetition StoppingCriteria for GraniteDocling/SmolDocling
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Make GenerationStopper base class and port for MLX
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Add streaming support and custom GenerationStopper support for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fixes for ApiVlmModel
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Fix api_image_request_streaming when GenerationStopper triggers.
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
* Move DocTagsRepetitionStopper to utility unit, update examples
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
---------
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>