add multiple improvements and fixes

Add typing, switch to list comprehensions where possible,
encapsulate all methods within new chunker implementation,
use dataclass instead of unmanaged dictionary,
list dependencies in setup installation line.

Fix token counting bug due to static initialization of
`semchunk.Chunker`.
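
Illustration only, not the code from this change: a generic
sketch of how a counter built once at class level can ignore
the tokenizer passed to each instance. It assumes that is the
kind of static initialization meant above. All names here
(`make_counter`, `StaticChunker`, `InstanceChunker`) are
hypothetical and no `semchunk` API is used.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def make_counter(tokenizer: Tokenizer) -> Callable[[str], int]:
    """Build a token counter bound to one tokenizer (hypothetical helper)."""
    return lambda text: len(tokenizer(text))


class StaticChunker:
    # Built once, when the class body runs: every instance shares this
    # counter, whatever tokenizer it was constructed with.
    _count = staticmethod(make_counter(str.split))

    def __init__(self, tokenizer: Tokenizer) -> None:
        self.tokenizer = tokenizer  # never reaches the shared counter

    def count_tokens(self, text: str) -> int:
        return self._count(text)


class InstanceChunker:
    def __init__(self, tokenizer: Tokenizer) -> None:
        self.tokenizer = tokenizer
        # Built per instance, so the count follows the given tokenizer.
        self._count = make_counter(tokenizer)

    def count_tokens(self, text: str) -> int:
        return self._count(text)


# A character-level tokenizer: each character is one token.
char_tokenizer: Tokenizer = list
text = "ab cd"

static_count = StaticChunker(char_tokenizer).count_tokens(text)
instance_count = InstanceChunker(char_tokenizer).count_tokens(text)
print(static_count, instance_count)
```

In this sketch the static variant keeps counting whitespace
tokens while the per-instance variant counts characters, so
the two disagree on the same input.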

Use expanded chunk typing (from -core) including
embedding-specific and gen-specific texts.

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Panos Vagenas 2024-11-19 23:36:50 +01:00
parent 5a8186b8fb
commit ce38baf7f7
