Commit Graph

2 Commits

Author SHA1 Message Date
Panos Vagenas
ce38baf7f7 add multiple improvements and fixes
Add typing, switch to list comprehensions where possible,
encapsulate all methods within new chunker implementation,
use dataclass instead of unmanaged dictionary,
list dependencies in setup installation line.

Fix token counting bug due to static initialization of
`semchunk.Chunker`.

Use expanded chunk typing (from -core) including
embedding-specific and gen-specific texts.

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2024-11-19 23:36:50 +01:00
Bill Murdock
5a8186b8fb
Sample chunking notebook that includes merging, etc. (#193)
Signed-off-by: Bill Murdock <bmurdock@redhat.com>
Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Co-authored-by: Peter Staar <taa@zurich.ibm.com>
2024-11-19 23:12:04 +01:00