History log of /plugin/aichat/Embeddings.php (Results 26 – 50 of 50)
Revision Date Author Comments
# d5c102b3 29-Jan-2024 Andreas Gohr <andi@splitbrain.org>

Regular expressions to limit the indexed pages. Implements #5

Both regular expressions (when set) need to apply at the same time, i.e. a
page MUST match the matchRegex and MUST NOT match the skipRegex to be
indexed.

The regular expressions are applied when running the `embed` command on
the command line. Pages that no longer pass a changed regex setup will be
removed from the vector store.

For the sqlite storage it is recommended to re-cluster the index when
the regexes are changed by running the `maintenance` command.
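
The combined rule can be sketched as follows. This is only an
illustration in Python, not the plugin's PHP code: the function name
`should_index` and its parameters are hypothetical, and it assumes an
unset regex is an empty string or None and is simply ignored.

```python
import re
from typing import Optional


def should_index(page_id: str,
                 match_regex: Optional[str],
                 skip_regex: Optional[str]) -> bool:
    """Return True if the page passes both filters (hypothetical helper)."""
    # When a match regex is set, the page MUST match it.
    if match_regex and not re.search(match_regex, page_id):
        return False
    # When a skip regex is set, the page MUST NOT match it.
    if skip_regex and re.search(skip_regex, page_id):
        return False
    # An unset regex places no restriction on the page.
    return True
```

Under these assumptions a page such as `wiki:private:notes` is rejected
by a skip regex of `:private:` even though it matches a match regex of
`^wiki:`.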


# 30b9cbc7 08-Nov-2023 splitbrain <splitbrain@users.noreply.github.com>

Automatic code style fixes


# f8d5ae01 13-Sep-2023 Andreas Gohr <andi@splitbrain.org>

codesniffer cleanups


# 7ebc7895 13-Sep-2023 splitbrain <splitbrain@users.noreply.github.com>

Automatic code style fixes


# e33a1d7a 28-Aug-2023 Andreas Gohr <andi@splitbrain.org>

optionally search one language only


# aee9b383 14-Aug-2023 Andreas Gohr <andi@splitbrain.org>

output info on similar chunk fetching

helps with figuring out how fast the store is


# f6ef2e50 14-Jun-2023 Andreas Gohr <andi@splitbrain.org>

refactoring to make models selectable

This makes it much easier to add new models. Models can now be selected
via the configuration


# 68908844 14-Jun-2023 Andreas Gohr <andi@splitbrain.org>

Use overlapping chunks, prepare for new models


# 614f8ab4 13-Jun-2023 Andreas Gohr <andi@splitbrain.org>

removed K-D Tree vector storage

Since this implementation does not have any advantages over the SQLite
storage, it makes no sense to keep it. Other storage backends might come
at a later time, though.


# 93c1dbf4 13-Jun-2023 Andreas Gohr <andi@splitbrain.org>

avoid empty chunks


# 88305719 12-Jun-2023 Andreas Gohr <andi@splitbrain.org>

optionally use text renderer for chunking


# 74d69006 11-Jun-2023 Andreas Gohr <andi@splitbrain.org>

added a few todos


# 4e206c13 11-Jun-2023 Andreas Gohr <andi@splitbrain.org>

do not use very small pages as source

These are usually not helpful as context, but may throw off the nearest
neighbor search.


# 5284515d 11-Jun-2023 Andreas Gohr <andi@splitbrain.org>

add index clearing, cleanup


# 33128f96 11-Jun-2023 Andreas Gohr <andi@splitbrain.org>

fix chunk access by ID in sqlite backend


# 7ee8b02d 10-Jun-2023 Andreas Gohr <andi@splitbrain.org>

first go at abstracting the storage backend and using sqlite

Requires the dev branch of the sqlite plugin


# 5786be46 10-Jun-2023 Andreas Gohr <andi@splitbrain.org>

added info method for kd tree inspection


# 5aa45b4d 09-Jun-2023 Andreas Gohr <andi@splitbrain.org>

Faster (and cheaper) reindexing

We can reuse the previous data if the page hasn't changed


# ad38c5fd 09-Jun-2023 Andreas Gohr <andi@splitbrain.org>

More robustness in creating the index

Retry failed API connections; do not abort the whole indexing but rather
skip the failing chunk. Adds a CLI testing tool for the split mechanism.


# 6f9744f7 09-Jun-2023 Andreas Gohr <andi@splitbrain.org>

don't index hidden pages


# 2ecc089a 09-Jun-2023 Andreas Gohr <andi@splitbrain.org>

fix logging in CLI mode


# 9e81bea7 09-Jun-2023 Andreas Gohr <andi@splitbrain.org>

Check ACLs before using sources


# 9da5f0df 09-Jun-2023 Andreas Gohr <andi@splitbrain.org>

some more doc blocks


# c4584168 08-Jun-2023 Andreas Gohr <andi@splitbrain.org>

support chatting


# 8817535b 08-Jun-2023 Andreas Gohr <andi@splitbrain.org>

initial checkin. working vector storage and similarity search

