| #
31a78876 |
| 12-Mar-2025 |
Andreas Gohr <andi@splitbrain.org> |
sqlite: avoid warnings on too short vectors
This should not happen in the real world. But when embeddings were created with a shorter vector model than the model that is used to embed the query, the cosineSimilarity method threw a whole bunch of warnings. We now stop the comparison at the vector length.
In the real world the same model should be used for the embeddings and the query; results are unpredictable otherwise. So this is mostly a cosmetic change for messed-up states during development.
|
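The length guard described in commit 31a78876 can be illustrated with a minimal sketch. This is not the plugin's code: the function name `cosine_similarity` and the choice to compute both norms over the shared prefix only are assumptions made for illustration.

```python
import math


def cosine_similarity(query, stored):
    """Cosine similarity over the dimensions both vectors share.

    Hypothetical helper: stopping at the shorter length avoids
    out-of-range accesses when the two vectors come from models
    with different dimensions.
    """
    n = min(len(query), len(stored))  # stop the comparison at the shorter vector
    dot = sum(query[i] * stored[i] for i in range(n))
    norm_q = math.sqrt(sum(x * x for x in query[:n]))
    norm_s = math.sqrt(sum(x * x for x in stored[:n]))
    if norm_q == 0 or norm_s == 0:
        return 0.0  # no direction to compare
    return dot / (norm_q * norm_s)
```

As the commit message says, a score computed across mismatched models is not meaningful; the guard only keeps the comparison from producing warnings.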
| #
42b2c6e8 |
| 12-Mar-2025 |
Andreas Gohr <andi@splitbrain.org> |
add remote component to ask questions to the bot
The endpoint allows overriding the model and language settings on demand.
|
| #
8c08cb3f |
| 27-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
auto style fixes
|
| #
ab1f8dde |
| 26-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
emit the INDEXER_PAGE_ADD event
This allows plugins that add data to the fulltext index to add the same data to the embeddings. This improves embedding searches with struct data for example.
|
| #
720bb43f |
| 25-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
make threshold configurable
|
| #
04afb84f |
| 19-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
correctly use storage setting
|
| #
34a1c478 |
| 19-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
more refactoring on chat and embed model support
* differentiate between input and output tokens
* make use of much larger input contexts
|
| #
441edf84 |
| 08-Nov-2023 |
Andreas Gohr <andi@splitbrain.org> |
fixed overlong lines
|
| #
30b9cbc7 |
| 08-Nov-2023 |
splitbrain <splitbrain@users.noreply.github.com> |
Automatic code style fixes
|
| #
f8d5ae01 |
| 13-Sep-2023 |
Andreas Gohr <andi@splitbrain.org> |
codesniffer cleanups
|
| #
7ebc7895 |
| 13-Sep-2023 |
splitbrain <splitbrain@users.noreply.github.com> |
Automatic code style fixes
|
| #
adfc5429 |
| 29-Aug-2023 |
Andreas Gohr <andi@splitbrain.org> |
generate clusters only if more than 3 clusters would be created
|
| #
e33a1d7a |
| 28-Aug-2023 |
Andreas Gohr <andi@splitbrain.org> |
optionally search one language only
|
| #
8c8b7ba6 |
| 16-Aug-2023 |
Andreas Gohr <andi@splitbrain.org> |
Added dumping of TSV files to SQLite store
This allows visualizing the embed vectors
|
| #
8285fff9 |
| 15-Aug-2023 |
Andreas Gohr <andi@splitbrain.org> |
Merge branch 'pineconestorage'
* pineconestorage:
  implement Pinecone based storage
  First go at syntax to display similar pages
|
| #
3379af09 |
| 15-Aug-2023 |
Andreas Gohr <andi@splitbrain.org> |
use a k-means based cluster approach to speed up similarity searches
|
| #
35555bac |
| 15-Aug-2023 |
Andreas Gohr <andi@splitbrain.org> |
simplify cosine distance calculation
Since all OpenAI vectors are normalized, only the dot product needs to be calculated for the distance. This saves a couple of floating point ops per chunk, but doesn't make a huge difference overall.
|
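The simplification in commit 35555bac rests on a general identity: for unit-length vectors the cosine similarity equals the plain dot product, because both norms in the denominator are 1. The sketch below checks this numerically; the helper names `normalize`, `dot`, and `cosine_similarity` are illustrative and not taken from the plugin.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def cosine_similarity(a, b):
    """Full cosine similarity, including both norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

# For unit vectors the two values agree up to floating point rounding.
assert abs(dot(a, b) - cosine_similarity(a, b)) < 1e-12
```

Skipping the two square roots and the division saves a few operations per comparison, which matches the commit's remark that the gain is small.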
| #
01f06932 |
| 10-Aug-2023 |
Andreas Gohr <andi@splitbrain.org> |
First go at syntax to display similar pages
|
| #
68b6fa79 |
| 10-Aug-2023 |
Andreas Gohr <andi@splitbrain.org> |
First go at syntax to display similar pages
|
| #
81b450c8 |
| 14-Jun-2023 |
Andreas Gohr <andi@splitbrain.org> |
use a cut-off point when considering similar documents
|
| #
9b3d1b36 |
| 14-Jun-2023 |
Andreas Gohr <andi@splitbrain.org> |
show similarity scores in CLI
|
| #
f6ef2e50 |
| 14-Jun-2023 |
Andreas Gohr <andi@splitbrain.org> |
refactoring to make models selectable
This makes it much easier to add new models. Models can now be selected via the configuration.
|