| #
22a3672c |
| 22-Jul-2024 |
Andreas Gohr <andi@splitbrain.org> |
store embedding meta data on start
This way, errors during a new embedding run will not trigger an index clear on retry.
|
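The reasoning in this commit can be illustrated with a minimal sketch. It is not the plugin's code: the `store` dictionary, the `start_embedding_run` function and the model names are hypothetical, and it assumes the index is cleared whenever the recorded model differs from the current one. Because the meta data is written before any chunk is embedded, a retry after a failed run finds matching meta data and keeps the partial index.

```python
# Hypothetical sketch; none of these names come from the plugin itself.

def start_embedding_run(store: dict, current_model: str) -> bool:
    """Prepare the store for a run; return True if the index was cleared."""
    cleared = False
    if store.get("meta", {}).get("model") != current_model:
        store["chunks"] = {}  # the model changed, old vectors are unusable
        cleared = True
    # Record the meta data right away, before any chunk is embedded.
    store["meta"] = {"model": current_model}
    return cleared


store = {"meta": {"model": "old-model"}, "chunks": {"page1": [0.1, 0.2]}}
first = start_embedding_run(store, "new-model")  # model changed: index cleared
store["chunks"]["page2"] = [0.3, 0.4]            # partial progress, then a crash
retry = start_embedding_run(store, "new-model")  # meta already matches: no clear
```

Had the meta data been written only at the end of a successful run, the retry would see the old model name again and discard `page2`.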
| #
f93272b9 |
| 25-Jun-2024 |
Andreas Gohr <andi@splitbrain.org> |
auto codestyle cleanup
|
| #
e1251882 |
| 17-Jun-2024 |
Andreas Gohr <andi@splitbrain.org> |
init ACLs in CLI
When using the text renderer, we might execute syntax plugins that check ACLs. We don't have a user when running the CLI but we want to index everything an anonymous user would be able to see. For that the ACLs need to be loaded.
|
| #
bae450a9 |
| 02-Apr-2024 |
Andreas Gohr <andi@splitbrain.org> |
rebuild the index when the embedding model changed
|
| #
b446155b |
| 27-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
fix info output on used models
|
| #
8c08cb3f |
| 27-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
auto style fixes
|
| #
ab1f8dde |
| 26-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
emit the INDEXER_PAGE_ADD event
This allows plugins that add data to the fulltext index to add the same data to the embeddings. This improves embedding searches with struct data for example.
|
| #
0de7e020 |
| 25-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
mechanisms to override things on command line
This should help with debugging/evaluating
|
| #
c2b7a1f7 |
| 21-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
various refactoring and introduction of a simulate command
The new command makes it easier to run the same chat questions against multiple models and compare the results in a spreadsheet
|
| #
51aa8517 |
| 20-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
separate the rephrasing model from the chat model
Rephrasing can be done with faster, simpler models as there is not much reasoning needed.
|
| #
99b713bf |
| 19-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
fix info output
|
| #
2045e15a |
| 19-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
fix output prices
|
| #
87e46484 |
| 19-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
added Voyage AI for embeddings
|
| #
e8451b21 |
| 19-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
added model command to CLI
This prints info about the available models
|
| #
34a1c478 |
| 19-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
more refactoring on chat and embed model support
* differentiate between input and output tokens
* make use of much larger input contexts
|
| #
6a18e0f4 |
| 14-Mar-2024 |
Andreas Gohr <andi@splitbrain.org> |
First start on refactoring the class hierarchy
This splits embedding models from chat completion models.
|
| #
e75dc39f |
| 14-Feb-2024 |
Andreas Gohr <andi@splitbrain.org> |
record the times of embed and maintenance runs
This makes it easier to debug when something with the cronjob goes wrong. Currently the data is only exposed in the cli info command.
We might want to use it somewhere in the UI to warn about outdated data
|
| #
49a7d3cc |
| 29-Jan-2024 |
splitbrain <splitbrain@users.noreply.github.com> |
Automatic code style fixes
|
| #
d5c102b3 |
| 29-Jan-2024 |
Andreas Gohr <andi@splitbrain.org> |
Regular expressions to limit the indexed pages. Implements #5
Both regular expressions (when set) need to apply at the same time. That is, a page MUST match the matchRegex and MUST NOT match the skipRegex to be indexed.
The regular expressions are applied when running the `embed` command line command. Pages no longer adhering to a changed regex setup will be removed from the vector store.
For the SQLite storage it is recommended to re-cluster the index by running the `maintenance` command when the regexes are changed.
|
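The combined rule for the two regular expressions can be sketched as follows. This is an illustration in Python, not the plugin's implementation: the `should_index` function and the sample page IDs are hypothetical, and only the setting names matchRegex and skipRegex come from the commit message.

```python
import re
from typing import Optional


def should_index(page_id: str,
                 match_regex: Optional[str],
                 skip_regex: Optional[str]) -> bool:
    """Return True if the page passes both filters; unset filters are ignored."""
    if match_regex and not re.search(match_regex, page_id):
        return False  # matchRegex is set and the page does not match it
    if skip_regex and re.search(skip_regex, page_id):
        return False  # skipRegex is set and the page matches it
    return True


pages = ["wiki:start", "wiki:draft:todo", "blog:post"]
selected = [p for p in pages if should_index(p, r"^wiki:", r":draft:")]
```

With these sample values only `wiki:start` is selected: `blog:post` fails the match expression and `wiki:draft:todo` is caught by the skip expression.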
| #
dc355d57 |
| 06-Nov-2023 |
Andreas Gohr <andi@splitbrain.org> |
added chunk dumping to CLI page command
|
| #
7ebc7895 |
| 13-Sep-2023 |
splitbrain <splitbrain@users.noreply.github.com> |
Automatic code style fixes
|
| #
e33a1d7a |
| 28-Aug-2023 |
Andreas Gohr <andi@splitbrain.org> |
optionally search one language only
|
| #
8c8b7ba6 |
| 16-Aug-2023 |
Andreas Gohr <andi@splitbrain.org> |
Added dumping of TSV files to SQLite store
This allows visualizing the embed vectors
|
| #
8285fff9 |
| 15-Aug-2023 |
Andreas Gohr <andi@splitbrain.org> |
Merge branch 'pineconestorage'
* pineconestorage:
  implement Pinecone based storage
  First go at syntax to display similar pages
|
| #
3379af09 |
| 15-Aug-2023 |
Andreas Gohr <andi@splitbrain.org> |
use a k-means based cluster approach to speed up similarity searches
|
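The general idea behind cluster-based similarity search can be sketched like this. It is a toy illustration under stated assumptions, not the plugin's algorithm: the function names, the naive seeding with the first `k` vectors and the use of cosine similarity for cluster assignment are all choices made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(vectors, k, iterations=10):
    """Group vectors into k clusters; returns (centroids, assignment)."""
    centroids = [list(v) for v in vectors[:k]]  # naive deterministic seeding
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return centroids, assignment


def search(query, vectors, centroids, assignment, limit=2):
    """Rank only the vectors in the cluster closest to the query."""
    best = max(range(len(centroids)), key=lambda c: cosine(query, centroids[c]))
    candidates = [i for i, a in enumerate(assignment) if a == best]
    return sorted(candidates, key=lambda i: cosine(query, vectors[i]),
                  reverse=True)[:limit]


vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
centroids, assignment = kmeans(vectors, k=2)
hits = search([1.0, 0.2], vectors, centroids, assignment)
```

The speed-up comes from comparing the query against the centroids first and then only against the members of one cluster, at the cost of possibly missing a close vector that landed in a neighbouring cluster.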