SQLiteStorage.php - OpenGrok history log for /plugin/aichat/Storage/SQLiteStorage.php

Revision	Date	Author	Comments
# 31a78876	12-Mar-2025	Andreas Gohr <andi@splitbrain.org>	sqlite: avoid warnings on too short vectors This should not happen in the real world. But when embeddings were created with a shorter vector model than the model that is used to embed the query, the cosineSimilarity method threw a whole bunch of warnings. We now stop the comparison at the vector length. In the real world the same model for embeddings and the query should be used, results are unpredictable otherwise. So this is mostly a cosmetic change for messed up states during development.
# 42b2c6e8	12-Mar-2025	Andreas Gohr <andi@splitbrain.org>	add remote component to ask questions to the bot The endpoint allows to override model and language settings on demand.
# 8c08cb3f	27-Mar-2024	Andreas Gohr <andi@splitbrain.org>	auto style fixes
# ab1f8dde	26-Mar-2024	Andreas Gohr <andi@splitbrain.org>	emit the INDEXER_PAGE_ADD event This allows plugins that add data to the fulltext index to add the same data to the embeddings. This improves embedding searches with struct data for example.
# 720bb43f	25-Mar-2024	Andreas Gohr <andi@splitbrain.org>	make threshold configurable
# 04afb84f	19-Mar-2024	Andreas Gohr <andi@splitbrain.org>	correctly use storage setting
# 34a1c478	19-Mar-2024	Andreas Gohr <andi@splitbrain.org>	more refactoring on chat and embed model support * differentiate between input and output tokens * make use of much larger input contexts
# 441edf84	08-Nov-2023	Andreas Gohr <andi@splitbrain.org>	fixed overlong lines
# 30b9cbc7	08-Nov-2023	splitbrain <splitbrain@users.noreply.github.com>	Automatic code style fixes
# f8d5ae01	13-Sep-2023	Andreas Gohr <andi@splitbrain.org>	codesniffer cleanups
# 7ebc7895	13-Sep-2023	splitbrain <splitbrain@users.noreply.github.com>	Automatic code style fixes
# adfc5429	29-Aug-2023	Andreas Gohr <andi@splitbrain.org>	generate clusters only if more than 3 clusters would be created
# e33a1d7a	28-Aug-2023	Andreas Gohr <andi@splitbrain.org>	optionally search one language only
# 8c8b7ba6	16-Aug-2023	Andreas Gohr <andi@splitbrain.org>	Added dumping of TSV files to SQLite store This allows visualizing the embed vectors
# 8285fff9	15-Aug-2023	Andreas Gohr <andi@splitbrain.org>	Merge branch 'pineconestorage' * pineconestorage: implement Pinecone based storage First go at syntax to display similar pages
# 3379af09	15-Aug-2023	Andreas Gohr <andi@splitbrain.org>	use a k-means based cluster approach to speed up similarity searches
# 35555bac	15-Aug-2023	Andreas Gohr <andi@splitbrain.org>	simplify cosine distance calculation Since all OpenAI vectors are normalized, only the dotproduct needs to be calculated for the distance. This saves a couple of floating point ops per chunk, but doesn't make a huge difference overall.
# 01f06932	10-Aug-2023	Andreas Gohr <andi@splitbrain.org>	First go at syntax to display similar pages
# 68b6fa79	10-Aug-2023	Andreas Gohr <andi@splitbrain.org>	First go at syntax to display similar pages
# 81b450c8	14-Jun-2023	Andreas Gohr <andi@splitbrain.org>	use a cut-off point when considering similar documents
# 9b3d1b36	14-Jun-2023	Andreas Gohr <andi@splitbrain.org>	show similarity scores in CLI
# f6ef2e50	14-Jun-2023	Andreas Gohr <andi@splitbrain.org>	refactoring to make models selectable This makes it much easier to add new models. Models can now be selected via the configuration
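
Two of the revisions above (35555bac and 31a78876) describe changes to the similarity calculation: for normalized vectors the dot product alone gives the cosine similarity, and the comparison stops at the vector length when the two vectors differ in size. The following is only an illustrative sketch of that idea in Python, not the plugin's PHP code; the function names `dot_similarity`, `normalize` and `cosine_similarity` are hypothetical and chosen for this example.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (hypothetical helper for this sketch)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Full cosine similarity, shown only for comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def dot_similarity(query, stored):
    """Dot product over the overlapping part of both vectors.

    For unit-length vectors of equal size this equals the cosine
    similarity. If the lengths differ, the loop stops at the shorter
    vector instead of reading past its end; as the log notes, such a
    score is not meaningful and only avoids errors.
    """
    n = min(len(query), len(stored))
    return sum(query[i] * stored[i] for i in range(n))


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
print(dot_similarity(a, b))
print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))
```

Both printed values should agree up to floating-point rounding, which is why the norm computation can be skipped when every stored vector is already normalized.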