Embeddings.php - OpenGrok history log for /plugin/aichat/Embeddings.php

Revision	Date	Author	Comments
# 80dbccf3	18-Mar-2026	Anna Dabrowska <dabrowska@cosmocode.de>	Fix chunksize calculation
# bdf0ac54	17-Mar-2026	Andreas Gohr <gohr@cosmocode.de>	make fullpagecontext a numeric setting. You can now define how many pages should be sent, regardless of the number of matching chunks. Pages are still identified via the chunks, so this number can't be larger than the number of chunks.
# 2d02fff5	19-Jan-2026	Andreas Gohr <gohr@cosmocode.de>	avoid deleting non-existent chunks. fixes #46
# ae2d01b1	06-Oct-2025	Andreas Gohr <andi@splitbrain.org>	Merge branch 'sentencesplit' into partner. * sentencesplit: add tests for text splitting; make overlap a class member for easier testing; Agents: make clearer how to run tests; added an AGENTS.md file for LLM based work; split sentences by token, not bytes. handle UTF-8; move text splitting into its own class; Some enhancements on the subsentence splitting; Squashed commit of the following:
# 072e0099	06-Oct-2025	Andreas Gohr <gohr@cosmocode.de>	move text splitting into its own class
# 3daef465	06-Oct-2025	Andreas Gohr <gohr@cosmocode.de>	Some enhancements on the subsentence splitting. When a sentence is longer than a chunk, it should be split forcefully into smaller parts. These parts should NOT be the size of a full chunk, since we still want some overlap with the preceding and following text. I chose to split into a quarter of a chunk. This also ensures that whitespace is kept for the split sentences, because they may be joined with follow-up text.
# 867b7752	06-Oct-2025	Henry <henry.krupp@gmail.com>	Squashed commit of the following: commit 4e0adf2a8d810e55db6d37ccc87c76d95ddcfd8d Author: Henry <henry.krupp@gmail.com> Date: Mon Feb 3 22:04:17 2025 +0100 Updated splitLongSentence() commit 9883844f1db6df9e11051c4c7b034e68baaca0be Author: Henry <henry.krupp@gmail.com> Date: Mon Feb 3 22:03:25 2025 +0100 Updated splitLongSentence() commit 6f737f6fe4da25fa438211d5c00605c2df9c81ba Author: Henry <henry.krupp@gmail.com> Date: Mon Feb 3 21:43:16 2025 +0100 array_unshift($sentences, ...$this->splitLongSentence($sentence, $tiktok)); commit 21966eab02f87f632e82ad0055f9bc2aadb92053 Author: Henry <henry.krupp@gmail.com> Date: Mon Feb 3 21:23:40 2025 +0100 Updated splitIntoChunks method Push split sentences to the front of the queue with array_unshift($sentences, ...$this->splitLongSentence($sentence, $tiktok));
# 9634d734	21-May-2025	Andreas Gohr <gohr@cosmocode.de>	add option to always send full page context
# 7be8078e	15-Apr-2025	Andreas Gohr <andi@splitbrain.org>	allow models to have a zero token limit. This allows for configuring completely unknown models. For these models no token limit is known, and we simply do not apply any. Instead we trust that the model will either be large enough to handle our input or at least throw useful error messages.
# ed47fd87	27-Mar-2025	Andreas Gohr <andi@splitbrain.org>	new UI with option to chat about the current page
# aa6bbe75	12-Mar-2025	Andreas Gohr <andi@splitbrain.org>	added "similar" endpoint to the remote api
# c2f55081	22-Jul-2024	Andreas Gohr <andi@splitbrain.org>	show used query when doing similarity queries
# 661701ee	25-Jun-2024	Andreas Gohr <andi@splitbrain.org>	Use custom renderer when creating embeddings. Rendering makes plugin output available and handles includes. It might also help with #15. The renderer uses Markdown-like output, since all LLMs seem to be very familiar with its syntax. This might help them understand the document structure better. This also adds a breadcrumb trail at the top of each chunk, which might help with contextualization as well.
# 303d0c59	17-Jun-2024	Andreas Gohr <andi@splitbrain.org>	gracefully handle render errors. Plugins may act up during text rendering; this should not abort the whole indexing. Instead we fall back to the page source.
# 8c08cb3f	27-Mar-2024	Andreas Gohr <andi@splitbrain.org>	auto style fixes
# ab1f8dde	26-Mar-2024	Andreas Gohr <andi@splitbrain.org>	emit the INDEXER_PAGE_ADD event. This allows plugins that add data to the fulltext index to add the same data to the embeddings. This improves embedding searches with struct data, for example.
# 720bb43f	25-Mar-2024	Andreas Gohr <andi@splitbrain.org>	make threshold configurable
# 2071dced	21-Mar-2024	Andreas Gohr <andi@splitbrain.org>	automatic stylefixes
# 5f71c9bb	21-Mar-2024	Andreas Gohr <andi@splitbrain.org>	small adjustments
# c2b7a1f7	21-Mar-2024	Andreas Gohr <andi@splitbrain.org>	various refactoring and introduction of a simulate command. The new command makes it easier to run the same chat questions against multiple models and compare the results in a spreadsheet.
# ecb0a423	19-Mar-2024	Andreas Gohr <andi@splitbrain.org>	do not hardcode dimensions in qdrant storage
# e3640be8	19-Mar-2024	Andreas Gohr <andi@splitbrain.org>	clean up of the config options. Emojis are used to make the different options easier to distinguish.
# 34a1c478	19-Mar-2024	Andreas Gohr <andi@splitbrain.org>	more refactoring on chat and embed model support: differentiate between input and output tokens; make use of much larger input contexts
# 294a9eaf	18-Mar-2024	Andreas Gohr <andi@splitbrain.org>	Use interfaces for Chat and Embedding classes. This way it's easier to have a base OpenAI class. This also moves much of the statistics and HTTP handling into the base class, making model implementations even leaner.
# 6a18e0f4	14-Mar-2024	Andreas Gohr <andi@splitbrain.org>	First start on refactoring the class hierarchy. This splits embedding models from chat completion models.