History log of /plugin/aichat/Embeddings.php (Results 1 – 25 of 48)
Revision Date Author Comments
# 2d02fff5 19-Jan-2026 Andreas Gohr <gohr@cosmocode.de>

avoid deleting non-existent chunks. fixes #46


# ae2d01b1 06-Oct-2025 Andreas Gohr <andi@splitbrain.org>

Merge branch 'sentencesplit' into partner

* sentencesplit:
add tests for text splitting
make overlap a class member for easier testing
Agents: make clearer how to run tests
added an AGENTS.md file for LLM based work
split sentences by token, not bytes. handle UTF-8
move text splitting into its own class
Some enhancements on the subsentence splitting
Squashed commit of the following:


# 072e0099 06-Oct-2025 Andreas Gohr <gohr@cosmocode.de>

move text splitting into its own class


# 3daef465 06-Oct-2025 Andreas Gohr <gohr@cosmocode.de>

Some enhancements on the subsentence splitting

When a sentence is longer than a chunk, it should be split forcefully into
smaller parts. These parts should NOT be the size of a full chunk, since
we still want to do some overlap with previous and following texts. I
chose to split into a quarter of a chunk.

This also ensures that whitespace is kept for the split sentences,
because they may be joined with follow up texts.
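
A minimal Python sketch of the splitting idea described in this commit and in 867b7752; it is not the plugin's PHP code. It assumes a hypothetical `tokenize()` that treats each word plus its surrounding whitespace as one token (the plugin uses a real tokenizer), and the names `split_long_sentence()` and `split_into_chunks()` are illustrative analogues only. Over-long sentences are cut into quarter-chunk pieces that keep their whitespace and are pushed back to the front of the queue, mirroring the `array_unshift()` approach; chunk overlap is left out for brevity.

```python
import re
from collections import deque


def tokenize(text: str) -> list[str]:
    # Hypothetical stand-in for a real tokenizer: each "token" is a word with its
    # surrounding whitespace, so "".join(tokenize(t)) reproduces t exactly.
    return re.findall(r"\s*\S+\s*", text)


def split_long_sentence(sentence: str, chunk_size: int) -> list[str]:
    """Force-split a sentence into pieces of at most a quarter chunk."""
    piece_size = max(1, chunk_size // 4)
    tokens = tokenize(sentence)
    return ["".join(tokens[i:i + piece_size]) for i in range(0, len(tokens), piece_size)]


def split_into_chunks(sentences: list[str], chunk_size: int) -> list[str]:
    """Greedily pack sentences into chunks of at most chunk_size tokens."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    queue = deque(sentences)
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    while queue:
        sentence = queue.popleft()
        n = len(tokenize(sentence))
        if n > chunk_size:
            # Push the pieces back to the front of the queue, in order
            # (the equivalent of PHP's array_unshift($queue, ...$pieces)).
            queue.extendleft(reversed(split_long_sentence(sentence, chunk_size)))
            continue
        if current and current_len + n > chunk_size:
            chunks.append("".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Because every piece keeps its whitespace, joining all chunks reproduces the input text exactly, and small pieces can be packed together with the text that follows them.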


# 867b7752 06-Oct-2025 Henry <henry.krupp@gmail.com>

Squashed commit of the following:

commit 4e0adf2a8d810e55db6d37ccc87c76d95ddcfd8d
Author: Henry <henry.krupp@gmail.com>
Date: Mon Feb 3 22:04:17 2025 +0100

Updated splitLongSentence()

commit 9883844f1db6df9e11051c4c7b034e68baaca0be
Author: Henry <henry.krupp@gmail.com>
Date: Mon Feb 3 22:03:25 2025 +0100

Updated splitLongSentence()

commit 6f737f6fe4da25fa438211d5c00605c2df9c81ba
Author: Henry <henry.krupp@gmail.com>
Date: Mon Feb 3 21:43:16 2025 +0100

array_unshift($sentences, ...$this->splitLongSentence($sentence, $tiktok));

commit 21966eab02f87f632e82ad0055f9bc2aadb92053
Author: Henry <henry.krupp@gmail.com>
Date: Mon Feb 3 21:23:40 2025 +0100

Updated splitIntoChunks method

Push split sentences to the front of the queue with array_unshift($sentences, ...$this->splitLongSentence($sentence, $tiktok));


# 9634d734 21-May-2025 Andreas Gohr <gohr@cosmocode.de>

add option to always send full page context


# 7be8078e 15-Apr-2025 Andreas Gohr <andi@splitbrain.org>

allow models to have a zero token limit

This allows for configuring completely unknown models. For these models
no token limit is known, so we simply do not apply any. Instead we
trust that the model will be either large enough to handle our input or
at least throw useful error messages.


# ed47fd87 27-Mar-2025 Andreas Gohr <andi@splitbrain.org>

new UI with option to chat about the current page


# aa6bbe75 12-Mar-2025 Andreas Gohr <andi@splitbrain.org>

added "similar" endpoint to the remote api


# c2f55081 22-Jul-2024 Andreas Gohr <andi@splitbrain.org>

show used query when doing similarity queries


# 661701ee 25-Jun-2024 Andreas Gohr <andi@splitbrain.org>

Use custom renderer when creating embeddings

Rendering makes plugin output available and handles includes. It
might also help with #15.
The renderer uses Markdown-like output since all LLMs seem to be very
familiar with its syntax. This might help them to understand the
document structure better.
This also adds a breadcrumb trail at the top of each chunk, which might
help with contextualization as well.


# 303d0c59 17-Jun-2024 Andreas Gohr <andi@splitbrain.org>

gracefully handle render errors

Plugins may act up during text rendering; this should not abort the
whole indexing. Instead, we fall back to the page source.


# 8c08cb3f 27-Mar-2024 Andreas Gohr <andi@splitbrain.org>

auto style fixes


# ab1f8dde 26-Mar-2024 Andreas Gohr <andi@splitbrain.org>

emit the INDEXER_PAGE_ADD event

This allows plugins that add data to the fulltext index to add the same
data to the embeddings. This improves embedding searches with struct
data for example.


# 720bb43f 25-Mar-2024 Andreas Gohr <andi@splitbrain.org>

make threshold configurable


# 2071dced 21-Mar-2024 Andreas Gohr <andi@splitbrain.org>

automatic stylefixes


# 5f71c9bb 21-Mar-2024 Andreas Gohr <andi@splitbrain.org>

small adjustments


# c2b7a1f7 21-Mar-2024 Andreas Gohr <andi@splitbrain.org>

various refactoring and introduction of a simulate command

The new command makes it easier to run the same chat questions against
multiple models and compare the results in a spreadsheet


# ecb0a423 19-Mar-2024 Andreas Gohr <andi@splitbrain.org>

do not hardcode dimensions in qdrant storage


# e3640be8 19-Mar-2024 Andreas Gohr <andi@splitbrain.org>

clean up of the config options

Emojis are used to make the different options easier to distinguish


# 34a1c478 19-Mar-2024 Andreas Gohr <andi@splitbrain.org>

more refactoring on chat and embed model support

* differentiate between input and output tokens
* make use of much larger input contexts


# 294a9eaf 18-Mar-2024 Andreas Gohr <andi@splitbrain.org>

Use interfaces for Chat and Embedding classes

This way it's easier to have a base OpenAI class. This also moves much
of the statistics and http handling into the base class making model
implementations even leaner


# 6a18e0f4 14-Mar-2024 Andreas Gohr <andi@splitbrain.org>

First start on refactoring the class hierarchy

This splits embedding models from chat completion models.


# d5c102b3 29-Jan-2024 Andreas Gohr <andi@splitbrain.org>

Regular expressions to limit the indexed pages. Implements #5

Both regular expressions (when set) need to apply at the same time, e.g. a
page MUST match the matchRegex and MUST NOT match the skipRegex to be
indexed.

The regular expressions are applied when running the `embed`
command-line command. Pages no longer adhering to a changed regex setup
will be removed from the vector store.

For the sqlite storage it is recommended to re-cluster the index when
the regexes are changed by running the `maintenance` command.
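
A minimal Python sketch of the matchRegex/skipRegex rule described above, assuming both patterns are unanchored searches against the page ID (Python's `re.search`; the plugin itself is PHP and its exact matching call is not shown here). The function names `should_index()` and `pages_to_remove()` and the DokuWiki-style page IDs in the example are hypothetical. An unset pattern imposes no restriction, and pages already in the store that fail the check are the ones to remove.

```python
import re
from typing import Iterable, Optional


def should_index(page_id: str, match_regex: Optional[str], skip_regex: Optional[str]) -> bool:
    """Return True if the page passes both (optional) filters."""
    # When set, the page MUST match matchRegex ...
    if match_regex and re.search(match_regex, page_id) is None:
        return False
    # ... and MUST NOT match skipRegex.
    if skip_regex and re.search(skip_regex, page_id) is not None:
        return False
    return True


def pages_to_remove(indexed: Iterable[str], match_regex: Optional[str],
                    skip_regex: Optional[str]) -> set[str]:
    """Pages already in the vector store that no longer pass the filters."""
    return {p for p in indexed if not should_index(p, match_regex, skip_regex)}


if __name__ == "__main__":
    stored = {"docs:intro", "docs:draft:todo", "blog:news"}
    print(sorted(pages_to_remove(stored, r"^docs:", r":draft")))  # ['blog:news', 'docs:draft:todo']
```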


# 30b9cbc7 08-Nov-2023 splitbrain <splitbrain@users.noreply.github.com>

Automatic code style fixes

