History log of /dokuwiki/inc/Search/Index/AbstractIndex.php (Results 1 – 12 of 12)
Revision Date Author Comments
# 9369b4a9 08-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: rector, phpcs, type hint fixes


# 1148921d 08-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: unify CollectionSearch API and optimize search pipeline

- Remove separate lookup() API from CollectionSearch. All searches now
use addTerm()/execute() with a single unified pipeline.
- Add matches() predicate to Term using efficient string functions
(===, str_starts_with, str_ends_with, str_contains) instead of regex.
- Add caseInsensitive() support on CollectionSearch and Term for
metadata/title searches where indexed values preserve case.
- Remove callback support from MetadataSearch::lookupKey() — the only
real usage (case-insensitive substring) is replaced by
caseInsensitive() + wildcards.
- Remove min-length validation from Term. Add Tokenizer::isValidSearchTerm()
for callers that need it (FulltextSearch, Indexer::lookup).
- Optimize execute() from 4 group passes to 2: scan tokens + resolve
frequencies in one pass per group, batch entity name resolution, then
populate Terms.
- Store full match detail in Term: entity → token → frequency. New
accessors getMatches(), getEntityTokens(), getEntityFrequencies()
derive different views from this single data structure.
- Term no longer used as scratch pad by CollectionSearch. Index-internal
data (token IDs, entity IDs) stays local to execute(). Terms receive
only final resolved results.
- Use title from search results in MetadataSearch::pageLookupCallBack()
instead of re-fetching via p_get_first_heading().
- Update concept.txt documentation.
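
The matches() and caseInsensitive() items above can be illustrated with a small sketch. This is not the actual Term class: the name TermSketch, the constructor signature, and the use of a leading or trailing "*" as the wildcard marker are assumptions made for illustration. It uses ASCII-only strtolower() to stay dependency-free.

```php
<?php
// Hypothetical sketch, NOT the real DokuWiki Term class.
final class TermSketch
{
    private string $base;
    private bool $wildStart;
    private bool $wildEnd;
    private bool $caseInsensitive;

    public function __construct(string $term, bool $caseInsensitive = false)
    {
        // Assumption: a leading/trailing "*" marks a wildcard on that side.
        $this->wildStart = str_starts_with($term, '*');
        $this->wildEnd = strlen($term) > 1 && str_ends_with($term, '*');
        $base = trim($term, '*');
        $this->caseInsensitive = $caseInsensitive;
        // ASCII-only lowercasing keeps the sketch free of extensions.
        $this->base = $caseInsensitive ? strtolower($base) : $base;
    }

    // Pick the cheapest string function for the wildcard shape; no regex.
    public function matches(string $token): bool
    {
        if ($this->caseInsensitive) {
            $token = strtolower($token);
        }
        if ($this->wildStart && $this->wildEnd) {
            return str_contains($token, $this->base);
        }
        if ($this->wildStart) {
            return str_ends_with($token, $this->base);
        }
        if ($this->wildEnd) {
            return str_starts_with($token, $this->base);
        }
        return $token === $this->base;
    }
}
```

With this sketch, a term such as `*wiki` matches the token `dokuwiki` through str_ends_with(), while a plain `wiki` only matches by strict equality.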



# 21fbd01b 07-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: add integrity checking to Collection architecture

Add checkIntegrity() to AbstractCollection and DirectCollection that
verifies paired indexes have matching line counts (token==frequency,
entity==reverse, entity==token for direct collections). Throws
IndexIntegrityException on the first inconsistency found.
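
The paired line-count check described above might be sketched as follows. The function name checkPairsSketch, the exception class IndexIntegrityExceptionSketch, and the label-to-pair array layout are all hypothetical. The sketch only relies on the indexes being Countable, which the next paragraph of this commit message introduces.

```php
<?php
// Hypothetical stand-in for the exception named in the commit message.
class IndexIntegrityExceptionSketch extends RuntimeException
{
}

/**
 * Compare the line counts of paired indexes and fail on the first mismatch.
 *
 * @param array<string, array{0: Countable, 1: Countable}> $pairs label => [left, right]
 */
function checkPairsSketch(array $pairs): void
{
    foreach ($pairs as $label => [$left, $right]) {
        if (count($left) !== count($right)) {
            throw new IndexIntegrityExceptionSketch(
                sprintf('%s: %d lines vs %d lines', $label, count($left), count($right))
            );
        }
    }
}
```

Any Countable works for trying this out, for example two ArrayObject instances standing in for a token index and its frequency index.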

Add Countable interface to AbstractIndex with count() implementations
in MemoryIndex and FileIndex. Add Indexer::checkIntegrity() and
Indexer::isIndexEmpty() to orchestrate checks across all collections.

Update infoutils.php to use the new Indexer API instead of the old
FulltextIndex/MetadataIndex classes.

Fix range(1, 0) bug in three places that produced [1, 0] instead of
an empty array when split-by-length indexes were empty.
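
The range() pitfall is easy to reproduce. In the sketch below, the helper name lengthsUpTo and the variable $max are illustrative and not taken from the DokuWiki code.

```php
<?php
// PHP's range() counts downwards when start > end, so it never returns
// an empty array for an "empty" interval.
$max = 0;                // e.g. no split-by-length index exists yet
$buggy = range(1, $max); // [1, 0], not []

// Hypothetical guard: return 1..$max, or an empty array when $max < 1.
function lengthsUpTo(int $max): array
{
    return $max < 1 ? [] : range(1, $max);
}
```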



# 83b3accc 06-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: rewrite Indexer to use Collection classes

Replace the intermediate #2943 classes (FulltextIndex, MetadataIndex)
with the new Collection-based architecture. The Indexer is now a thin
stateless orchestrator that delegates all index work to collections.

Key changes:
- Indexer no longer extends AbstractIndex; page name passed to methods
- addPage/deletePage/clear use PageTitleCollection,
PageFulltextCollection, and PageMetaCollection
- New PageMetaCollection replaces separate ReferencesCollection and
MediaCollection with a single class that handles arbitrary metadata
keys dynamically
- Shared writable FileIndex('page') passed to all collections
- Logger callback replaces verbose parameter
- Methods return void instead of bool
- Index classes implement IteratorAggregate for clean data access
- Indexer tests consolidated into namespaced IndexerTest.php
- All callers updated to new stateless API



# c66b5ec6 05-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: rewrite Lock as static registry with reference counting

Replace the instance-based Lock class with a static registry that
tracks held locks per-process with reference counting. This solves
three problems:

- Split indexes (w3, w4, ...) share a single lock name and now
coordinate naturally via the registry
- Multiple callers can acquire the same lock without conflict
- Indexes enforce their own writability through lock()/unlock()
methods on AbstractIndex

The Lock registry manages both the filesystem lock (mkdir) and the
in-process tracking. The first acquire creates the directory, subsequent
acquires increment the refcount. Release decrements, and only removes
the directory when the count reaches zero.
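
The acquire/release behaviour described above can be sketched as a small static registry. The class name LockRegistrySketch and its method signatures are assumptions, not the actual Lock class. A real implementation would also need waiting, retries, and stale-lock handling, which are left out here.

```php
<?php
// Hypothetical sketch of a refcounted lock registry, NOT the DokuWiki Lock class.
final class LockRegistrySketch
{
    /** @var array<string, int> lock directory => number of holders in this process */
    private static array $held = [];

    public static function acquire(string $dir): bool
    {
        if (isset(self::$held[$dir])) {
            // Already held by this process: only bump the reference count.
            self::$held[$dir]++;
            return true;
        }
        // mkdir() fails if the directory exists, i.e. another process holds the lock.
        if (!@mkdir($dir)) {
            return false;
        }
        self::$held[$dir] = 1;
        return true;
    }

    public static function release(string $dir): void
    {
        if (!isset(self::$held[$dir])) {
            return;
        }
        // Remove the directory only when the last holder lets go.
        if (--self::$held[$dir] === 0) {
            unset(self::$held[$dir]);
            @rmdir($dir);
        }
    }
}
```

Two acquire() calls on the same directory followed by one release() leave the directory in place; the second release() removes it.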

Note: I am not sure if implementing this as a static object is a great
idea or if we should pass an instance through the collection to the
indexes...



# 596d5287 11-May-2023 Andreas Gohr <andi@splitbrain.org>

Working fulltext collection and search

This finalizes the FulltextCollection and FulltextCollectionSearch
classes. Proper locking is implemented, tests have been enhanced.

It should be possible to reimplement the page full text search on top of
it.



# 7fcedc39 09-May-2023 Andreas Gohr <andi@splitbrain.org>

Indexes can now be opened in readonly mode


# 9f63f003 08-May-2023 Andreas Gohr <andi@splitbrain.org>

add method to retrieve multiple rows at once


# 03a35633 12-Sep-2022 Andreas Gohr <andi@splitbrain.org>

added method to search an index by regular expression


# 8ed35011 08-Dec-2021 Andreas Gohr <andi@splitbrain.org>

better method names


# d6396b6d 08-Dec-2021 Andreas Gohr <andi@splitbrain.org>

we need the same access methods in both index types


# ec5280ef 04-Dec-2021 Andreas Gohr <andi@splitbrain.org>

rearranging the Index class structure

This is a first step at restructuring the indexing classes a
bit more.

Some background:

We have basically two different kinds of index files:

a) RowIndex (like page.idx)

Each line in the index contains a single value. The line number is used
as primary ID. These files can be very large. Thus an index like that
should never be read into memory completely if it can be avoided.

b) TupleIndex (like i12.idx)

Each line contains a list of tuples. The files tend to be smaller so
loading them completely for search and replace is easier.
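
The two access patterns can be contrasted with a short sketch. Both function names are hypothetical, line numbers are assumed to be zero-based, and the "key*value" tuples joined by ":" are an assumed on-disk format that the commit message does not spell out.

```php
<?php
// Row-style access: fetch one line by its number without loading the whole file.
// Assumption: line numbers (row IDs) are zero-based.
function readRowSketch(string $file, int $rid): ?string
{
    if ($rid < 0) {
        return null;
    }
    $fh = fopen($file, 'r');
    if ($fh === false) {
        return null;
    }
    try {
        $line = '';
        for ($i = 0; $i <= $rid; $i++) {
            $line = fgets($fh);
            if ($line === false) {
                return null; // the file has fewer than $rid + 1 lines
            }
        }
        return rtrim($line, "\n");
    } finally {
        fclose($fh);
    }
}

// Tuple-style access: a whole line is parsed into key => value pairs.
// Assumption: tuples look like "key*value" and are joined by ":".
function parseTuplesSketch(string $line): array
{
    $result = [];
    foreach (explode(':', $line) as $tuple) {
        if ($tuple === '') {
            continue;
        }
        // A tuple without "*" gets the value 0.
        [$key, $value] = explode('*', $tuple, 2) + [1 => '0'];
        $result[$key] = (int)$value;
    }
    return $result;
}
```

The row reader streams line by line and stops at the requested row, which suits large files such as page.idx. The tuple parser assumes the caller already holds the full line in memory.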

Since the access is so completely different, I tried to model that
in the two different classes, basically moving the methods from
\dokuwiki\Search\AbstractIndex to the new classes.

While doing so, I tried to make the doc blocks, variable names and
interface easier to understand. I also added tests for each of the
methods.

The old code has not been touched yet. So these classes do not do
anything outside of tests currently.
