History log of /dokuwiki/inc/Search/Collection/AbstractCollection.php (Results 1 – 9 of 9)
Revision Date Author Comments
# 9369b4a9 08-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: rector, phpcs, type hint fixes


# 21fbd01b 07-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: add integrity checking to Collection architecture

Add checkIntegrity() to AbstractCollection and DirectCollection that
verifies paired indexes have matching line counts (token==frequency,
entity==reverse, entity==token for direct collections). Throws
IndexIntegrityException on the first inconsistency found.
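The paired line-count check described above can be sketched roughly as follows. This is an illustrative Python stand-in, not the project's PHP code: `check_integrity`, `IndexIntegrityError`, and the dict-of-lists index model are hypothetical names chosen for the example.

```python
class IndexIntegrityError(Exception):
    """Hypothetical stand-in for the IndexIntegrityException named in the commit."""


def check_integrity(indexes, pairs):
    """Raise on the first pair of indexes whose line counts differ.

    indexes: mapping of index name -> list of lines (stand-in for a countable index)
    pairs:   iterable of (left, right) index names that must have equal counts
    """
    for left, right in pairs:
        left_count = len(indexes[left])
        right_count = len(indexes[right])
        if left_count != right_count:
            raise IndexIntegrityError(
                f"{left} has {left_count} lines but {right} has {right_count}"
            )
```

A frequency-style collection would pass the pairs (token, frequency) and (entity, reverse); a direct collection would pass (entity, token).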

Add Countable interface to AbstractIndex with count() implementations
in MemoryIndex and FileIndex. Add Indexer::checkIntegrity() and
Indexer::isIndexEmpty() to orchestrate checks across all collections.

Update infoutils.php to use the new Indexer API instead of the old
FulltextIndex/MetadataIndex classes.

Fix range(1, 0) bug in three places that produced [1, 0] instead of
an empty array when split-by-length indexes were empty.
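For readers unfamiliar with the pitfall: PHP's range() counts downward when the start is greater than the end, so range(1, 0) yields [1, 0]. The Python sketch below mimics that behavior and shows one possible guard; `php_style_range`, `group_numbers`, and `max_length` are hypothetical names, and the commit does not state how the actual fix was written.

```python
def php_style_range(start, end):
    """Mimic PHP's range() for integers: it counts down when start > end."""
    step = 1 if start <= end else -1
    return list(range(start, end + step, step))


def group_numbers(max_length):
    """Return the token-length groups 1..max_length, or [] when there are none."""
    if max_length < 1:
        # Without this guard, php_style_range(1, 0) would return [1, 0].
        return []
    return php_style_range(1, max_length)
```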


# 6734bb8c 07-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: rewrite MetadataSearch to use Collection classes

Replace MetadataIndex usage in MetadataSearch with the new Collection/Index
architecture. This completes the read-path migration so data written by the
Collection-based Indexer is read back correctly using TupleOps tuple format.

Generalize FrequencyCollectionSearch into CollectionSearch that works with any
AbstractCollection type (Frequency, Lookup, Direct) and handles both
split-by-length and non-split index layouts transparently. DirectCollection
participates via resolveTokenFrequencies() which maps token RID = entity RID.

Key changes:
- AbstractCollection gains isSplitByLength(), resolveTokenFrequencies(),
  getEntitiesWithData(), and groupToSuffix() with validation
- Index groups are now int (0 = non-split, positive = token length)
- CollectionSearch provides both addTerm()/execute() for fulltext and
  lookup() for metadata-style search (exact/wildcard/callback)
- MetadataSearch delegates entirely to collection APIs
- Shared filterPages() replaces duplicated page filtering logic
- All callers updated from MetadataIndex to MetadataSearch
- Tests moved to Search namespace with full coverage for new APIs
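
The integer group convention (0 = non-split, positive = token length) might map to index file suffixes along these lines. This is a Python sketch under assumptions: the function names are hypothetical, and the exact suffix format and validation rules of the real groupToSuffix() are not given in the commit message.

```python
def group_to_suffix(group):
    """Map an index group to a file-name suffix.

    Assumed convention: 0 means a non-split index (no suffix); a positive
    integer is the token length and becomes the suffix itself.
    """
    if not isinstance(group, int) or group < 0:
        raise ValueError(f"invalid index group: {group!r}")
    return "" if group == 0 else str(group)


def index_name(base, group):
    """Build an index name such as 'w3' from a base name and a group."""
    return base + group_to_suffix(group)
```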


# 83b3accc 06-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: rewrite Indexer to use Collection classes

Replace the intermediate #2943 classes (FulltextIndex, MetadataIndex)
with the new Collection-based architecture. The Indexer is now a thin
stateless orchestrator that delegates all index work to collections.

Key changes:
- Indexer no longer extends AbstractIndex; page name passed to methods
- addPage/deletePage/clear use PageTitleCollection,
  PageFulltextCollection, and PageMetaCollection
- New PageMetaCollection replaces separate ReferencesCollection and
  MediaCollection with a single class that handles arbitrary metadata
  keys dynamically
- Shared writable FileIndex('page') passed to all collections
- Logger callback replaces verbose parameter
- Methods return void instead of bool
- Index classes implement IteratorAggregate for clean data access
- Indexer tests consolidated into namespaced IndexerTest.php
- All callers updated to new stateless API


# 95b16223 05-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: accept pre-instantiated entity and token indexes in collections

Allow passing AbstractIndex objects for the entity and token
parameters instead of string names. This enables sharing index
instances between collections for efficiency.


# c66b5ec6 05-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: rewrite Lock as static registry with reference counting

Replace the instance-based Lock class with a static registry that
tracks held locks per-process with reference counting. This solves
three problems:

- Split indexes (w3, w4, ...) share a single lock name and now
  coordinate naturally via the registry
- Multiple callers can acquire the same lock without conflict
- Indexes enforce their own writability through lock()/unlock()
  methods on AbstractIndex

The Lock registry manages both the filesystem lock (mkdir) and the
in-process tracking. The first acquire creates the directory, subsequent
acquires increment the refcount. Release decrements, and only removes
the directory when the count reaches zero.
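
A minimal Python sketch of the acquire/release behavior described above, assuming a class-level dictionary as the per-process registry. `LockRegistry` and its method names are hypothetical; the real Lock class is PHP and presumably handles details (retries, stale locks) that are not covered here.

```python
import os


class LockRegistry:
    """Per-process registry: one filesystem lock per path, shared via refcount."""

    _held = {}  # lock directory path -> reference count held by this process

    @classmethod
    def acquire(cls, lock_dir):
        if cls._held.get(lock_dir, 0) > 0:
            # Already held by this process: only bump the reference count.
            cls._held[lock_dir] += 1
            return True
        try:
            os.mkdir(lock_dir)  # atomic: fails if the directory already exists
        except FileExistsError:
            return False  # held by someone else
        cls._held[lock_dir] = 1
        return True

    @classmethod
    def release(cls, lock_dir):
        count = cls._held.get(lock_dir, 0)
        if count == 0:
            return  # not held by this process; never remove a foreign lock
        if count == 1:
            del cls._held[lock_dir]
            os.rmdir(lock_dir)  # last reference: remove the filesystem lock
        else:
            cls._held[lock_dir] = count - 1
```

In this sketch the second acquire of the same path succeeds without touching the filesystem, and the directory disappears only after the matching number of releases.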

Note: I am not sure if implementing this as a static object is a great
idea or if we should pass an instance through the collection to the
indexes...


# 0a9fafed 05-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: fix lock() releasing foreign locks on partial failure

Track successfully acquired locks in $lockedIndexes so that
unlock() only releases locks this collection actually holds.
Previously, a failed lock acquisition would call unlock() which
released all index locks including ones never acquired, potentially
releasing locks held by other processes.
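
The fix can be illustrated with a small Python sketch. `FakeIndex` and `Collection` are hypothetical stand-ins; only the idea of recording successfully acquired locks (the commit's $lockedIndexes) is taken from the message above.

```python
class FakeIndex:
    """Hypothetical index whose lock() can be made to fail for the demo."""

    def __init__(self, name, lockable=True):
        self.name = name
        self.lockable = lockable
        self.locked = False
        self.unlock_calls = 0

    def lock(self):
        if self.lockable:
            self.locked = True
        return self.lockable

    def unlock(self):
        self.unlock_calls += 1
        self.locked = False


class Collection:
    def __init__(self, indexes):
        self.indexes = indexes
        self.locked_indexes = []  # only the locks this collection acquired

    def lock(self):
        for index in self.indexes:
            if not index.lock():
                self.unlock()  # roll back what was acquired so far
                return False
            self.locked_indexes.append(index)
        return True

    def unlock(self):
        # Release in reverse order, and only locks we actually hold.
        while self.locked_indexes:
            self.locked_indexes.pop().unlock()
```

If the second of three indexes fails to lock, only the first is unlocked again; the failed index and the one never attempted are left untouched.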


# d92c078c 05-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: add DirectCollection for 1:1 entity-token mappings

Introduce DirectCollection as a third collection type alongside
FrequencyCollection and LookupCollection. Direct collections store
exactly one token per entity at the entity's position in the token
index (entity.RID === token.RID), with no frequency or reverse indexes.
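
As a rough illustration of the 1:1 layout, the Python sketch below keeps two parallel lists so that a token always sits at its entity's row id. `DirectMapping` is a hypothetical in-memory model, not the DirectCollection API.

```python
class DirectMapping:
    """Store exactly one token per entity at the entity's row id (RID)."""

    def __init__(self):
        self.entities = []  # RID -> entity name
        self.tokens = []    # RID -> token, always the same RID as the entity

    def set(self, entity, token):
        if entity in self.entities:
            rid = self.entities.index(entity)
        else:
            rid = len(self.entities)
            self.entities.append(entity)
            self.tokens.append("")  # keep both lists the same length
        self.tokens[rid] = token
        return rid

    def get(self, entity):
        if entity not in self.entities:
            return None
        return self.tokens[self.entities.index(entity)]
```

Because both lists share row ids, no frequency or reverse index is needed to go from an entity to its token.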

AbstractCollection now accepts optional frequency/reverse index names
(defaulting to '') and skips locking for empty index names.

Adds PageTitleCollection as the first concrete direct collection
for the page -> title mapping.


# f2bbffb5 05-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: extract Collection base class hierarchy

Introduce AbstractCollection as the shared base for all index
collections, with FrequencyCollection and LookupCollection as
the two abstract subclasses differing only in how tokens are
counted (frequency vs dedup).

Key design decisions:
- splitByLength is a constructor parameter on AbstractCollection
  controlling whether token/frequency indexes use length-based
  file splitting. This is independent of the collection type.
- The reverse index format is self-describing: entries with *
  have a group prefix (split), entries without don't (non-split).
  No branching needed in parse/format methods.
- addEntity, resolveTokens, updateIndexes, and reverse index
  handling all live in AbstractCollection. Subclasses only
  implement countTokens().
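
The frequency-versus-dedup distinction can be sketched as two counting functions. This is a Python illustration with hypothetical names; in particular, recording a count of 1 for each distinct token in the lookup case is an assumption about how dedup is represented.

```python
from collections import Counter


def count_tokens_frequency(tokens):
    """Frequency-style counting: how often each token occurs."""
    return dict(Counter(tokens))


def count_tokens_lookup(tokens):
    """Lookup-style counting: each distinct token is recorded once."""
    return {token: 1 for token in tokens}
```

For the input ["wiki", "page", "wiki"], the first function reports "wiki" twice and "page" once, while the second records both tokens with a count of 1.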

Concrete collections: PageFulltextCollection (frequency, split),
MediaCollection and ReferencesCollection (lookup, non-split).

Renames FulltextCollection -> PageFulltextCollection and
FulltextCollectionSearch -> FrequencyCollectionSearch.
