History log of /dokuwiki/_test/tests/Search/Collection/CollectionSearchTest.php (Results 1 – 3 of 3)
Revision Date Author Comments
# 1148921d 08-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: unify CollectionSearch API and optimize search pipeline

- Remove separate lookup() API from CollectionSearch. All searches now
use addTerm()/execute() with a single unified pipeline.

SearchIndex: unify CollectionSearch API and optimize search pipeline

- Remove separate lookup() API from CollectionSearch. All searches now
use addTerm()/execute() with a single unified pipeline.
- Add matches() predicate to Term using efficient string functions
(===, str_starts_with, str_ends_with, str_contains) instead of regex.
- Add caseInsensitive() support on CollectionSearch and Term for
metadata/title searches where indexed values preserve case.
- Remove callback support from MetadataSearch::lookupKey() — the only
real usage (case-insensitive substring) is replaced by
caseInsensitive() + wildcards.
- Remove min-length validation from Term. Add Tokenizer::isValidSearchTerm()
for callers that need it (FulltextSearch, Indexer::lookup).
- Optimize execute() from 4 group passes to 2: scan tokens + resolve
frequencies in one pass per group, batch entity name resolution, then
populate Terms.
- Store full match detail in Term: entity → token → frequency. New
accessors getMatches(), getEntityTokens(), getEntityFrequencies()
derive different views from this single data structure.
- Term no longer used as scratch pad by CollectionSearch. Index-internal
data (token IDs, entity IDs) stays local to execute(). Terms receive
only final resolved results.
- Use title from search results in MetadataSearch::pageLookupCallBack()
instead of re-fetching via p_get_first_heading().
- Update concept.txt documentation.

show more ...


# 21fbd01b 07-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: add integrity checking to Collection architecture

Add checkIntegrity() to AbstractCollection and DirectCollection that
verifies paired indexes have matching line counts (token==frequenc

SearchIndex: add integrity checking to Collection architecture

Add checkIntegrity() to AbstractCollection and DirectCollection that
verifies paired indexes have matching line counts (token==frequency,
entity==reverse, entity==token for direct collections). Throws
IndexIntegrityException on the first inconsistency found.

Add Countable interface to AbstractIndex with count() implementations
in MemoryIndex and FileIndex. Add Indexer::checkIntegrity() and
Indexer::isIndexEmpty() to orchestrate checks across all collections.

Update infoutils.php to use the new Indexer API instead of the old
FulltextIndex/MetadataIndex classes.

Fix range(1, 0) bug in three places that produced [1, 0] instead of
an empty array when split-by-length indexes were empty.

show more ...


# 6734bb8c 07-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: rewrite MetadataSearch to use Collection classes

Replace MetadataIndex usage in MetadataSearch with the new Collection/Index
architecture. This completes the read-path migration so data

SearchIndex: rewrite MetadataSearch to use Collection classes

Replace MetadataIndex usage in MetadataSearch with the new Collection/Index
architecture. This completes the read-path migration so data written by the
Collection-based Indexer is read back correctly using TupleOps tuple format.

Generalize FrequencyCollectionSearch into CollectionSearch that works with any
AbstractCollection type (Frequency, Lookup, Direct) and handles both
split-by-length and non-split index layouts transparently. DirectCollection
participates via resolveTokenFrequencies() which maps token RID = entity RID.

Key changes:
- AbstractCollection gains isSplitByLength(), resolveTokenFrequencies(),
getEntitiesWithData(), and groupToSuffix() with validation
- Index groups are now int (0 = non-split, positive = token length)
- CollectionSearch provides both addTerm()/execute() for fulltext and
lookup() for metadata-style search (exact/wildcard/callback)
- MetadataSearch delegates entirely to collection APIs
- Shared filterPages() replaces duplicated page filtering logic
- All callers updated from MetadataIndex to MetadataSearch
- Tests moved to Search namespace with full coverage for new APIs

show more ...