| b188a75b | 10-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: fix IntegrityTest not re-indexing between tests
The .indexed metadata tag persisted between test methods, causing needsIndexing() to skip re-indexing when saveWikiText() didn't update the wiki file (identical content). Clean the tag in setUp.
|
| 1148921d | 08-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: unify CollectionSearch API and optimize search pipeline
- Remove separate lookup() API from CollectionSearch. All searches now use addTerm()/execute() with a single unified pipeline.
- Add matches() predicate to Term using efficient string functions (===, str_starts_with, str_ends_with, str_contains) instead of regex.
- Add caseInsensitive() support on CollectionSearch and Term for metadata/title searches where indexed values preserve case.
- Remove callback support from MetadataSearch::lookupKey(); the only real usage (case-insensitive substring) is replaced by caseInsensitive() + wildcards.
- Remove min-length validation from Term. Add Tokenizer::isValidSearchTerm() for callers that need it (FulltextSearch, Indexer::lookup).
- Optimize execute() from 4 group passes to 2: scan tokens + resolve frequencies in one pass per group, batch entity name resolution, then populate Terms.
- Store full match detail in Term: entity → token → frequency. New accessors getMatches(), getEntityTokens(), getEntityFrequencies() derive different views from this single data structure.
- Term no longer used as scratch pad by CollectionSearch. Index-internal data (token IDs, entity IDs) stays local to execute(). Terms receive only final resolved results.
- Use title from search results in MetadataSearch::pageLookupCallBack() instead of re-fetching via p_get_first_heading().
- Update concept.txt documentation.
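The regex-free matching described in the matches() item can be sketched roughly as follows. This is not the actual Term::matches() code: the function name termMatches() is hypothetical, and the sketch assumes that a `*` at the start and/or end of the pattern marks a wildcard.

```php
<?php
// Hypothetical helper, not the real Term::matches() implementation.
// Assumption: '*' at the start and/or end of $pattern is a wildcard.
function termMatches(string $pattern, string $value): bool
{
    $leading  = str_starts_with($pattern, '*');
    // A lone '*' counts as a leading wildcard only.
    $trailing = strlen($pattern) > 1 && str_ends_with($pattern, '*');
    $core     = trim($pattern, '*');

    if ($leading && $trailing) {
        return str_contains($value, $core);    // *core*  substring match
    }
    if ($leading) {
        return str_ends_with($value, $core);   // *core   suffix match
    }
    if ($trailing) {
        return str_starts_with($value, $core); // core*   prefix match
    }
    return $value === $core;                   // core    exact match
}
```

Each branch is a single string comparison, so no pattern has to be compiled or escaped for every token that is scanned.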
|
| 5e9d26e3 | 07-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: move search() function tests back to tests/inc/search/
The search.test.php file tests the search() function from inc/search.php, not the Search namespace classes. It was incorrectly moved into tests/Search/ during the test suite reorganization. Move it and its data files (ns1/, ns2/) back to their original location, keeping only searchtest.txt in tests/Search/data/ where it belongs.
|
| e1272c08 | 07-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: add backward compatibility wrappers
Add deprecated wrappers for idx_* and ft_* functions that were removed when inc/indexer.php and inc/fulltext.php were replaced by the new Search classes. These wrappers delegate to the new architecture and ensure existing plugins continue to work.
Deprecated standalone functions: idx_get_indexer, idx_getIndex, idx_lookup, idx_listIndexLengths, idx_indexLengths, ft_pageSearch, ft_backlinks, ft_mediause, ft_pageLookup, ft_snippet, ft_pagesorter, ft_snippet_re_preprocess, ft_queryParser.
Deprecated methods on Indexer: lookupKey, getPages, addMetaKeys, renameMetaValue, getPID, lookup.
Also migrates remaining core callers (Ajax, FeedCreator, ApiCore) to use the new classes directly and fixes a UTF-8 case folding bug in MetadataSearch title lookups.
|
| 21fbd01b | 07-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: add integrity checking to Collection architecture
Add checkIntegrity() to AbstractCollection and DirectCollection that verifies paired indexes have matching line counts (token==frequency, entity==reverse, entity==token for direct collections). Throws IndexIntegrityException on the first inconsistency found.
Add Countable interface to AbstractIndex with count() implementations in MemoryIndex and FileIndex. Add Indexer::checkIntegrity() and Indexer::isIndexEmpty() to orchestrate checks across all collections.
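The paired line-count check could look roughly like the sketch below. It is an illustration only: checkPairedCounts() and IntegrityError are hypothetical names (the real IndexIntegrityException class is not shown in this log), and the line counts are passed in as plain integers rather than read from index objects.

```php
<?php
// Hypothetical stand-in for IndexIntegrityException.
class IntegrityError extends RuntimeException
{
}

/**
 * Hypothetical helper: verify that paired indexes have the same line count.
 *
 * @param array<string,int> $lineCounts index name => number of lines
 * @param array<int,array{0:string,1:string}> $pairs index names that must match
 * @throws IntegrityError on the first mismatch found
 */
function checkPairedCounts(array $lineCounts, array $pairs): void
{
    foreach ($pairs as [$a, $b]) {
        if ($lineCounts[$a] !== $lineCounts[$b]) {
            throw new IntegrityError(
                "$a has {$lineCounts[$a]} lines, $b has {$lineCounts[$b]}"
            );
        }
    }
}
```

Throwing on the first mismatch follows the behaviour described above; a caller that wants a full report would have to collect the mismatches instead.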
Update infoutils.php to use the new Indexer API instead of the old FulltextIndex/MetadataIndex classes.
Fix range(1, 0) bug in three places that produced [1, 0] instead of an empty array when split-by-length indexes were empty.
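The range() pitfall is easy to reproduce: when the start is greater than the end, PHP counts downwards instead of returning an empty array. A minimal sketch of a guard, using the hypothetical helper name lengthGroups() (not taken from the commit):

```php
<?php
// Hypothetical helper; the name is illustrative and not from the DokuWiki source.
function lengthGroups(int $maxLength): array
{
    // range(1, 0) counts downwards and yields [1, 0],
    // so the empty case has to be handled explicitly.
    return $maxLength < 1 ? [] : range(1, $maxLength);
}
```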
|
| 6734bb8c | 07-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: rewrite MetadataSearch to use Collection classes
Replace MetadataIndex usage in MetadataSearch with the new Collection/Index architecture. This completes the read-path migration so data written by the Collection-based Indexer is read back correctly using TupleOps tuple format.
Generalize FrequencyCollectionSearch into CollectionSearch that works with any AbstractCollection type (Frequency, Lookup, Direct) and handles both split-by-length and non-split index layouts transparently. DirectCollection participates via resolveTokenFrequencies() which maps token RID = entity RID.
Key changes:
- AbstractCollection gains isSplitByLength(), resolveTokenFrequencies(), getEntitiesWithData(), and groupToSuffix() with validation
- Index groups are now int (0 = non-split, positive = token length)
- CollectionSearch provides both addTerm()/execute() for fulltext and lookup() for metadata-style search (exact/wildcard/callback)
- MetadataSearch delegates entirely to collection APIs
- Shared filterPages() replaces duplicated page filtering logic
- All callers updated from MetadataIndex to MetadataSearch
- Tests moved to Search namespace with full coverage for new APIs
|