| c651c34b | 17-Jun-2026 |
Andreas Gohr <gohr@cosmocode.de> |
BacklinksTest: give testLinksInDeletedPages its own page
testLinksInDeletedPages reused test:internallinks, the same page testInternallink already saves and indexes. Since the data dir is shared across the class, when the re-save and the earlier index land in the same second, needsIndexing() now (correctly) reports the page as up to date and addPage() skips reindexing, leaving stale link data. backlinks('test:internallink') then returned an empty array.
Use a dedicated page (test:deletedlinks) with its own link targets so the test no longer collides with testInternallink's index state.
|
| 2cda0166 | 17-Jun-2026 |
Andreas Gohr <gohr@cosmocode.de> |
Indexer: signal nothing-to-do via boolean return instead of void
The TaskRunner runs indexing, sitemap, digest and changelog-trim tasks in sequence and relies on each task returning false when it did no work so the next one is tried. The indexer rewrite changed addPage(), deletePage() and renamePage() to return void and only abort via exceptions, breaking that contract: indexing always looked like work was done and the following tasks never ran.
Restore the boolean return on these three methods (true when work was done, false when there was nothing to do) while still using exceptions to signal errors, and propagate it through TaskRunner::runIndexer(). runIndexer() also no longer forces reindexing on every call.
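The return-value contract described above can be sketched as follows. This is a minimal illustration in Python, not the project's PHP code; `run_first_pending` and the task names are hypothetical, and the sketch assumes the runner stops at the first task that reports work done.

```python
from typing import Callable, Optional, Sequence, Tuple

# A task returns True when it did work and False when there was nothing to do.
Task = Callable[[], bool]


def run_first_pending(tasks: Sequence[Tuple[str, Task]]) -> Optional[str]:
    """Run tasks in order and stop at the first one that reports work done.

    Returns the name of that task, or None if every task had nothing to do.
    Errors are expected to surface as exceptions, not as return values.
    """
    for name, task in tasks:
        if task():
            return name
    return None
```

If the first task always reported work, the later tasks in the sequence would never be reached, which is the failure mode the commit message describes.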
The legacy compatibility layer is adjusted to match: LegacyIndexer and idx_addPage() forward the boolean, mapping SearchExceptions back to the historic error-message/false returns. LegacyIndexer::renamePage() restores the 'page is not in index' message that the move plugin expects.
Closes #4661
|
| 79dae64d | 17-Jun-2026 |
Andreas Gohr <gohr@cosmocode.de> |
Indexer: treat same-second save and index as up to date
needsIndexing() compared the .indexed tag mtime against the page mtime with <=, so a page that was saved and indexed within the same second was always reported as still needing indexing. Require the page to be strictly newer than the index tag instead, so an equal mtime correctly counts as up to date.
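The change in comparison can be illustrated with a small Python sketch. The function names and the handling of a missing tag (treated here as "needs indexing") are assumptions for illustration; only the switch from a non-strict to a strict comparison comes from the commit message.

```python
from typing import Optional


def needs_indexing_old(page_mtime: int, indexed_mtime: Optional[int]) -> bool:
    """Previous behaviour: an equal mtime still counted as needing indexing."""
    if indexed_mtime is None:  # assumption: no tag means never indexed
        return True
    return indexed_mtime <= page_mtime


def needs_indexing(page_mtime: int, indexed_mtime: Optional[int]) -> bool:
    """New behaviour: the page must be strictly newer than the index tag."""
    if indexed_mtime is None:  # assumption: no tag means never indexed
        return True
    return page_mtime > indexed_mtime
```

With one-second timestamp resolution, a save and an index run in the same second yield equal values, so only the strict comparison reports the page as up to date.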
|
| b188a75b | 10-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: fix IntegrityTest not re-indexing between tests
The .indexed metadata tag persisted between test methods, causing needsIndexing() to skip re-indexing when saveWikiText() didn't update the wiki file (identical content). Clean the tag in setUp.
|
| 1148921d | 08-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: unify CollectionSearch API and optimize search pipeline
- Remove separate lookup() API from CollectionSearch. All searches now use addTerm()/execute() with a single unified pipeline.
- Add matches() predicate to Term using efficient string functions (===, str_starts_with, str_ends_with, str_contains) instead of regex.
- Add caseInsensitive() support on CollectionSearch and Term for metadata/title searches where indexed values preserve case.
- Remove callback support from MetadataSearch::lookupKey(); the only real usage (case-insensitive substring) is replaced by caseInsensitive() + wildcards.
- Remove min-length validation from Term. Add Tokenizer::isValidSearchTerm() for callers that need it (FulltextSearch, Indexer::lookup).
- Optimize execute() from 4 group passes to 2: scan tokens + resolve frequencies in one pass per group, batch entity name resolution, then populate Terms.
- Store full match detail in Term: entity → token → frequency. New accessors getMatches(), getEntityTokens(), getEntityFrequencies() derive different views from this single data structure.
- Term no longer used as scratch pad by CollectionSearch. Index-internal data (token IDs, entity IDs) stays local to execute(). Terms receive only final resolved results.
- Use title from search results in MetadataSearch::pageLookupCallBack() instead of re-fetching via p_get_first_heading().
- Update concept.txt documentation.
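The idea of a matches() predicate built on plain string functions instead of regular expressions can be sketched like this. It is a Python illustration only: the `Term` class below is hypothetical, and the wildcard syntax (a leading and/or trailing `*`) is an assumption rather than something stated in the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str                      # search term, optionally with * at either end
    case_insensitive: bool = False

    def matches(self, value: str) -> bool:
        """Compare value against the term using only string operations."""
        pattern = self.text
        if self.case_insensitive:
            pattern, value = pattern.casefold(), value.casefold()
        leading = pattern.startswith("*")
        trailing = len(pattern) > 1 and pattern.endswith("*")
        core = pattern.strip("*")  # sketch: drops all stars at both ends
        if leading and trailing:
            return core in value            # substring match
        if trailing:
            return value.startswith(core)   # prefix match
        if leading:
            return value.endswith(core)     # suffix match
        return value == core                # exact match
```

For example, `Term("wiki*").matches("wikitext")` is a prefix comparison and `Term("Wiki*", case_insensitive=True)` folds case on both sides before comparing.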
|
| 5e9d26e3 | 07-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: move search() function tests back to tests/inc/search/
The search.test.php file tests the search() function from inc/search.php, not the Search namespace classes. It was incorrectly moved into tests/Search/ during the test suite reorganization. Move it and its data files (ns1/, ns2/) back to their original location, keeping only searchtest.txt in tests/Search/data/ where it belongs.
|
| e1272c08 | 07-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: add backward compatibility wrappers
Add deprecated wrappers for idx_* and ft_* functions that were removed when inc/indexer.php and inc/fulltext.php were replaced by the new Search classes. These wrappers delegate to the new architecture and ensure existing plugins continue to work.
Deprecated standalone functions: idx_get_indexer, idx_getIndex, idx_lookup, idx_listIndexLengths, idx_indexLengths, ft_pageSearch, ft_backlinks, ft_mediause, ft_pageLookup, ft_snippet, ft_pagesorter, ft_snippet_re_preprocess, ft_queryParser.
Deprecated methods on Indexer: lookupKey, getPages, addMetaKeys, renameMetaValue, getPID, lookup.
Also migrates remaining core callers (Ajax, FeedCreator, ApiCore) to use the new classes directly and fixes a UTF-8 case folding bug in MetadataSearch title lookups.
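The wrapper pattern described above (a deprecated function that delegates to the new classes) might look like the following in outline. This is a Python sketch of the general technique, not the PHP implementation: `NewSearchStub`, its data, and the Python `ft_backlinks` are hypothetical stand-ins, and the commit message does not say which class each wrapper delegates to.

```python
import warnings
from typing import Dict, List


class NewSearchStub:
    """Hypothetical stand-in for a class from the new search architecture."""

    def __init__(self, links: Dict[str, List[str]]) -> None:
        self._links = links  # maps source page -> pages it links to

    def backlinks(self, page_id: str) -> List[str]:
        return sorted(src for src, targets in self._links.items()
                      if page_id in targets)


_SEARCH = NewSearchStub({
    "start": ["wiki:syntax"],
    "playground": ["wiki:syntax", "start"],
})


def ft_backlinks(page_id: str) -> List[str]:
    """Deprecated wrapper: warn the caller, then delegate to the new class."""
    warnings.warn("ft_backlinks() is deprecated; use the search classes directly",
                  DeprecationWarning, stacklevel=2)
    return _SEARCH.backlinks(page_id)
```

Existing callers keep working unchanged, while the deprecation warning points them to the replacement.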
|
| 21fbd01b | 07-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: add integrity checking to Collection architecture
Add checkIntegrity() to AbstractCollection and DirectCollection that verifies paired indexes have matching line counts (token==frequency, entity==reverse, entity==token for direct collections). Throws IndexIntegrityException on the first inconsistency found.
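A paired line-count check of this kind can be sketched as below. This is an illustrative Python version under stated assumptions: `IndexIntegrityError`, `check_integrity`, and the in-memory lists standing in for index files are all hypothetical; only the idea of comparing counts pairwise and failing on the first mismatch comes from the commit message.

```python
from typing import Iterable, Mapping, Sized, Tuple


class IndexIntegrityError(Exception):
    """Raised when two paired indexes disagree on their line count."""


def check_integrity(indexes: Mapping[str, Sized],
                    pairs: Iterable[Tuple[str, str]]) -> None:
    """Verify each pair of indexes has the same number of lines.

    Raises on the first inconsistency instead of collecting all of them.
    """
    for left, right in pairs:
        n_left, n_right = len(indexes[left]), len(indexes[right])
        if n_left != n_right:
            raise IndexIntegrityError(
                f"{left} has {n_left} lines but {right} has {n_right}")
```

Relying on `len()` mirrors the role of a countable index: the check needs only the size of each index, not its contents.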
Add Countable interface to AbstractIndex with count() implementations in MemoryIndex and FileIndex. Add Indexer::checkIntegrity() and Indexer::isIndexEmpty() to orchestrate checks across all collections.
Update infoutils.php to use the new Indexer API instead of the old FulltextIndex/MetadataIndex classes.
Fix range(1, 0) bug in three places that produced [1, 0] instead of an empty array when split-by-length indexes were empty.
|
| 6734bb8c | 07-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: rewrite MetadataSearch to use Collection classes
Replace MetadataIndex usage in MetadataSearch with the new Collection/Index architecture. This completes the read-path migration so data written by the Collection-based Indexer is read back correctly using TupleOps tuple format.
Generalize FrequencyCollectionSearch into CollectionSearch that works with any AbstractCollection type (Frequency, Lookup, Direct) and handles both split-by-length and non-split index layouts transparently. DirectCollection participates via resolveTokenFrequencies() which maps token RID = entity RID.
Key changes:
- AbstractCollection gains isSplitByLength(), resolveTokenFrequencies(), getEntitiesWithData(), and groupToSuffix() with validation
- Index groups are now int (0 = non-split, positive = token length)
- CollectionSearch provides both addTerm()/execute() for fulltext and lookup() for metadata-style search (exact/wildcard/callback)
- MetadataSearch delegates entirely to collection APIs
- Shared filterPages() replaces duplicated page filtering logic
- All callers updated from MetadataIndex to MetadataSearch
- Tests moved to Search namespace with full coverage for new APIs
|