| 06053dca | 10-Apr-2026 | Andreas Gohr <andi@splitbrain.org> |
SearchIndex: remove write side effect from retrieveRow()
retrieveRow() padded the index file when the requested RID was beyond the current length. This was an optimization for subsequent changeRow() calls, but changeRow() already handles padding on its own. The side effect was also inconsistent with retrieveRows(), which is a pure read.
Added a cross-index integration test verifying RID consistency across entity, token, frequency and reverse indexes when multiple entities share tokens.
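DokuWiki's index code is PHP; what follows is only a minimal Python sketch of the read/write split described above, assuming a layout of one row per line with the line number as RID. `FileIndex`, `retrieve_row()` and `change_row()` are hypothetical stand-ins, not the real classes: the read path returns an empty string for an RID past the end and never writes, while the write path does its own padding.

```python
from pathlib import Path


class FileIndex:
    """Hypothetical line-based index: line N of the file holds the row for RID N."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _read_lines(self) -> list[str]:
        # A missing file behaves like an empty index
        if not self.path.exists():
            return []
        return self.path.read_text(encoding="utf-8").splitlines()

    def retrieve_row(self, rid: int) -> str:
        # Pure read: an RID beyond the end yields "" and the file is not modified
        lines = self._read_lines()
        return lines[rid] if rid < len(lines) else ""

    def change_row(self, rid: int, value: str) -> None:
        # The write path pads with empty rows itself, so reads never need to
        lines = self._read_lines()
        if rid >= len(lines):
            lines.extend([""] * (rid + 1 - len(lines)))
        lines[rid] = value
        self.path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

With padding confined to `change_row()`, looking up an unknown RID has no side effects, the same behavior a pure batch read such as retrieveRows() has.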
| db8be586 | 08-Apr-2026 | Andreas Gohr <andi@splitbrain.org> |
SearchIndex: review fixes — auto-save MemoryIndex, cast TupleOps counts, style cleanups
- MemoryIndex: auto-save dirty data on unlock/destruction to prevent silent index corruption when indexes are used in tandem
- TupleOps::parseTuples(): cast exploded count strings to int
- FileIndex::retrieveRow(): document the write-on-read padding behavior
- Fix whitespace issues in ApiCore, common.php, Sitemap/Mapper
- Update concept.txt to reflect MemoryIndex auto-save behavior
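A minimal Python sketch of the auto-save idea, assuming the same one-row-per-line file layout and using `__del__` as a loose analogue of a PHP destructor. `MemoryIndex`, `change_row()`, `save()` and `unlock()` are illustrative names only, not DokuWiki's API; the point is that dirty rows reach disk even when a caller forgets an explicit save.

```python
from pathlib import Path


class MemoryIndex:
    """Hypothetical in-memory copy of a line-based index that saves itself on release."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.dirty = False
        self.lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []

    def change_row(self, rid: int, value: str) -> None:
        # Pad in memory, then mark the copy as needing a save
        if rid >= len(self.lines):
            self.lines.extend([""] * (rid + 1 - len(self.lines)))
        self.lines[rid] = value
        self.dirty = True

    def save(self) -> None:
        # Write only when something changed since the last save
        if self.dirty:
            self.path.write_text("\n".join(self.lines) + "\n", encoding="utf-8")
            self.dirty = False

    def unlock(self) -> None:
        # Auto-save before releasing, so a forgotten save() cannot drop writes
        self.save()

    def __del__(self) -> None:
        # Safety net on destruction; getattr guards against a failed __init__
        if getattr(self, "dirty", False):
            self.save()
```

In Python, `__del__` is best-effort (CPython runs it when the last reference disappears), so an explicit `unlock()` should remain the primary path.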
| 1148921d | 08-Apr-2026 | Andreas Gohr <andi@splitbrain.org> |
SearchIndex: unify CollectionSearch API and optimize search pipeline
- Remove separate lookup() API from CollectionSearch. All searches now use addTerm()/execute() with a single unified pipeline.
- Add matches() predicate to Term using efficient string functions (===, str_starts_with, str_ends_with, str_contains) instead of regex.
- Add caseInsensitive() support on CollectionSearch and Term for metadata/title searches where indexed values preserve case.
- Remove callback support from MetadataSearch::lookupKey(); the only real usage (case-insensitive substring) is replaced by caseInsensitive() + wildcards.
- Remove min-length validation from Term. Add Tokenizer::isValidSearchTerm() for callers that need it (FulltextSearch, Indexer::lookup).
- Optimize execute() from 4 group passes to 2: scan tokens + resolve frequencies in one pass per group, batch entity name resolution, then populate Terms.
- Store full match detail in Term: entity → token → frequency. New accessors getMatches(), getEntityTokens(), getEntityFrequencies() derive different views from this single data structure.
- Term is no longer used as a scratch pad by CollectionSearch. Index-internal data (token IDs, entity IDs) stays local to execute(). Terms receive only the final resolved results.
- Use the title from search results in MetadataSearch::pageLookupCallBack() instead of re-fetching it via p_get_first_heading().
- Update concept.txt documentation.
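The Python sketch below illustrates two ideas from this commit: matching a wildcard term with plain string operations (equality, starts-with, ends-with, contains) instead of a regex, and deriving several views from one entity → token → frequency map. The class and method names are hypothetical Python counterparts, not the PHP API, and the assumption that `get_entity_frequencies()` sums an entity's token frequencies is ours, not the commit's.

```python
class Term:
    """Hypothetical search term; '*' at the start and/or end acts as a wildcard."""

    def __init__(self, term: str, case_insensitive: bool = False) -> None:
        self.case_insensitive = case_insensitive
        self.wild_start = term.startswith("*")
        self.wild_end = term.endswith("*")
        core = term.strip("*")
        self.core = core.lower() if case_insensitive else core
        # Single source of truth: entity -> token -> frequency
        self._matches: dict[str, dict[str, int]] = {}

    def matches(self, token: str) -> bool:
        # Pick one plain string operation based on wildcard position, no regex
        if self.case_insensitive:
            token = token.lower()
        if self.wild_start and self.wild_end:
            return self.core in token            # *foo* -> contains
        if self.wild_end:
            return token.startswith(self.core)   # foo*  -> starts with
        if self.wild_start:
            return token.endswith(self.core)     # *foo  -> ends with
        return token == self.core                # foo   -> exact match

    def add_match(self, entity: str, token: str, freq: int) -> None:
        self._matches.setdefault(entity, {})[token] = freq

    def get_matches(self) -> dict[str, dict[str, int]]:
        return self._matches

    def get_entity_tokens(self) -> dict[str, list[str]]:
        # View: which tokens matched for each entity
        return {entity: list(tokens) for entity, tokens in self._matches.items()}

    def get_entity_frequencies(self) -> dict[str, int]:
        # View: summed frequency per entity (the summing is an assumption)
        return {entity: sum(tokens.values()) for entity, tokens in self._matches.items()}
```

Keeping only the nested map and computing the other views on demand avoids storing the same results in several shapes that could drift apart.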
| 83b3accc | 06-Apr-2026 | Andreas Gohr <andi@splitbrain.org> |
SearchIndex: rewrite Indexer to use Collection classes
Replace the intermediate #2943 classes (FulltextIndex, MetadataIndex) with the new Collection-based architecture. The Indexer is now a thin stateless orchestrator that delegates all index work to collections.
Key changes:
- Indexer no longer extends AbstractIndex; page name passed to methods
- addPage/deletePage/clear use PageTitleCollection, PageFulltextCollection, and PageMetaCollection
- New PageMetaCollection replaces separate ReferencesCollection and MediaCollection with a single class that handles arbitrary metadata keys dynamically
- Shared writable FileIndex('page') passed to all collections
- Logger callback replaces verbose parameter
- Methods return void instead of bool
- Index classes implement IteratorAggregate for clean data access
- Indexer tests consolidated into namespaced IndexerTest.php
- All callers updated to new stateless API
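A minimal Python sketch of the "thin stateless orchestrator" shape: the indexer keeps no per-page state, receives the page name on every call, forwards the work to each collection, returns nothing, and reports through an injected logger callback. The `Collection` protocol and its `add_entity()`/`delete_entity()`/`clear()` methods are assumptions for illustration, not the real collection API.

```python
from typing import Callable, List, Protocol


class Collection(Protocol):
    """Assumed minimal interface of a collection (illustrative, not DokuWiki's API)."""

    def add_entity(self, page: str) -> None: ...
    def delete_entity(self, page: str) -> None: ...
    def clear(self) -> None: ...


class Indexer:
    """Stateless orchestrator: no page state, every call delegates to the collections."""

    def __init__(self, collections: List[Collection],
                 logger: Callable[[str], None] = lambda msg: None) -> None:
        self.collections = collections
        self.log = logger  # callback in place of a 'verbose' flag

    def add_page(self, page: str) -> None:
        for collection in self.collections:
            collection.add_entity(page)
        self.log(f"add {page}")

    def delete_page(self, page: str) -> None:
        for collection in self.collections:
            collection.delete_entity(page)
        self.log(f"delete {page}")

    def clear(self) -> None:
        for collection in self.collections:
            collection.clear()
        self.log("clear")
```

Because the page name travels with each call, a single instance can serve any number of pages without being reset between them.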
| b8ef19fe | 04-Dec-2021 | Andreas Gohr <andi@splitbrain.org> |
Add support to access multiple rows at once
When saving word indexes (w*.idx), multiple words of the same length often need to be accessed. This implements a new method that allows that in an efficient way.
Note: this removes the INDEX_MARK_DELETED mechanism for marking deleted entries. Entries are now deleted using empty lines again, which makes the batch handling much simpler. If there is a good reason to keep the mechanism, it can be re-added.
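A Python sketch of batch row access combined with empty-line deletion, assuming one row per line with the line number as RID. `retrieve_rows()` and `delete_row()` are hypothetical helpers, not the PHP methods: the batch read scans the file once for all requested RIDs instead of once per word, and deleting only blanks a line so the RIDs of later rows do not shift.

```python
from pathlib import Path


def retrieve_rows(path: Path, rids: list[int]) -> dict[int, str]:
    """Hypothetical batch read: one pass over the file for all requested RIDs."""
    wanted = set(rids)
    result = {rid: "" for rid in wanted}  # missing or deleted rows read as ""
    if not path.exists():
        return result
    last = max(wanted, default=-1)
    with path.open(encoding="utf-8") as fh:
        for rid, line in enumerate(fh):
            if rid > last:
                break  # nothing further was requested
            if rid in wanted:
                result[rid] = line.rstrip("\n")
    return result


def delete_row(path: Path, rid: int) -> None:
    """Hypothetical delete: blank the line so the RIDs of later rows stay stable."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if rid < len(lines):
        lines[rid] = ""
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Since a deleted row is simply an empty line, the batch reader needs no special marker check: deleted and never-written rows both come back as an empty string.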