AbstractIndex.php - OpenGrok history log for /dokuwiki/inc/Search/Index/AbstractIndex.php

Revision	Date	Author	Comments
# bc12b8fe	05-Jul-2026	Andreas Gohr <andi@splitbrain.org>	fix(search): read split index suffixes correctly. AbstractIndex::max() used /(\d)+\.idx$/ to determine the highest split index suffix. For filenames like w10.idx or w12.idx that regex captured only the last digit, so the maximum token-length group was truncated to 9. Wildcard searches therefore skipped all fulltext index shards for words with 10 or more characters. Tighten the match to the basename and require the current index name followed immediately by digits: ^<idx>(\d+)\.idx$. This captures the full numeric suffix and also avoids counting unrelated index families that share the same prefix, such as treating wiki2.idx as a numbered shard of the w index. Add regressions for both behaviors: multi-digit suffixes are parsed correctly, and same-prefix indexes are ignored when determining the maximum shard number.
# 9369b4a9	08-Apr-2026	Andreas Gohr <andi@splitbrain.org>	SearchIndex: rector, phpcs, type hint fixes
# 1148921d	08-Apr-2026	Andreas Gohr <andi@splitbrain.org>	SearchIndex: unify CollectionSearch API and optimize search pipeline - Remove separate lookup() API from CollectionSearch. All searches now use addTerm()/execute() with a single unified pipeline. - Add matches() predicate to Term using efficient string functions (===, str_starts_with, str_ends_with, str_contains) instead of regex. - Add caseInsensitive() support on CollectionSearch and Term for metadata/title searches where indexed values preserve case. - Remove callback support from MetadataSearch::lookupKey(); the only real usage (case-insensitive substring) is replaced by caseInsensitive() + wildcards. - Remove min-length validation from Term. Add Tokenizer::isValidSearchTerm() for callers that need it (FulltextSearch, Indexer::lookup). - Optimize execute() from 4 group passes to 2: scan tokens + resolve frequencies in one pass per group, batch entity name resolution, then populate Terms. - Store full match detail in Term: entity → token → frequency. New accessors getMatches(), getEntityTokens(), getEntityFrequencies() derive different views from this single data structure. - Term no longer used as scratch pad by CollectionSearch. Index-internal data (token IDs, entity IDs) stays local to execute(). Terms receive only final resolved results. - Use title from search results in MetadataSearch::pageLookupCallBack() instead of re-fetching via p_get_first_heading(). - Update concept.txt documentation.
# 21fbd01b	07-Apr-2026	Andreas Gohr <andi@splitbrain.org>	SearchIndex: add integrity checking to Collection architecture. Add checkIntegrity() to AbstractCollection and DirectCollection that verifies paired indexes have matching line counts (token==frequency, entity==reverse, entity==token for direct collections). Throws IndexIntegrityException on the first inconsistency found. Add Countable interface to AbstractIndex with count() implementations in MemoryIndex and FileIndex. Add Indexer::checkIntegrity() and Indexer::isIndexEmpty() to orchestrate checks across all collections. Update infoutils.php to use the new Indexer API instead of the old FulltextIndex/MetadataIndex classes. Fix range(1, 0) bug in three places that produced [1, 0] instead of an empty array when split-by-length indexes were empty.
# 83b3accc	06-Apr-2026	Andreas Gohr <andi@splitbrain.org>	SearchIndex: rewrite Indexer to use Collection classes. Replace the intermediate #2943 classes (FulltextIndex, MetadataIndex) with the new Collection-based architecture. The Indexer is now a thin stateless orchestrator that delegates all index work to collections. Key changes: - Indexer no longer extends AbstractIndex; page name passed to methods - addPage/deletePage/clear use PageTitleCollection, PageFulltextCollection, and PageMetaCollection - New PageMetaCollection replaces separate ReferencesCollection and MediaCollection with a single class that handles arbitrary metadata keys dynamically - Shared writable FileIndex('page') passed to all collections - Logger callback replaces verbose parameter - Methods return void instead of bool - Index classes implement IteratorAggregate for clean data access - Indexer tests consolidated into namespaced IndexerTest.php - All callers updated to new stateless API
# c66b5ec6	05-Apr-2026	Andreas Gohr <andi@splitbrain.org>	SearchIndex: rewrite Lock as static registry with reference counting. Replace the instance-based Lock class with a static registry that tracks held locks per-process with reference counting. This solves three problems: - Split indexes (w3, w4, ...) share a single lock name and now coordinate naturally via the registry - Multiple callers can acquire the same lock without conflict - Indexes enforce their own writability through lock()/unlock() methods on AbstractIndex. The Lock registry manages both the filesystem lock (mkdir) and the in-process tracking. The first acquire creates the directory, subsequent acquires increment the refcount. Release decrements, and only removes the directory when the count reaches zero. Note: I am not sure if implementing this as a static object is a great idea or if we should pass an instance through the collection to the indexes...
# 596d5287	11-May-2023	Andreas Gohr <andi@splitbrain.org>	Working fulltext collection and search. This finalizes the FulltextCollection and FulltextCollectionSearch classes. Proper locking is implemented, tests have been enhanced. It should be possible to reimplement the page full text search on top of it.
# 7fcedc39	09-May-2023	Andreas Gohr <andi@splitbrain.org>	Indexes can now be opened in readonly mode
# 9f63f003	08-May-2023	Andreas Gohr <andi@splitbrain.org>	add method to retrieve multiple rows at once
# 03a35633	12-Sep-2022	Andreas Gohr <andi@splitbrain.org>	added method to search an index by regular expression
# 8ed35011	08-Dec-2021	Andreas Gohr <andi@splitbrain.org>	better method names
# d6396b6d	08-Dec-2021	Andreas Gohr <andi@splitbrain.org>	we need the same access methods in both index types
# ec5280ef	04-Dec-2021	Andreas Gohr <andi@splitbrain.org>	rearranging the Index class structure. This is a first step at restructuring the indexing classes a bit more. Some background: We have basically two different kinds of index files: a) RowIndex (like page.idx) Each line in the index contains a single value. The line number is used as primary ID. These files can be very large. Thus an index like that should never be read into memory completely if it can be avoided. b) TupleIndex (like i12.idx) Each line contains a list of tuples. The files tend to be smaller so loading them completely for search and replace is easier. Since the access is so completely different, I tried to model that in the two different classes, basically moving the methods from \dokuwiki\Search\AbstractIndex to the new classes. While doing so, I tried to make the doc blocks, variable names and interface easier to understand. I also added tests for each of the methods. The old code has not been touched yet. So these classes do not do anything outside of tests currently.
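
Illustration for revision bc12b8fe: the suffix-parsing bug described there can be reproduced with a small sketch. This is Python rather than the PHP used in DokuWiki, and it is not the project's code; the function names and the file paths are hypothetical, and only the two regular expressions are taken from the commit message. A repeated one-digit group such as (\d)+ keeps only its last iteration, which is why w12.idx was read as 2.

```python
import os
import re

# Hypothetical file list: three shards of the "w" index plus an unrelated
# index family ("wiki") that merely shares the same prefix.
FILES = [
    "/data/index/w3.idx",
    "/data/index/w9.idx",
    "/data/index/w12.idx",
    "/data/index/wiki2.idx",
]


def max_suffix_old(filenames):
    # Old pattern from the commit message: the group (\d) matches one digit
    # and is repeated, so the captured value is only the last digit.
    best = 0
    for name in filenames:
        m = re.search(r"(\d)+\.idx$", name)
        if m:
            best = max(best, int(m.group(1)))
    return best


def max_suffix_fixed(filenames, idx):
    # Fixed pattern: anchor on the basename, require the index name to be
    # followed immediately by digits, and capture the whole digit run.
    pattern = re.compile(r"^" + re.escape(idx) + r"(\d+)\.idx$")
    best = 0
    for name in filenames:
        m = pattern.match(os.path.basename(name))
        if m:
            best = max(best, int(m.group(1)))
    return best


if __name__ == "__main__":
    print(max_suffix_old(FILES))         # w12.idx is read as 2, so the max stays at 9
    print(max_suffix_fixed(FILES, "w"))  # full suffix 12 is read, wiki2.idx is ignored
```

With the old pattern the maximum never exceeds 9, which matches the symptom in the commit message: shards for words of 10 or more characters were skipped.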