| #
9369b4a9 |
| 08-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: rector, phpcs, type hint fixes
|
| #
1148921d |
| 08-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: unify CollectionSearch API and optimize search pipeline
- Remove separate lookup() API from CollectionSearch. All searches now use addTerm()/execute() with a single unified pipeline.
- Add matches() predicate to Term using efficient string functions (===, str_starts_with, str_ends_with, str_contains) instead of regex.
- Add caseInsensitive() support on CollectionSearch and Term for metadata/title searches where indexed values preserve case.
- Remove callback support from MetadataSearch::lookupKey(); the only real usage (case-insensitive substring) is replaced by caseInsensitive() + wildcards.
- Remove min-length validation from Term. Add Tokenizer::isValidSearchTerm() for callers that need it (FulltextSearch, Indexer::lookup).
- Optimize execute() from 4 group passes to 2: scan tokens + resolve frequencies in one pass per group, batch entity name resolution, then populate Terms.
- Store full match detail in Term: entity → token → frequency. New accessors getMatches(), getEntityTokens(), getEntityFrequencies() derive different views from this single data structure.
- Term no longer used as scratch pad by CollectionSearch. Index-internal data (token IDs, entity IDs) stays local to execute(). Terms receive only final resolved results.
- Use title from search results in MetadataSearch::pageLookupCallBack() instead of re-fetching via p_get_first_heading().
- Update concept.txt documentation.
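The matches() predicate described above can be illustrated with a small Python sketch. This is not the PHP code from the commit: the `Term` class below is a hypothetical stand-in, and the `*` wildcard syntax and the `case_insensitive` flag are assumptions made for the example. It only shows how exact, prefix, suffix and substring matching can be done with plain string functions instead of a regular expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for the Term class named in the commit."""

    pattern: str                    # assumed syntax: "wiki", "wiki*", "*wiki", "*wiki*"
    case_insensitive: bool = False  # assumed counterpart of caseInsensitive()

    def matches(self, token: str) -> bool:
        needle = self.pattern
        # A leading/trailing "*" marks an open end (assumed wildcard syntax).
        leading = needle.startswith("*")
        trailing = len(needle) > 1 and needle.endswith("*")
        if leading:
            needle = needle[1:]
        if trailing:
            needle = needle[:-1]

        if self.case_insensitive:
            needle = needle.casefold()
            token = token.casefold()

        if leading and trailing:
            return needle in token          # substring match
        if leading:
            return token.endswith(needle)   # suffix match
        if trailing:
            return token.startswith(needle) # prefix match
        return token == needle              # exact match
```

For example, `Term("wiki*").matches("wikis")` is a prefix check, while `Term("*Wiki*", case_insensitive=True)` corresponds to the case-insensitive substring lookup that the removed callback used to provide.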
|
| #
21fbd01b |
| 07-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: add integrity checking to Collection architecture
Add checkIntegrity() to AbstractCollection and DirectCollection that verifies paired indexes have matching line counts (token==frequency, entity==reverse, entity==token for direct collections). Throws IndexIntegrityException on the first inconsistency found.
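A rough Python sketch of the paired line-count check described above. The function name, the exception name and the dictionary of counts are all hypothetical; the actual check lives in the PHP collection classes and may differ in detail.

```python
class IndexIntegrityError(Exception):
    """Hypothetical counterpart of IndexIntegrityException."""


def check_integrity(line_counts, pairs):
    """Raise on the first pair of indexes whose line counts differ.

    line_counts maps an index name to its number of lines;
    pairs lists the index names that must stay in step.
    """
    for left, right in pairs:
        if line_counts[left] != line_counts[right]:
            raise IndexIntegrityError(
                f"{left} has {line_counts[left]} lines, "
                f"{right} has {line_counts[right]}"
            )


# Pairs named in the commit message for regular collections.
REGULAR_PAIRS = [("token", "frequency"), ("entity", "reverse")]
```

Stopping at the first mismatch mirrors the behaviour stated in the commit message; a variant that collects every inconsistency would be an equally plausible design.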
Add Countable interface to AbstractIndex with count() implementations in MemoryIndex and FileIndex. Add Indexer::checkIntegrity() and Indexer::isIndexEmpty() to orchestrate checks across all collections.
Update infoutils.php to use the new Indexer API instead of the old FulltextIndex/MetadataIndex classes.
Fix range(1, 0) bug in three places that produced [1, 0] instead of an empty array when split-by-length indexes were empty.
|
| #
83b3accc |
| 06-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: rewrite Indexer to use Collection classes
Replace the intermediate #2943 classes (FulltextIndex, MetadataIndex) with the new Collection-based architecture. The Indexer is now a thin stateless orchestrator that delegates all index work to collections.
Key changes:
- Indexer no longer extends AbstractIndex; page name passed to methods
- addPage/deletePage/clear use PageTitleCollection, PageFulltextCollection, and PageMetaCollection
- New PageMetaCollection replaces separate ReferencesCollection and MediaCollection with a single class that handles arbitrary metadata keys dynamically
- Shared writable FileIndex('page') passed to all collections
- Logger callback replaces verbose parameter
- Methods return void instead of bool
- Index classes implement IteratorAggregate for clean data access
- Indexer tests consolidated into namespaced IndexerTest.php
- All callers updated to new stateless API
|
| #
c66b5ec6 |
| 05-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: rewrite Lock as static registry with reference counting
Replace the instance-based Lock class with a static registry that tracks held locks per-process with reference counting. This solves three problems:
- Split indexes (w3, w4, ...) share a single lock name and now coordinate naturally via the registry
- Multiple callers can acquire the same lock without conflict
- Indexes enforce their own writability through lock()/unlock() methods on AbstractIndex
The Lock registry manages both the filesystem lock (mkdir) and the in-process tracking. The first acquire creates the directory, subsequent acquires increment the refcount. Release decrements, and only removes the directory when the count reaches zero.
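The acquire/release behaviour described above might look roughly like this in Python. This is a minimal sketch, not the PHP Lock class: the `LockRegistry` name and its methods are hypothetical, and waiting for or cleaning up locks held by other processes is left out entirely.

```python
import os
import tempfile


class LockRegistry:
    """Hypothetical per-process registry of held directory locks."""

    _held: dict = {}  # lock directory -> reference count

    @classmethod
    def acquire(cls, lock_dir: str) -> None:
        count = cls._held.get(lock_dir, 0)
        if count == 0:
            # First acquire in this process: take the filesystem lock.
            # os.mkdir raises FileExistsError if the directory already
            # exists, e.g. because another process holds the lock.
            os.mkdir(lock_dir)
        cls._held[lock_dir] = count + 1

    @classmethod
    def release(cls, lock_dir: str) -> None:
        count = cls._held.get(lock_dir, 0)
        if count == 0:
            raise RuntimeError(f"lock {lock_dir!r} is not held")
        if count == 1:
            # Last holder: drop the bookkeeping and the filesystem lock.
            del cls._held[lock_dir]
            os.rmdir(lock_dir)
        else:
            cls._held[lock_dir] = count - 1
```

Because the registry is keyed by lock name, two index objects that map to the same name share one filesystem lock, and the directory disappears only when the last of them releases it.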
Note: I am not sure if implementing this as a static object is a great idea or if we should pass an instance through the collection to the indexes...
|
| #
596d5287 |
| 11-May-2023 |
Andreas Gohr <andi@splitbrain.org> |
Working fulltext collection and search
This finalizes the FulltextCollection and FulltextCollectionSearch classes. Proper locking is implemented, tests have been enhanced.
It should be possible to reimplement the page full text search on top of it.
|
| #
7fcedc39 |
| 09-May-2023 |
Andreas Gohr <andi@splitbrain.org> |
Indexes can now be opened in readonly mode
|
| #
9f63f003 |
| 08-May-2023 |
Andreas Gohr <andi@splitbrain.org> |
add method to retrieve multiple rows at once
|
| #
03a35633 |
| 12-Sep-2022 |
Andreas Gohr <andi@splitbrain.org> |
added method to search an index by regular expression
|
| #
8ed35011 |
| 08-Dec-2021 |
Andreas Gohr <andi@splitbrain.org> |
better method names
|
| #
d6396b6d |
| 08-Dec-2021 |
Andreas Gohr <andi@splitbrain.org> |
we need the same access methods in both index types
|
| #
ec5280ef |
| 04-Dec-2021 |
Andreas Gohr <andi@splitbrain.org> |
rearranging the Index class structure
This is a first step toward restructuring the indexing classes a bit more.
Some background:
We basically have two different kinds of index files:
a) RowIndex (like page.idx)
Each line in the index contains a single value. The line number is used as primary ID. These files can be very large. Thus an index like that should never be read into memory completely if it can be avoided.
b) TupleIndex (like i12.idx)
Each line contains a list of tuples. The files tend to be smaller so loading them completely for search and replace is easier.
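The difference in access patterns can be sketched in Python as follows. Both function names are hypothetical, and two details are assumptions not stated in the commit message: row IDs are taken to be zero-based line numbers, and tuples are taken to be stored as `id*count` pairs separated by `:`.

```python
from itertools import islice


def read_row(path, row_id):
    """RowIndex-style access: stream to one line, never load the file.

    Assumes row_id is the zero-based line number.
    """
    with open(path, encoding="utf-8") as handle:
        line = next(islice(handle, row_id, row_id + 1), None)
    return None if line is None else line.rstrip("\n")


def load_tuples(path):
    """TupleIndex-style access: load the whole (small) file at once.

    Assumes each line looks like "3*2:7*1" (id*count pairs joined by ":").
    Returns one {id: count} dict per line.
    """
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            row = {}
            for part in line.rstrip("\n").split(":"):
                if part:  # empty lines yield an empty dict
                    key, _, count = part.partition("*")
                    row[int(key)] = int(count)
            rows.append(row)
    return rows
```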
Since the access is so completely different, I tried to model that in two different classes, basically moving the methods from \dokuwiki\Search\AbstractIndex to the new classes.
While doing so, I tried to make the doc blocks, variable names and interface easier to understand. I also added tests for each of the methods.
The old code has not been touched yet, so these classes currently do nothing outside of tests.
|