| #
db8be586 |
| 08-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: review fixes — auto-save MemoryIndex, cast TupleOps counts, style cleanups
- MemoryIndex: auto-save dirty data on unlock/destruction to prevent silent index corruption when indexes are used in tandem
- TupleOps::parseTuples(): cast exploded count strings to int
- FileIndex::retrieveRow(): document the write-on-read padding behavior
- Fix whitespace issues in ApiCore, common.php, Sitemap/Mapper
- Update concept.txt to reflect MemoryIndex auto-save behavior
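The count cast mentioned for TupleOps::parseTuples() can be illustrated with a rough Python sketch. The commit message does not spell out the tuple format; the `key*count:key*count` layout and the function name `parse_tuples` below are assumptions made for illustration, not the project's PHP code.

```python
def parse_tuples(line: str) -> dict[str, int]:
    """Hypothetical parser for an assumed 'key*count:key*count' line."""
    result: dict[str, int] = {}
    for part in line.split(":"):
        if not part:
            continue  # skip empty segments, e.g. from an empty line
        key, _, count = part.partition("*")
        # Splitting yields strings; cast so later arithmetic is numeric.
        result[key] = int(count) if count else 0
    return result


counts = parse_tuples("3*2:7*10")
total = sum(counts.values())
```

Without the cast, the counts would remain strings and any later summing or comparison would operate on text rather than numbers.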
|
| #
2a22d4b9 |
| 08-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: document Tokenizer::isValidSearchTerm() in concept.txt
|
| #
1148921d |
| 08-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: unify CollectionSearch API and optimize search pipeline
- Remove separate lookup() API from CollectionSearch. All searches now use addTerm()/execute() with a single unified pipeline.
- Add matches() predicate to Term using efficient string functions (===, str_starts_with, str_ends_with, str_contains) instead of regex.
- Add caseInsensitive() support on CollectionSearch and Term for metadata/title searches where indexed values preserve case.
- Remove callback support from MetadataSearch::lookupKey(); the only real usage (case-insensitive substring) is replaced by caseInsensitive() + wildcards.
- Remove min-length validation from Term. Add Tokenizer::isValidSearchTerm() for callers that need it (FulltextSearch, Indexer::lookup).
- Optimize execute() from 4 group passes to 2: scan tokens + resolve frequencies in one pass per group, batch entity name resolution, then populate Terms.
- Store full match detail in Term: entity → token → frequency. New accessors getMatches(), getEntityTokens(), getEntityFrequencies() derive different views from this single data structure.
- Term no longer used as scratch pad by CollectionSearch. Index-internal data (token IDs, entity IDs) stays local to execute(). Terms receive only final resolved results.
- Use title from search results in MetadataSearch::pageLookupCallBack() instead of re-fetching via p_get_first_heading().
- Update concept.txt documentation.
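The regex-free matches() predicate described above can be approximated with a small Python sketch. It assumes that a leading or trailing `*` in the term marks a wildcard and that case-insensitive mode simply lowercases both sides; neither detail is confirmed by the commit message, and the function is a stand-in rather than the actual Term::matches() code.

```python
def matches(term: str, value: str, case_insensitive: bool = False) -> bool:
    """Hypothetical stand-in for Term::matches(); wildcard syntax is assumed."""
    if case_insensitive:
        term, value = term.lower(), value.lower()

    # Detect and strip an assumed leading/trailing '*' wildcard.
    leading = term.startswith("*")
    needle = term[1:] if leading else term
    trailing = needle.endswith("*")
    if trailing:
        needle = needle[:-1]

    if leading and trailing:
        return needle in value            # counterpart of str_contains
    if leading:
        return value.endswith(needle)     # counterpart of str_ends_with
    if trailing:
        return value.startswith(needle)   # counterpart of str_starts_with
    return value == needle                # counterpart of ===


exact = matches("wiki", "wiki")
prefix = matches("wiki*", "wikipedia")
folded = matches("Doku*", "dokuwiki", case_insensitive=True)
```

Each branch is a single string comparison, which is the point of the change: no pattern has to be compiled or escaped for the common exact, prefix, suffix, and substring cases.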
|
| #
b9d7a615 |
| 07-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: updated documentation
to be moved into the wiki later
|
| #
f2bbffb5 |
| 05-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
SearchIndex: extract Collection base class hierarchy
Introduce AbstractCollection as the shared base for all index collections, with FrequencyCollection and LookupCollection as the two abstract subclasses differing only in how tokens are counted (frequency vs dedup).
Key design decisions:
- splitByLength is a constructor parameter on AbstractCollection controlling whether token/frequency indexes use length-based file splitting. This is independent of the collection type.
- The reverse index format is self-describing: entries with * have a group prefix (split), entries without don't (non-split). No branching needed in parse/format methods.
- addEntity, resolveTokens, updateIndexes, and reverse index handling all live in AbstractCollection. Subclasses only implement countTokens().
Concrete collections: PageFulltextCollection (frequency, split), MediaCollection and ReferencesCollection (lookup, non-split).
Renames FulltextCollection -> PageFulltextCollection and FulltextCollectionSearch -> FrequencyCollectionSearch.
|
| #
7f394dd6 |
| 05-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
Merge branch 'master' into searchIndex-finish
* master: (55 commits)
  Translation update (pt-br)
  Bump phpseclib/phpseclib from 3.0.49 to 3.0.50
  Update deleted files
  strict value comparison in auth session check. fixes #4602
  Translation update (pt-br)
  Translation update (pt-br)
  remove utf8_encode() from authad plugin
  todo checker action: ignore vendor
  updated rector and applied it
  removed another php 7.4 workaround
  removed an old PHP 5 workaround in HTTPClient
  remove checks for mbstring.func_overload
  removed php 8 polyfills
  ignore HTML validation issue with skipped headline levels
  declare PrefCookie constant visibility
  update slika which fixes another php 8.5 deprecation issue
  fix http tests
  fix destructuring false returns from changelog functions
  avoid using null as cache key
  Fix deprecation warning in UTF8/Conversion
  ...
|
| #
8ae94493 |
| 30-Oct-2025 |
Andreas Gohr <gohr@cosmocode.de> |
update SearchIndex concept doc
|
| #
596d5287 |
| 11-May-2023 |
Andreas Gohr <andi@splitbrain.org> |
Working fulltext collection and search
This finalizes the FulltextCollection and FulltextCollectionSearch classes. Proper locking is implemented, tests have been enhanced.
It should be possible to reimplement the page full text search on top of it.
|