concept.txt - OpenGrok history log for /dokuwiki/inc/Search/concept.txt

Revision	Date	Author	Comments
# db8be586	08-Apr-2026	Andreas Gohr <andi@splitbrain.org>	SearchIndex: review fixes — auto-save MemoryIndex, cast TupleOps counts, style cleanups - MemoryIndex: auto-save dirty data on unlock/destruction to prevent silent index corruption when indexes are used in tandem - TupleOps::parseTuples(): cast exploded count strings to int - FileIndex::retrieveRow(): document the write-on-read padding behavior - Fix whitespace issues in ApiCore, common.php, Sitemap/Mapper - Update concept.txt to reflect MemoryIndex auto-save behavior
# 2a22d4b9	08-Apr-2026	Andreas Gohr <andi@splitbrain.org>	SearchIndex: document Tokenizer::isValidSearchTerm() in concept.txt
# 1148921d	08-Apr-2026	Andreas Gohr <andi@splitbrain.org>	SearchIndex: unify CollectionSearch API and optimize search pipeline - Remove separate lookup() API from CollectionSearch. All searches now use addTerm()/execute() with a single unified pipeline. - Add matches() predicate to Term using efficient string functions (===, str_starts_with, str_ends_with, str_contains) instead of regex. - Add caseInsensitive() support on CollectionSearch and Term for metadata/title searches where indexed values preserve case. - Remove callback support from MetadataSearch::lookupKey() — the only real usage (case-insensitive substring) is replaced by caseInsensitive() + wildcards. - Remove min-length validation from Term. Add Tokenizer::isValidSearchTerm() for callers that need it (FulltextSearch, Indexer::lookup). - Optimize execute() from 4 group passes to 2: scan tokens + resolve frequencies in one pass per group, batch entity name resolution, then populate Terms. - Store full match detail in Term: entity → token → frequency. New accessors getMatches(), getEntityTokens(), getEntityFrequencies() derive different views from this single data structure. - Term no longer used as scratch pad by CollectionSearch. Index-internal data (token IDs, entity IDs) stays local to execute(). Terms receive only final resolved results. - Use title from search results in MetadataSearch::pageLookupCallBack() instead of re-fetching via p_get_first_heading(). - Update concept.txt documentation.
# b9d7a615	07-Apr-2026	Andreas Gohr <andi@splitbrain.org>	SearchIndex: updated documentation to be moved into the wiki later
# f2bbffb5	05-Apr-2026	Andreas Gohr <andi@splitbrain.org>	SearchIndex: extract Collection base class hierarchy. Introduce AbstractCollection as the shared base for all index collections, with FrequencyCollection and LookupCollection as the two abstract subclasses differing only in how tokens are counted (frequency vs dedup). Key design decisions: - splitByLength is a constructor parameter on AbstractCollection controlling whether token/frequency indexes use length-based file splitting. This is independent of the collection type. - The reverse index format is self-describing: entries with * have a group prefix (split), entries without don't (non-split). No branching needed in parse/format methods. - addEntity, resolveTokens, updateIndexes, and reverse index handling all live in AbstractCollection. Subclasses only implement countTokens(). Concrete collections: PageFulltextCollection (frequency, split), MediaCollection and ReferencesCollection (lookup, non-split). Renames FulltextCollection -> PageFulltextCollection and FulltextCollectionSearch -> FrequencyCollectionSearch.
# 7f394dd6	05-Apr-2026	Andreas Gohr <andi@splitbrain.org>	Merge branch 'master' into searchIndex-finish * master: (55 commits) Translation update (pt-br) Bump phpseclib/phpseclib from 3.0.49 to 3.0.50 Update deleted files strict value comparison in auth session check. fixes #4602 Translation update (pt-br) Translation update (pt-br) remove utf8_encode() from authad plugin todo checker action: ignore vendor updated rector and applied it removed another php 7.4 workaround removed an old PHP 5 workaround in HTTPClient remove checks for mbstring.func_overload removed php 8 polyfills ignore HTML validation issue with skipped headline levels declare PrefCookie constant visibility update slika which fixes another php 8.5 deprecation issue fix http tests fix destructuring false returns from changelog functions avoid using null as cache key Fix deprecation warning in UTF8/Conversion ...
# 8ae94493	30-Oct-2025	Andreas Gohr <gohr@cosmocode.de>	update SearchIndex concept doc
# 596d5287	11-May-2023	Andreas Gohr <andi@splitbrain.org>	Working fulltext collection and search. This finalizes the FulltextCollection and FulltextCollectionSearch classes. Proper locking is implemented, tests have been enhanced. It should be possible to reimplement the page full text search on top of it.