History log of /dokuwiki/inc/Search/concept.txt (Results 1 – 8 of 8)
Revision Date Author Comments
# db8be586 08-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: review fixes — auto-save MemoryIndex, cast TupleOps counts, style cleanups

- MemoryIndex: auto-save dirty data on unlock/destruction to prevent
silent index corruption when indexes are used in tandem
- TupleOps::parseTuples(): cast exploded count strings to int
- FileIndex::retrieveRow(): document the write-on-read padding behavior
- Fix whitespace issues in ApiCore, common.php, Sitemap/Mapper
- Update concept.txt to reflect MemoryIndex auto-save behavior
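
The parseTuples() fix above can be illustrated with a minimal sketch. It is written in Python rather than the project's PHP, and it is not the actual TupleOps code: the record layout (`id*count` pairs joined by `:`) and the name `parse_tuples` are assumptions made for illustration only. The point is that splitting a string yields strings, so counts have to be cast before any arithmetic is done on them.

```python
def parse_tuples(record: str) -> dict:
    """Parse an assumed 'id*count:id*count' record into {id: count}.

    Assumes well-formed input; the layout is hypothetical.
    """
    result = {}
    for pair in record.split(":"):
        if pair == "":
            continue  # empty record or stray separator
        key, _, count = pair.partition("*")
        # partition() returns strings; cast so callers get real integers
        result[key] = int(count)
    return result


counts = parse_tuples("3*2:7*10")
total = sum(counts.values())  # integer addition, not string concatenation
```

Without the cast, the values would stay strings and a later comparison or sum could behave differently from what the caller expects.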


# 2a22d4b9 08-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: document Tokenizer::isValidSearchTerm() in concept.txt


# 1148921d 08-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: unify CollectionSearch API and optimize search pipeline

- Remove separate lookup() API from CollectionSearch. All searches now
use addTerm()/execute() with a single unified pipeline.
- Add matches() predicate to Term using efficient string functions
(===, str_starts_with, str_ends_with, str_contains) instead of regex.
- Add caseInsensitive() support on CollectionSearch and Term for
metadata/title searches where indexed values preserve case.
- Remove callback support from MetadataSearch::lookupKey() — the only
real usage (case-insensitive substring) is replaced by
caseInsensitive() + wildcards.
- Remove min-length validation from Term. Add Tokenizer::isValidSearchTerm()
for callers that need it (FulltextSearch, Indexer::lookup).
- Optimize execute() from 4 group passes to 2: scan tokens + resolve
frequencies in one pass per group, batch entity name resolution, then
populate Terms.
- Store full match detail in Term: entity → token → frequency. New
accessors getMatches(), getEntityTokens(), getEntityFrequencies()
derive different views from this single data structure.
- Term no longer used as scratch pad by CollectionSearch. Index-internal
data (token IDs, entity IDs) stays local to execute(). Terms receive
only final resolved results.
- Use title from search results in MetadataSearch::pageLookupCallBack()
instead of re-fetching via p_get_first_heading().
- Update concept.txt documentation.
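
The matches() predicate and caseInsensitive() option described above could look roughly like the following sketch. It is in Python rather than PHP, so `==`, `startswith`, `endswith`, and `in` stand in for `===`, `str_starts_with`, `str_ends_with`, and `str_contains`. The convention that a leading or trailing `*` marks a wildcard, and the function signature itself, are assumptions; the log does not spell them out.

```python
def matches(pattern: str, value: str, case_insensitive: bool = False) -> bool:
    """Match value against a term with optional leading/trailing '*'."""
    if case_insensitive:
        # Fold both sides so indexed values may keep their original case
        pattern, value = pattern.casefold(), value.casefold()

    leading = pattern.startswith("*")
    trailing = len(pattern) > 1 and pattern.endswith("*")

    # Strip the wildcard markers to get the literal part of the term
    core = pattern[1:] if leading else pattern
    if trailing:
        core = core[:-1]

    if leading and trailing:
        return core in value           # *foo*  -> substring
    if leading:
        return value.endswith(core)    # *foo   -> suffix
    if trailing:
        return value.startswith(core)  # foo*   -> prefix
    return value == core               # foo    -> exact
```

For example, `matches("*WIKI", "DokuWiki", case_insensitive=True)` is true, while `matches("wiki*", "dokuwiki")` is false because the prefix test fails. Plain string comparisons like these avoid compiling a regular expression for every term.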


# b9d7a615 07-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: updated documentation

to be moved into the wiki later


# f2bbffb5 05-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: extract Collection base class hierarchy

Introduce AbstractCollection as the shared base for all index
collections, with FrequencyCollection and LookupCollection as
the two abstract subclasses differing only in how tokens are
counted (frequency vs dedup).

Key design decisions:
- splitByLength is a constructor parameter on AbstractCollection
controlling whether token/frequency indexes use length-based
file splitting. This is independent of the collection type.
- The reverse index format is self-describing: entries with *
have a group prefix (split), entries without don't (non-split).
No branching needed in parse/format methods.
- addEntity, resolveTokens, updateIndexes, and reverse index
handling all live in AbstractCollection. Subclasses only
implement countTokens().

Concrete collections: PageFulltextCollection (frequency, split),
MediaCollection and ReferencesCollection (lookup, non-split).

Renames FulltextCollection -> PageFulltextCollection and
FulltextCollectionSearch -> FrequencyCollectionSearch.
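
The self-describing reverse index format mentioned in the design decisions above can be sketched as follows. This is a Python illustration, not the AbstractCollection code: the exact layout (`group*id` for split entries, a bare `id` otherwise) and both function names are assumptions. What it shows is that the entry itself reveals whether it carries a group prefix, so the parser never needs to consult the collection's splitByLength setting.

```python
from typing import Optional, Tuple


def parse_reverse_entry(entry: str) -> Tuple[Optional[str], int]:
    """Return (group, token_id); group is None for non-split entries."""
    group, sep, token_id = entry.partition("*")
    if sep:
        return group, int(token_id)  # split: 'group*id'
    return None, int(entry)          # non-split: 'id'


def format_reverse_entry(group: Optional[str], token_id: int) -> str:
    """Inverse of parse_reverse_entry under the same assumed layout."""
    return str(token_id) if group is None else f"{group}*{token_id}"
```

Under these assumptions, formatting a parsed entry returns the original string for both split and non-split entries.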


# 7f394dd6 05-Apr-2026 Andreas Gohr <andi@splitbrain.org>

Merge branch 'master' into searchIndex-finish

* master: (55 commits)
Translation update (pt-br)
Bump phpseclib/phpseclib from 3.0.49 to 3.0.50
Update deleted files
strict value comparison in auth session check. fixes #4602
Translation update (pt-br)
Translation update (pt-br)
remove utf8_encode() from authad plugin
todo checker action: ignore vendor
updated rector and applied it
removed another php 7.4 workaround
removed an old PHP 5 workaround in HTTPClient
remove checks for mbstring.func_overload
removed php 8 polyfills
ignore HTML validation issue with skipped headline levels
declare PrefCookie constant visibility
update slika which fixes another php 8.5 deprecation issue
fix http tests
fix destructuring false returns from changelog functions
avoid using null as cache key
Fix deprecation warning in UTF8/Conversion
...


# 8ae94493 30-Oct-2025 Andreas Gohr <gohr@cosmocode.de>

update SearchIndex concept doc


# 596d5287 11-May-2023 Andreas Gohr <andi@splitbrain.org>

Working fulltext collection and search

This finalizes the FulltextCollection and FulltextCollectionSearch
classes. Proper locking is implemented, tests have been enhanced.

It should be possible to reimplement the page full text search on top of
it.
