History log of /dokuwiki/inc/Search/Tokenizer.php (Results 1 – 8 of 8)
Revision Date Author Comments
# 9369b4a9 08-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: rector, phpcs, type hint fixes


# 1148921d 08-Apr-2026 Andreas Gohr <andi@splitbrain.org>

SearchIndex: unify CollectionSearch API and optimize search pipeline

- Remove separate lookup() API from CollectionSearch. All searches now
use addTerm()/execute() with a single unified pipeline.
- Add matches() predicate to Term using efficient string functions
(===, str_starts_with, str_ends_with, str_contains) instead of regex.
- Add caseInsensitive() support on CollectionSearch and Term for
metadata/title searches where indexed values preserve case.
- Remove callback support from MetadataSearch::lookupKey() — the only
real usage (case-insensitive substring) is replaced by
caseInsensitive() + wildcards.
- Remove min-length validation from Term. Add Tokenizer::isValidSearchTerm()
for callers that need it (FulltextSearch, Indexer::lookup).
- Optimize execute() from 4 group passes to 2: scan tokens + resolve
frequencies in one pass per group, batch entity name resolution, then
populate Terms.
- Store full match detail in Term: entity → token → frequency. New
accessors getMatches(), getEntityTokens(), getEntityFrequencies()
derive different views from this single data structure.
- Term no longer used as scratch pad by CollectionSearch. Index-internal
data (token IDs, entity IDs) stays local to execute(). Terms receive
only final resolved results.
- Use title from search results in MetadataSearch::pageLookupCallBack()
instead of re-fetching via p_get_first_heading().
- Update concept.txt documentation.


# 596d5287 11-May-2023 Andreas Gohr <andi@splitbrain.org>

Working fulltext collection and search

This finalizes the FulltextCollection and FulltextCollectionSearch
classes. Proper locking is implemented, tests have been enhanced.

It should be possible to reimplement the page full text search on top of
it.


# 1755450b 26-Sep-2020 Satoshi Sahara <sahara.satoshi@gmail.com>

change Tokenizer static utility

The Tokenizer is frequently used in AJAX calls, where a singleton is not effective at reducing multiple instantiations.


# 15f699ac 10-Sep-2020 Andreas Gohr <andi@splitbrain.org>

replace user errors with exceptions

Exceptions are better to handle than errors. What I don't like is that
we now have an unfortunate mix of return code and exception signalling
for errors. Some methods still return false for errors while others
now throw exceptions (always returning true otherwise).


# f2e1d0bf 02-Feb-2020 Satoshi Sahara <sahara.satoshi@gmail.com>

fix set Stopwords property


# 743c9a28 31-Jan-2020 Satoshi Sahara <sahara.satoshi@gmail.com>

rename PagewordIndex to FulltextIndex


# 094ebf29 20-Jan-2020 Satoshi Sahara <sahara.satoshi@gmail.com>

separate Tokenizer class