Lines Matching full:token

21  * Every collection works with four index types: entity, token, frequency, and reverse.
24 * token - the list of tokens (e.g. words) assigned to entities (can be split into multiple files)
25 * frequency - how often a token appears on an entity (can be split into multiple files)
43 * Entity and token indexes can be passed as already instantiated AbstractIndex objects
50 * @param bool $splitByLength Whether to split token/frequency indexes by token length
61 … throw new IndexUsageException('Cannot split by length when using a pre-instantiated token index');
140 * @param int $group Index group (0 for non-split, token length for split)
153 * @param int $group Index group (0 for non-split, token length for split)
172 * Whether this collection splits token/frequency indexes by token length
203 * Resolve token IDs to entity frequencies
205 * Given a set of token IDs from a specific index group, returns the entities
210 * @param int $group Index group (0 for non-split, token length for split)
211 * @param int[] $tokenIds The token IDs to resolve
250 * Maximum suffix for the token indexes (e.g. max word length currently stored)
267 * - token == frequency (per group, both keyed by token RID)
274 // Check token/frequency pairs
287 ($tokenIndex->exists() ? 'frequency' : 'token') . ' index'
295 "Group $group: token count ($tc) != frequency count ($fc)"
320 …* The update merges old and new token data. getReverseAssignments() returns all previously stored
321 …* with a value of 0 (see parseReverseRecord). resolveTokens() returns the new token IDs with their…
355 * Calls countTokens() to get token frequencies (subclass responsibility), then groups
356 * by token length if splitByLength is enabled, or under group 0 if not. Finally resolves
357 * token strings to IDs via the appropriate token index.
359 * @param string[] $tokens The raw token list
369 foreach ($counted as $token => $freq) {
370 $group = $this->splitByLength ? Tokenizer::tokenLength($token) : 0;
371 $groups[$group][$token] = $freq;
374 // resolve token strings to IDs
379 foreach ($tokenFreqs as $token => $freq) {
380 $tokenId = $tokenIndex->getRowID((string)$token);
393 * LookupCollections deduplicate and return 1 for each token.
395 * @param string[] $tokens The raw token list
396 * @return array [token => frequency, ...]
401 * Get the token assignments for a given entity from the reverse index
454 …* The reverse index only stores which token IDs belong to an entity, not their frequencies. All va…
458 … split collections the format is "group*tokenId:group*tokenId:..." where group is the token length.
500 * from that token's frequency record.