indexer.php - OpenGrok history log for /dokuwiki/inc/indexer.php

Revision	Date	Author	Comments
# d0d6fe1b	22-Feb-2011	Tom N Harris <tnharris@whoopdedo.org>	Indexer version tag should include plugin names
# 0604da34	22-Feb-2011	Tom N Harris <tnharris@whoopdedo.org>	Removing a page from the index deletes related metadata. Cache key names in index.
# b00bd361	22-Feb-2011	Tom N Harris <tnharris@whoopdedo.org>	Indexer::lookupKey callback receives value reference as first arg
# c1209673	18-Feb-2011	Tom N Harris <tnharris@whoopdedo.org>	Special handling of title metadata index
# 4f0030dd	06-Feb-2011	Andreas Gohr <andi@splitbrain.org>	ignore soft-hyphens for search FS#2049. This makes it possible to find words that include soft-hyphens. However, search highlighting will not work and I have no idea how to make it work.
# 89b8c518	02-Feb-2011	Michael Hamann <michael@content-space.de>	Merge remote-tracking branch 'my-fork/master' into indexer_improvements
# f078bb00	24-Jan-2011	Tom N Harris <tnharris@whoopdedo.org>	Indexer Rewrite v3: wildcards in lookupKey and automatically unwrap single result
# bbc85ee4	24-Jan-2011	Tom N Harris <tnharris@whoopdedo.org>	Indexer v3 Rewrite: streamline indexing of deleted or disabled pages
# 8605afb1	23-Jan-2011	Michael Hamann <michael@content-space.de>	Add INDEXER_VERSION_GET event so plugins can add their version. This allows plugins to add their own version strings like plugin_tag=1 so pages can be reindexed when plugins update their index content.
# ae6c4ec0	23-Jan-2011	Danny Lin <danny0838@pchome.com.tw>	Add CJK characters to IDX_ASIAN2 - FS#2143
# 320f489a	23-Jan-2011	Michael Hamann <michael@content-space.de>	Indexer v3 Rewrite: Use the metadata index for backlinks; add INDEXER_METADATA_INDEX event. This new event allows plugins to add or modify the metadata that will be indexed. Collecting this metadata in an event allows plugins to see if other plugins have already added the metadata they need and leads to just one single indexer call thus fewer files are read and written. Plugins could also replace/prevent the metadata indexer call using this event.
# e1e1a7e0	23-Jan-2011	Michael Hamann <michael@content-space.de>	Indexer v3 Rewrite: fix addMetaKeys and locking. This fixes addMetaKeys so it actually removes values. This also changes the functionality of the function: It now updates the key for the page with the current value instead of adding new values as this will be the default use case. A new parameter could be added to restore the "old" behavior when needed. addMetaKeys now only saves the index when the content has really been changed. Furthermore no empty number is added anymore to the reverse index when it has been empty previously. addMetaKeys now releases the lock again and really fails when the lock can't be gained.
# cd763a5b	22-Jan-2011	Michael Hamann <michael@content-space.de>	Indexer v3 Rewrite: implement lookupKey(). Saving and looking up metadata key/value pairs seems to work now at least with some basic tests.
# 4373c7b5	22-Jan-2011	Michael Hamann <michael@content-space.de>	Indexer v3 Rewrite: _saveIndexKey now really writes on the desired line. Now _saveIndexKey inserts empty lines when the index isn't long enough. This is necessary because the page ids are taken from the global page index, but there is not every page in the metadata key specific index so e.g. line 10 might be the first entry in the index.
# d64516f5	22-Jan-2011	Michael Hamann <michael@content-space.de>	Indexer v3 Rewrite: fix obvious typos and type errors
# 4a819402	10-Jan-2011	Michael Hamann <michael@content-space.de>	Activate the render parameter of p_get_metadata. p_get_metadata has a $render parameter that has been disabled by the restructuring of metadata rendering. This change reactivates it so rendering metadata can be prevented. This is e.g. used in the search and in some plugins like indexmenu that use p_get_first_heading. The default of the parameter has been changed to true as otherwise the new caching structure won't work as almost all calls to p_get_metadata don't set the $render parameter. The indexer call to p_get_first_heading has been changed to set $render to true as in the indexer only one page will be rendered and the title in the index should really be the current one. This does not fix the problem that rendering pages with lots of links or displaying the index can cause the parsing/rendering of a lot of pages.
# 9b41be24	29-Dec-2010	Tom N Harris <tnharris@whoopdedo.org>	Indexer v3 Rewrite part two, update uses of indexer
# 00803e56	28-Dec-2010	Tom N Harris <tnharris@whoopdedo.org>	Indexer v3 Rewrite part one (unstable). The indexer functions have been converted to a class interface. Use the Doku_Indexer class to access the indexer with these public methods: addPageWords, addMetaKeys, deletePage, tokenizer, lookup, lookupKey, getPages, histogram. These functions are provided for general use: idx_get_version, idx_get_indexer, idx_get_stopwords, idx_addPage, idx_lookup, idx_tokenizer. These functions are still available, but are deprecated: idx_getIndex, idx_indexLengths. All other old idx_ functions are unsupported and have been removed.
# e3776c06	29-Nov-2010	Michael Hamann <michael@content-space.de>	Remove enc=utf-8 in VIM modeline as it is not allowed in VIM 7.3. As of VIM 7.3 it is no longer possible to specify the encoding in the modeline. This gives an error message whenever such a file is opened, thus this commit removes the enc setting from the modeline.
# 3c4b3890	20-Nov-2010	Tom N Harris <tnharris@whoopdedo.org>	Merge branch 'tokenizer-rewrite' into michitux
# 420edfd6	18-Nov-2010	Tom N Harris <tnharris@whoopdedo.org>	Restore io_runcmd, use io_exec for exec with pipes
# 7c2ef4e8	17-Nov-2010	Tom N Harris <tnharris@whoopdedo.org>	Use a different indexer version when external tokenizer is enabled
# 1c07b9e6	16-Nov-2010	Tom N Harris <tnharris@whoopdedo.org>	Use external program to split pages into words. An external tokenizer inserts extra spaces to mark words in the input text. The text is sent through STDIN and STDOUT file handles. A good choice for Chinese and Japanese is MeCab (http://sourceforge.net/projects/mecab/) with the command line 'mecab -O wakati'.
# 4753bcc0	15-Nov-2010	Michael Hamann <michael@content-space.de>	Indexer improvement: regex instead of arrays for lines. When updating a single line that line was split into an array and in a loop over that array one entry was removed and afterwards a new one added. Tests have shown that using a regex for doing that is much faster which can be easily explained as that regex is very simple to match while a loop over an array isn't that fast. As that update function is called for every word in a page the impact of this change is significant.
# e5e50383	15-Nov-2010	Michael Hamann <michael@content-space.de>	Indexer improvement: Only write the words index when needed. This adds a simple boolean variable that tracks if new words have been added. When editing a page in many cases all words have already been used somewhere else or just one or two words are new. Until this change all words indexes read were always written, now only the changed ones are written. The overhead of the new boolean variable should be low.
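
Note on 4753bcc0: the "regex instead of arrays" change can be illustrated with a minimal sketch. This is Python, not the PHP from indexer.php, and it is not the committed code. It assumes an index line is a colon-separated list of `pageid*count` entries; that format and the function name `update_index_line` are assumptions made for illustration only.

```python
import re


def update_index_line(line: str, page_id: int, count: int) -> str:
    """Return `line` with the entry for `page_id` replaced (or removed).

    Assumed line format (hypothetical): "id*count:id*count:...".
    A count of 0 removes the page from the line.
    """
    # One regex pass removes the old entry; the (^|:) anchor keeps
    # page 2 from matching inside the entry for page 12.
    pattern = r"(^|:)" + re.escape(str(page_id)) + r"\*\d*"
    line = re.sub(pattern, "", line).strip(":")

    if count > 0:
        entry = f"{page_id}*{count}"
        line = f"{line}:{entry}" if line else entry
    return line


if __name__ == "__main__":
    print(update_index_line("1*3:12*2:2*5", 2, 7))
```

The single substitution replaces the split, loop, and join that the commit message describes, which is where the reported speedup would come from under these assumptions.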