| 1c07b9e6 | 16-Nov-2010 |
Tom N Harris <tnharris@whoopdedo.org> |
Use external program to split pages into words
An external tokenizer inserts extra spaces to mark words in the input text. The text is sent to the tokenizer on STDIN and read back from STDOUT.
A good choice for Chinese and Japanese is MeCab (http://sourceforge.net/projects/mecab/), run with the command line 'mecab -O wakati'.
|
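The pipe-based tokenizer described in the entry above can be illustrated roughly as follows. This is a minimal sketch in Python, not the project's PHP code: the function name `tokenize_external` and the stand-in `ECHO_COMMAND` are hypothetical, and the only behaviour assumed of the external program is what the commit message states (text in on STDIN, the same text with extra spaces out on STDOUT).

```python
import subprocess
import sys


def tokenize_external(text, command):
    """Pipe text through an external tokenizer and return the words it marks.

    `command` is an argument list such as ["mecab", "-O", "wakati"]. The
    program is assumed to read text on STDIN and to write the same text on
    STDOUT with spaces inserted between words.
    """
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    # Any run of whitespace in the output is treated as a word boundary.
    return result.stdout.split()


# Hypothetical stand-in tokenizer: copies STDIN to STDOUT unchanged, so the
# sketch can be tried without MeCab installed.
ECHO_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read())",
]
```

With MeCab installed, the same function would be called with `["mecab", "-O", "wakati"]` as the command.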
| 6c528220 | 16-Nov-2010 |
Tom N Harris <tnharris@whoopdedo.org> |
Repurpose io_runcmd for pipes |
| 4753bcc0 | 15-Nov-2010 |
Michael Hamann <michael@content-space.de> |
Indexer improvement: regex instead of arrays for lines
When updating a single line, that line was split into an array; a loop over that array removed one entry, and afterwards a new one was added. Tests have shown that using a regex for this is much faster, which is easily explained: the regex is very simple to match, while a loop over an array is comparatively slow. As that update function is called for every word in a page, the impact of this change is significant.
|
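The difference between the two approaches in the entry above can be sketched as follows. This is an illustration in Python rather than the project's PHP, and it assumes a line format of colon-separated `key*count` entries (for example `12*3:45*1`); that format and both function names are assumptions made for the example, not details taken from the commit message.

```python
import re


def update_line_split(line, key, count):
    """Array-based update: split the line, drop the old entry, append the new one."""
    parts = [p for p in line.split(":") if p and p.split("*")[0] != key]
    if count:
        parts.append(f"{key}*{count}")
    return ":".join(parts)


def update_line_regex(line, key, count):
    """Regex-based update: remove the old entry with a single substitution."""
    # Match the entry for `key` only at the start of the line or after a colon,
    # so that key "5" does not match inside the entry "45*1".
    line = re.sub(r"(^|:)" + re.escape(key) + r"\*\d*", "", line)
    line = line.strip(":")
    if count:
        line = f"{line}:{key}*{count}" if line else f"{key}*{count}"
    return line
```

Both functions return the same line for the same input; the regex variant avoids building and scanning an intermediate list for every update.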
| e5e50383 | 15-Nov-2010 |
Michael Hamann <michael@content-space.de> |
Indexer improvement: Only write the words index when needed
This adds a simple boolean variable that tracks whether new words have been added. When editing a page, in many cases all words have already been used somewhere else, or just one or two words are new. Until this change, all word indexes that were read were always written back; now only the changed ones are written. The overhead of the new boolean variable should be low.
|
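The change-tracking idea in the entry above can be sketched like this. It is a minimal Python illustration, not the project's PHP code: it assumes the word indexes are held in memory as one list per word length, and the function name `add_words` is hypothetical. A set of modified index keys plays the role of the boolean flag.

```python
def add_words(indexes, words):
    """Add unseen words to their index and report which indexes changed.

    `indexes` maps a word length to the list of words of that length
    (an assumed layout). The returned set holds the lengths whose list was
    modified; only those indexes would need to be written back to disk.
    """
    dirty = set()
    for word in words:
        bucket = indexes.setdefault(len(word), [])
        if word not in bucket:
            bucket.append(word)
            dirty.add(len(word))
    return dirty
```

A caller would then write back only the indexes named in the returned set and skip the write entirely when the set is empty.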
| 037b5573 | 15-Nov-2010 |
Michael Hamann <michael@content-space.de> |
Indexer improvement: replace _freadline by fgets
In PHP versions newer than 4.3.0, fgets reads a whole line regardless of its length when no length is given. Thus the loop in _freadline isn't needed. This increases the speed significantly, as _freadline was called very often.
|
| 06af2d03 | 15-Nov-2010 |
Michael Hamann <michael@content-space.de> |
Indexer speed improvement: joined array vs. single lines
From my experience with a benchmark of the indexer, it is faster to first join the array of all index entries and then write them back in one go than to write every entry separately. This might increase memory usage, but I couldn't see a significant increase, and this function is only used for the small index files, not for the large pagewords index.
|
| 5bcab0c4 | 15-Nov-2010 |
Tom N Harris <tnharris@whoopdedo.org> |
tokenizer was returning prematurely |
| 430d05b0 | 14-Nov-2010 |
Michael Hamann <michael@content-space.de> |
Use native PHP JSON functions when available |
| 4e1bf408 | 14-Nov-2010 |
Tom N Harris <tnharris@whoopdedo.org> |
Refactor tokenizer to avoid splitting multiple times |
| 4b9792c6 | 14-Nov-2010 |
Tom N Harris <tnharris@whoopdedo.org> |
Measure length of multi-character Asian words |
| 3a1a171b | 14-Nov-2010 |
Tom N Harris <tnharris@whoopdedo.org> |
Remove unused idx_touchIndex function |
| a365baee | 13-Nov-2010 |
Dominik Eckelmann <deckelmann@gmail.com> |
improved some metadata comments |
| e8bc5751 | 13-Nov-2010 |
Anika Henke <anika@selfthinker.org> |
FS#2079: always show profile and subscribe links/buttons |
| 1172f8dc | 13-Nov-2010 |
Adrian Lang <dokuwiki@adrianlang.de> |
Introduce metadata write wrapper p_save_metadata
p_purge_metadata now updates the metadata cache and the INFO array like the other metadata writing functions |
| 709b1063 | 13-Nov-2010 |
Adrian Lang <dokuwiki@adrianlang.de> |
Simpler ID trimming |
| 3903be5d | 13-Nov-2010 |
Adrian Lang <dokuwiki@adrianlang.de> |
Remove metadata conversion from 0a7e3bce (2006-11-26) |
| afca7e7e | 12-Nov-2010 |
Anika Henke <anika@selfthinker.org> |
FS#1839: take favicon from mediadir (if it exists) |
| f5baf821 | 07-Nov-2010 |
Anika Henke <anika@selfthinker.org> |
make custom buttons possible with html_btn() without the need of global $lang (more consistent with tpl_pagelink()) |
| dab7d18d | 07-Nov-2010 |
Choicky Chou <zhoucaiqi@gmail.com> |
Chinese Language update |
| 5ec3fefc | 05-Nov-2010 |
Andreas Gohr <andi@splitbrain.org> |
handle mailfrom replacements in a central place FS#2091 |
| 6d3e6259 | 02-Nov-2010 |
Michael Hamann <michael@content-space.de> |
Only add successfully created sitemap items to the sitemap |
| d9e0d8dc | 31-Oct-2010 |
Vadim Nevorotin <malamut@ubuntu.ru> |
Fix XSS vulnerability FS#2085 |
| 0dbd6a11 | 30-Oct-2010 |
Inko I.A <inkoia@gmail.com> |
Basque language update |
| 5bbab499 | 27-Oct-2010 |
Matthias Schulte <post@lupo49.de> |
de-informal / typo fix de / typo fix |
| d0f714c0 | 27-Oct-2010 |
Matthias Schulte <post@lupo49.de> |
de / typo fixed |