/dokuwiki/inc/indexer.php
commit 1c07b9e622d139fa815c955c89569f96342475fb
Tue Nov 16 23:09:53 UTC 2010, Tom N Harris <tnharris@whoopdedo.org>
Use external program to split pages into words

An external tokenizer marks word boundaries by inserting extra spaces into the input text.
The text is written to the tokenizer's STDIN, and the tokenized result is read back from its STDOUT.

A good choice for Chinese and Japanese is MeCab
(http://sourceforge.net/projects/mecab/),
invoked with the command line 'mecab -O wakati'.
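The pipe protocol described above can be sketched as follows. This is not DokuWiki's actual PHP implementation, just a minimal illustration of the idea: spawn the tokenizer, feed it the raw text on STDIN, and read the space-delimited text from STDOUT. The function name `external_tokenize` is hypothetical, and 'cat' is used as a stand-in tokenizer so the sketch runs without MeCab installed.

```python
import subprocess

def external_tokenize(text, cmd):
    # Write the text to the tokenizer's STDIN and read the
    # space-delimited result back from its STDOUT.
    proc = subprocess.run(
        cmd,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise if the tokenizer exits with an error
    )
    return proc.stdout

# With MeCab installed, the call would be:
#   external_tokenize(text, ["mecab", "-O", "wakati"])
# Here 'cat' stands in for a tokenizer that returns the text unchanged.
print(external_tokenize("hello world", ["cat"]))
```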