| #
72de9068 |
| 19-Jul-2007 |
Andreas Gohr <andi@splitbrain.org> |
several speed improvements in UTF-8 lib
darcs-hash:20070719110142-7ad00-1192e190c62637ed68e2c2c0a0b3135abfd6ecb5.gz
|
| #
37242afa |
| 23-Mar-2007 |
Tom N Harris <tnharris@whoopdedo.org> |
Escape Ctrl-Z so darcs stops treating utf8.php as binary.
darcs-hash:20070323030243-6942e-836105b95078b213df8497386ae9b0418fcf29be.gz
|
| #
9f9fb0e5 |
| 02-Feb-2007 |
Tom N Harris <tnharris@whoopdedo.org> |
Encode/Decode numeric HTML entities correctly.
utf8_tohtml handles all codepoints, and the inverse function, utf8_unhtml, is added.
darcs-hash:20070202070509-6942e-09ed9dc37f1469055a7c04d44044768e160d60e6.gz
|
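The encode/decode pair described in the entry above can be sketched in Python. This is an illustration under assumptions, not DokuWiki's PHP code. `to_html` and `unhtml` are hypothetical stand-ins for `utf8_tohtml` and `utf8_unhtml`. The sketch assumes that only non-ASCII code points get encoded, and it accepts both decimal and hexadecimal references when decoding. It works on whole code points rather than bytes, so 4-byte (astral) characters round-trip correctly.

```python
import re

def to_html(text):
    """Encode every non-ASCII code point as a decimal numeric entity (assumption)."""
    return "".join(ch if ord(ch) < 0x80 else f"&#{ord(ch)};" for ch in text)

# Matches decimal (&#1234;) and hexadecimal (&#x4D2;) numeric references.
_NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

def unhtml(text):
    """Decode numeric entities back to characters; leave invalid ones untouched."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Reject values beyond the Unicode range and lone surrogates.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return match.group(0)
        return chr(cp)
    return _NUMERIC_ENTITY.sub(replace, text)

sample = "a\u00e9\U0001F600"   # 'a', 'é' and an astral emoji
encoded = to_html(sample)      # "a&#233;&#128512;"
assert unhtml(encoded) == sample
```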
| #
44881bd0 |
| 03-Jan-2007 |
henning.noren <henning.noren@gmail.com> |
tf_rename_lower.patch
Name the TRUE/FALSE constants consistently as lowercase everywhere. This might also be a tiny optimization in some environments.
darcs-hash:20070103205700-d2a3e-e7ec0aedb938d563f583116a2d5b17f3a3fea36c.gz
|
| #
d5b23302 |
| 17-Nov-2006 |
Tom N Harris <tnharris@whoopdedo.org> |
Indexer asian language fixes and speed-ups
Make Chinese and Japanese work better with the new indexer. Some missing punctuation added to utf8_stripspecials. Misc. other changes to make indexing faster. The indexes will expire on backend upgrades, so you don't have to delete *.indexed.
darcs-hash:20061117123032-6942e-774b38e08234928c49b37e40addba375acf67ac0.gz
|
| #
f5e334de |
| 28-Oct-2006 |
Andreas Gohr <andi@splitbrain.org> |
do not transliterate cyrillic soft sign #958
darcs-hash:20061028113426-7ad00-f1d6b3b919c3aadd2bd7585fb772071b81b4b42d.gz
|
| #
2626ee0c |
| 28-Sep-2006 |
chris <chris@jalakai.co.uk> |
more utf8_substr improvements (re FS#891 and yesterday's patch)
- rework utf8_substr() NOMBSTRING code to always use PCRE
- remove workaround for utf8_substr() and large strings from ft_snippet()
darcs-hash:20060928165122-9b6ab-0eefc216f07f9d7e7d8eb62ce26605c28ee340fa.gz
|
| #
5e613a5c |
| 27-Sep-2006 |
chris <chris@jalakai.co.uk> |
utf8_substr fix for FS#891
darcs-hash:20060927033713-9b6ab-4b35e0a85b6d11d5a3a98858cd2f860b383ff153.gz
|
| #
720307d9 |
| 23-Sep-2006 |
chris <chris@jalakai.co.uk> |
utf8_stripspecials optimization
Add preconverted utf-8 string of special characters.
The (once only) conversion of the special character unicode array into utf-8 occurs on every DokuWiki page view, irrespective of action or caching, and takes about one third of the time involved in delivering a wiki page straight from cache.
The original unicode array has been left in place in the file to make any future amendments easier.
darcs-hash:20060923151937-9b6ab-cae0340a95d9596415ef71d7b7e67ef9daca84ef.gz
|
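The optimization described in the entry above comes down to converting the code-point table to a string once, when the module loads, instead of on every page view. Here is a minimal Python sketch of that idea. `SPECIAL_CODEPOINTS` is a hypothetical handful of entries standing in for the much larger table in utf8.php. `strip_specials` is also a hypothetical name and does not reproduce the signature of `utf8_stripspecials`.

```python
import re

# Hypothetical excerpt of a "special characters" table, kept as code points
# so that future amendments stay easy to read.
SPECIAL_CODEPOINTS = [0x21, 0x3F, 0x2014, 0x3001, 0x3002]

# Converted once at import time rather than on every call.
SPECIALS = "".join(chr(cp) for cp in SPECIAL_CODEPOINTS)
_SPECIALS_RE = re.compile("[" + re.escape(SPECIALS) + "]")

def strip_specials(text, repl=""):
    """Replace every special character in text with repl (taken literally)."""
    return _SPECIALS_RE.sub(lambda _m: repl, text)
```

Only the cheap regex substitution runs per call. The table-to-string conversion cost is paid a single time.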
| #
9ee93076 |
| 31-Aug-2006 |
chris <chris@jalakai.co.uk> |
search improvements
ft_snippet()
- make utf8 algorithm default
- add workaround for utf8_substr() limitations, bug #891
- fix some indexes which missed out on conversion to utf8 character counts
- minor improvements
idx_lookup()
- minor changes to wildcard matching code to improve performance (changes based on profiling results)
utf8
- specifically set mb_internal_encoding to utf-8 when mb_string functions will be used.
darcs-hash:20060831003413-9b6ab-712021eda3c959ffe79d8d3fe91d2c9a8acf2b58.gz
|
| #
19a32233 |
| 29-Aug-2006 |
chris <chris@jalakai.co.uk> |
further update to global memory cache arrays
- remove initialisation of caches in inc/pageutils.php
- add global declaration to init.php to support init.php being included from within a function, e.g. unit testing ;-)
- minor change to utf8_substr, remove non-essential brackets added as part of an earlier patch
darcs-hash:20060829134806-9b6ab-ab15191344a83be664c412403dc84a24fa2253a2.gz
|
| #
bb4e0b0b |
| 28-Aug-2006 |
chris <chris@jalakai.co.uk> |
utf8_substr() fix, it wasn't using mb_substr results when available
darcs-hash:20060828092029-9b6ab-f76c94b76ce1ada49e2fefde11af824bb98b99c7.gz
|
| #
f50163d1 |
| 27-Aug-2006 |
chris <chris@jalakai.co.uk> |
utf8_correctIdx bounds checking and more unittests
darcs-hash:20060827153254-9b6ab-3c76fde7cb5534ca12628e9aa6e6d59d9bb02f45.gz
|
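Correcting a byte index means moving it off UTF-8 continuation bytes (bit pattern 10xxxxxx) so that it lands on a character boundary. Bounds checking clamps indexes that fall outside the string. The Python sketch below is hypothetical: `correct_idx` and its `forward` flag only approximate what `utf8_correctIdx` does, and the exact semantics are assumptions.

```python
def correct_idx(data, i, forward=False):
    """Move byte index i onto a UTF-8 character boundary in data (hypothetical helper)."""
    # Bounds checking: clamp into [0, len(data)].
    if i <= 0:
        return 0
    if i >= len(data):
        return len(data)
    if forward:
        # Skip forward past continuation bytes (10xxxxxx).
        while i < len(data) and (data[i] & 0xC0) == 0x80:
            i += 1
    else:
        # Step back to the lead byte of the current character.
        while i > 0 and (data[i] & 0xC0) == 0x80:
            i -= 1
    return i
```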
| #
5953e889 |
| 26-Aug-2006 |
chris <chris@jalakai.co.uk> |
ft_snippet() update, fix utf8 problems
darcs-hash:20060826095311-9b6ab-9a6f272cc7c7532eb2bad8f7b4404c5a16b71109.gz
|
| #
0eac1afb |
| 26-Aug-2006 |
Andreas Gohr <andi@splitbrain.org> |
code to remove bad UTF-8 bytes added
This adds code to remove or replace invalid UTF-8 bytes and uses it in the ft_snippet() function.
darcs-hash:20060826082919-7ad00-a94004de159ae93ff5b7270fd3e631ff467233cd.gz
|
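Removing or replacing invalid bytes can be sketched in Python by decoding strictly and splicing around each error the decoder reports. `clean_utf8` is a hypothetical name. DokuWiki's own function is written in PHP and may handle malformed sequences differently. For pure removal, Python's built-in `data.decode("utf-8", errors="ignore")` already does the job.

```python
def clean_utf8(data, replacement=""):
    """Decode data as UTF-8, substituting `replacement` for each invalid sequence."""
    parts = []
    pos = 0
    while pos < len(data):
        try:
            parts.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets into the slice data[pos:].
            parts.append(data[pos:pos + exc.start].decode("utf-8"))  # valid prefix
            parts.append(replacement)
            pos += exc.end  # skip the offending bytes
    return "".join(parts)
```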
| #
74c0c504 |
| 09-Aug-2006 |
chris <chris@jalakai.co.uk> |
cleanID unit tests
+ fix missing utf8 deaccent character mapping
+ set utf-8 charset for HTMLReporter (unit tests)
darcs-hash:20060809160209-9b6ab-26c80a4830643b9790536f6d3a4adee0f451e4f0.gz
|
| #
54662a04 |
| 11-Jun-2006 |
Andreas Gohr <andi@splitbrain.org> |
make sure UTF8 lookup tables are always global
darcs-hash:20060611173240-7ad00-9bc775163fd9fc65917ffe10f78f872a302bdbcf.gz
|
| #
eaa525a0 |
| 08-Jun-2006 |
Andreas Gohr <andi@splitbrain.org> |
fix for utf8_strpos #827
darcs-hash:20060608200438-7ad00-05fbb18c15df64725ca4ef1ffdc0aa817a508ea4.gz
|
| #
ab77016b |
| 03-Apr-2006 |
Andreas Gohr <andi@splitbrain.org> |
more efficient mb_string checking in utf8.php
darcs-hash:20060403194930-7ad00-499940017f74cfe297f2aa4e65d441243f8572a1.gz
|
| #
10f09f2a |
| 03-Apr-2006 |
Andreas Gohr <andi@splitbrain.org> |
better utf8_substr function
darcs-hash:20060403192537-7ad00-72b129ce494066bce491821a0396db7576873ec2.gz
|
| #
d8cb2602 |
| 03-Mar-2006 |
Denis Simakov <akinoame1@gmail.com> |
nicer russian romanization
darcs-hash:20060303032557-3c565-36015a29e83f000f0a23d8ea039c954766c1223e.gz
|
| #
3dbad6dc |
| 03-Mar-2006 |
Denis Simakov <akinoame1@gmail.com> |
hebrew romanization fix
darcs-hash:20060303031656-3c565-2458122a2481ea3acfbf772e4faae883808cbf71.gz
|
| #
1abfaba4 |
| 21-Feb-2006 |
Andreas Gohr <andi@splitbrain.org> |
fixes for utf-8 to/from unicode conversion
The functions utf8_to_unicode and unicode_to_utf8 didn't work correctly with some 3- and 4-byte sequences. This replaces those functions with two more sophisticated ones. It also adds unit tests for them.
darcs-hash:20060221212605-7ad00-7bfefe8c9615d5a7f3b33c279ce79d4200d4778c.gz
|
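Handling 3- and 4-byte sequences correctly depends on the standard UTF-8 bit layout. A character takes 1 to 4 bytes carrying 7, 11, 16 or 21 payload bits, and every continuation byte has the form 10xxxxxx. The Python sketch below shows that layout and rejects overlong forms, surrogates and out-of-range values. It is not the replacement code from this commit. The function names simply mirror the ones in the message.

```python
def unicode_to_utf8(codepoints):
    """Encode a list of code points as UTF-8 bytes."""
    out = bytearray()
    for cp in codepoints:
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a valid scalar value: {cp:#x}")
        if cp < 0x80:                       # 1 byte, 7 bits
            out.append(cp)
        elif cp < 0x800:                    # 2 bytes, 11 bits
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:                  # 3 bytes, 16 bits
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:                               # 4 bytes, 21 bits
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)

def utf8_to_unicode(data):
    """Decode UTF-8 bytes into a list of code points, rejecting malformed input."""
    cps = []
    i = 0
    while i < len(data):
        b = data[i]
        # The lead byte gives the sequence length, its payload bits and the
        # smallest code point that may legally use this length.
        if b < 0x80:
            cp, length, minimum = b, 1, 0
        elif 0xC0 <= b < 0xE0:
            cp, length, minimum = b & 0x1F, 2, 0x80
        elif 0xE0 <= b < 0xF0:
            cp, length, minimum = b & 0x0F, 3, 0x800
        elif 0xF0 <= b < 0xF8:
            cp, length, minimum = b & 0x07, 4, 0x10000
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, length):
            c = data[i + k]
            if (c & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (c & 0x3F)
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"overlong or out-of-range sequence at offset {i}")
        cps.append(cp)
        i += length
    return cps
```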
| #
98c86858 |
| 17-Feb-2006 |
Andreas Gohr <andi@splitbrain.org> |
file cleanups
This patch cleans up the source code to satisfy the coding guidelines (see http://wiki.splitbrain.org/wiki:development#coding_style)
It converts files to UNIX line endings and removes tabs and trailing whitespace. Not all files have been cleaned yet.
darcs-hash:20060217222040-7ad00-bba3d2bee3b5aa7cbb5184258abd50805cd071bf.gz
|
| #
8a831f2b |
| 10-Feb-2006 |
Andreas Gohr <andi@splitbrain.org> |
romanization support in utf8 library
This patch adds basic romanization support to the utf-8 library. It converts non-Latin languages to ASCII.
The transliteration tables used were gathered from various places on the net. I do not speak any of those languages, so I can't say how good they are. Any recommendations and fixes are welcome!
This can be enabled for ID cleaning by setting the deaccent option to 2. It is also used in the XHTML renderer to generate section ids based on the header titles. Leading digits and any remaining non-ASCII chars are removed as well. This is the first step to make section ID always XHTML compatible. Making sure they are unique is not implemented yet.
darcs-hash:20060210200627-7ad00-61a633563bb92a00ef4a3f699d73117139cbf367.gz
|
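The section-ID step described in the entry above transliterates the header title, then drops leading digits and any remaining non-ASCII characters. It might look like this in Python. The six-letter Cyrillic table is a hypothetical excerpt for illustration and is not one of the tables shipped with the library. Any further ID cleanup the renderer performs is not shown.

```python
import re

# Hypothetical excerpt of a Cyrillic-to-Latin transliteration table.
ROMANIZE = str.maketrans({"п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t"})

def section_id(title):
    """Romanize a header title, then strip non-ASCII characters and leading digits."""
    ascii_text = title.translate(ROMANIZE)
    ascii_text = re.sub(r"[^\x00-\x7f]", "", ascii_text)  # drop what the table missed
    return re.sub(r"^[0-9]+", "", ascii_text)              # XHTML ids may not start with a digit
```

As the commit notes, this alone does not make IDs unique.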