utf8.php - OpenGrok history log for /dokuwiki/inc/utf8.php

Revision	Date	Author	Comments
# 72de9068	19-Jul-2007	Andreas Gohr <andi@splitbrain.org>	several speed improvements in UTF-8 lib darcs-hash:20070719110142-7ad00-1192e190c62637ed68e2c2c0a0b3135abfd6ecb5.gz
# 37242afa	23-Mar-2007	Tom N Harris <tnharris@whoopdedo.org>	Escape Ctrl-Z so darcs stops treating utf8.php as binary. darcs-hash:20070323030243-6942e-836105b95078b213df8497386ae9b0418fcf29be.gz
# 9f9fb0e5	02-Feb-2007	Tom N Harris <tnharris@whoopdedo.org>	Encode/Decode numeric HTML entities correctly. utf8_tohtml handles all codepoints, and the inverse function, utf8_unhtml, is added. darcs-hash:20070202070509-6942e-09ed9dc37f1469055a7c04d44044768e160d60e6.gz
# 44881bd0	03-Jan-2007	henning.noren <henning.noren@gmail.com>	tf_rename_lower.patch Name the TRUE/FALSE-constants consistently as lowercase everywhere. This might also be an tiny optimization in some environments. darcs-hash:20070103205700-d2a3e-e7ec0aedb938d563f583116a2d5b17f3a3fea36c.gz
# d5b23302	17-Nov-2006	Tom N Harris <tnharris@whoopdedo.org>	Indexer asian language fixes and speed-ups Make Chinese and Japanese work better with the new indexer. Some missing punctuation added to utf8_stripspecials. Misc. other changes to make indexing faster. The indexes will expire on backend upgrades, so you don't have to delete *.indexed darcs-hash:20061117123032-6942e-774b38e08234928c49b37e40addba375acf67ac0.gz
# f5e334de	28-Oct-2006	Andreas Gohr <andi@splitbrain.org>	do not transliterate cyrillic soft sign #958 darcs-hash:20061028113426-7ad00-f1d6b3b919c3aadd2bd7585fb772071b81b4b42d.gz
# 2626ee0c	28-Sep-2006	chris <chris@jalakai.co.uk>	more utf8_substr improvements (re FS#891 and yesterday's patch) - rework utf8_substr() NOMBSTRING code to always use pcre - remove work around for utf8_substr() and large strings from ft_snippet() darcs-hash:20060928165122-9b6ab-0eefc216f07f9d7e7d8eb62ce26605c28ee340fa.gz
# 5e613a5c	27-Sep-2006	chris <chris@jalakai.co.uk>	utf8_substr fix for FS#891 darcs-hash:20060927033713-9b6ab-4b35e0a85b6d11d5a3a98858cd2f860b383ff153.gz
# 720307d9	23-Sep-2006	chris <chris@jalakai.co.uk>	utf8_stripspecials optimization Add preconverted utf-8 string of special characters. The (once only) conversion of the special character unicode array into utf-8 occurs on every DokuWiki page view, irrespective of action or caching, and takes about one third of the time involved in delivering a wiki page straight from cache. The original unicode array has been left in place in the file to make any future amendments easier. darcs-hash:20060923151937-9b6ab-cae0340a95d9596415ef71d7b7e67ef9daca84ef.gz
# 9ee93076	31-Aug-2006	chris <chris@jalakai.co.uk>	search improvements ft_snippet() - make utf8 algorithm default - add workaround for utf8_substr() limitations, bug #891 - fix some indexes which missed out on conversion to utf8 character counts - minor improvements idx_lookup() - minor changes to wildcard matching code to improve performance (changes based on profiling results) utf8 - specifically set mb_internal_coding to utf-8 when mb_string functions will be used. darcs-hash:20060831003413-9b6ab-712021eda3c959ffe79d8d3fe91d2c9a8acf2b58.gz
# 19a32233	29-Aug-2006	chris <chris@jalakai.co.uk>	further update to global memory cache arrays - remove initialisation of caches in inc/pageutils.php - add global declaration to init.php to support init.php being included from within a function, e.g. unit testing ;-) - minor change to utf8_substr, remove non-essential brackets added as part of an earlier patch darcs-hash:20060829134806-9b6ab-ab15191344a83be664c412403dc84a24fa2253a2.gz
# bb4e0b0b	28-Aug-2006	chris <chris@jalakai.co.uk>	utf8_substr() fix, it wasn't using mb_substr results when available darcs-hash:20060828092029-9b6ab-f76c94b76ce1ada49e2fefde11af824bb98b99c7.gz
# f50163d1	27-Aug-2006	chris <chris@jalakai.co.uk>	utf8_correctIdx bounds checking and more unittests darcs-hash:20060827153254-9b6ab-3c76fde7cb5534ca12628e9aa6e6d59d9bb02f45.gz
# 5953e889	26-Aug-2006	chris <chris@jalakai.co.uk>	ft_snippet() update, fix utf8 problems darcs-hash:20060826095311-9b6ab-9a6f272cc7c7532eb2bad8f7b4404c5a16b71109.gz
# 0eac1afb	26-Aug-2006	Andreas Gohr <andi@splitbrain.org>	code to remove bad UTF-8 bytes added This adds code to remove or replace invalid UTF-8 bytes and uses it in the ft_snippets function. darcs-hash:20060826082919-7ad00-a94004de159ae93ff5b7270fd3e631ff467233cd.gz
# 74c0c504	09-Aug-2006	chris <chris@jalakai.co.uk>	cleanID unit tests + fix missing utf8 deaccent character mapping + set utf-8 charset for HTMLReporter (unit tests) darcs-hash:20060809160209-9b6ab-26c80a4830643b9790536f6d3a4adee0f451e4f0.gz
# 54662a04	11-Jun-2006	Andreas Gohr <andi@splitbrain.org>	make sure UTF8 lookup tables are always global darcs-hash:20060611173240-7ad00-9bc775163fd9fc65917ffe10f78f872a302bdbcf.gz
# eaa525a0	08-Jun-2006	Andreas Gohr <andi@splitbrain.org>	fix for utf8_strpos #827 darcs-hash:20060608200438-7ad00-05fbb18c15df64725ca4ef1ffdc0aa817a508ea4.gz
# ab77016b	03-Apr-2006	Andreas Gohr <andi@splitbrain.org>	more efficient mb_string checking in utf8.php darcs-hash:20060403194930-7ad00-499940017f74cfe297f2aa4e65d441243f8572a1.gz
# 10f09f2a	03-Apr-2006	Andreas Gohr <andi@splitbrain.org>	better utf8_substr function darcs-hash:20060403192537-7ad00-72b129ce494066bce491821a0396db7576873ec2.gz
# d8cb2602	03-Mar-2006	Denis Simakov <akinoame1@gmail.com>	nicer russian romanization darcs-hash:20060303032557-3c565-36015a29e83f000f0a23d8ea039c954766c1223e.gz
# 3dbad6dc	03-Mar-2006	Denis Simakov <akinoame1@gmail.com>	hebrew romanization fix darcs-hash:20060303031656-3c565-2458122a2481ea3acfbf772e4faae883808cbf71.gz
# 1abfaba4	21-Feb-2006	Andreas Gohr <andi@splitbrain.org>	fixes for utf-8 to/from unicode conversion The functions utf8_to unicode and unicode_to_utf8 didn't work correctly with some 3 and 4 byte strings. This exchanges those functions against two more sophisticated ones. It also adds unit testing for them. darcs-hash:20060221212605-7ad00-7bfefe8c9615d5a7f3b33c279ce79d4200d4778c.gz
# 98c86858	17-Feb-2006	Andreas Gohr <andi@splitbrain.org>	file cleanups This patch cleans up the source code to satisfy the coding guidelines (see http://wiki.splitbrain.org/wiki:development#coding_style) It converts files to UNIX lineendings and removes tabs and trailing whitespace. Not all files were cleaned yet. darcs-hash:20060217222040-7ad00-bba3d2bee3b5aa7cbb5184258abd50805cd071bf.gz
# 8a831f2b	10-Feb-2006	Andreas Gohr <andi@splitbrain.org>	romanization support in utf8 library This patch addes basic romanization support to the utf-8 library. It converts non-latin languages to ASCII. The transliteration tables used where gathered from various places on the net. I do not speak any of those languages so I can't say how good they are. Any recommendations and fixes are welcome! This can be enabled for ID cleaning by setting the deaccent option to 2. It is also used in the XHTML renderer to generate section ids based on the header titles. Leading digits and any remaining non-ASCII chars are removed as well. This is the first step to make section ID always XHTML compatible. Making sure they are unique is not implemented yet. darcs-hash:20060210200627-7ad00-61a633563bb92a00ef4a3f699d73117139cbf367.gz