History log of /dokuwiki/inc/utf8.php (Results 51 – 75 of 102)
Revision Date Author Comments
# 72de9068 19-Jul-2007 Andreas Gohr <andi@splitbrain.org>

several speed improvements in UTF-8 lib

darcs-hash:20070719110142-7ad00-1192e190c62637ed68e2c2c0a0b3135abfd6ecb5.gz


# 37242afa 23-Mar-2007 Tom N Harris <tnharris@whoopdedo.org>

Escape Ctrl-Z so darcs stops treating utf8.php as binary.

darcs-hash:20070323030243-6942e-836105b95078b213df8497386ae9b0418fcf29be.gz


# 9f9fb0e5 02-Feb-2007 Tom N Harris <tnharris@whoopdedo.org>

Encode/Decode numeric HTML entities correctly.

utf8_tohtml handles all codepoints, and the inverse
function, utf8_unhtml, is added.

darcs-hash:20070202070509-6942e-09ed9dc37f1469055a7c04d44044768e160d60e6.gz


# 44881bd0 03-Jan-2007 henning.noren <henning.noren@gmail.com>

tf_rename_lower.patch

Name the TRUE/FALSE-constants consistently as lowercase everywhere.
This might also be a tiny optimization in some environments.

darcs-hash:20070103205700-d2a3e-e7ec0aedb938d563f583116a2d5b17f3a3fea36c.gz


# d5b23302 17-Nov-2006 Tom N Harris <tnharris@whoopdedo.org>

Indexer asian language fixes and speed-ups

Make Chinese and Japanese work better with the new indexer.
Some missing punctuation added to utf8_stripspecials.
Misc. other changes to make indexing faster. The indexes will expire on
backend upgrades, so you don't have to delete *.indexed

darcs-hash:20061117123032-6942e-774b38e08234928c49b37e40addba375acf67ac0.gz


# f5e334de 28-Oct-2006 Andreas Gohr <andi@splitbrain.org>

do not transliterate cyrillic soft sign #958

darcs-hash:20061028113426-7ad00-f1d6b3b919c3aadd2bd7585fb772071b81b4b42d.gz


# 2626ee0c 28-Sep-2006 chris <chris@jalakai.co.uk>

more utf8_substr improvements (re FS#891 and yesterday's patch)

- rework utf8_substr() NOMBSTRING code to always use pcre
- remove work around for utf8_substr() and large strings from ft_snippet()

darcs-hash:20060928165122-9b6ab-0eefc216f07f9d7e7d8eb62ce26605c28ee340fa.gz


# 5e613a5c 27-Sep-2006 chris <chris@jalakai.co.uk>

utf8_substr fix for FS#891

darcs-hash:20060927033713-9b6ab-4b35e0a85b6d11d5a3a98858cd2f860b383ff153.gz


# 720307d9 23-Sep-2006 chris <chris@jalakai.co.uk>

utf8_stripspecials optimization

Add preconverted utf-8 string of special characters.

The (once only) conversion of the special character unicode
array into utf-8 occurs on every DokuWiki page view,
irrespective of action or caching, and takes about one third
of the time involved in delivering a wiki page straight from
cache.

The original unicode array has been left in place in the file
to make any future amendments easier.

darcs-hash:20060923151937-9b6ab-cae0340a95d9596415ef71d7b7e67ef9daca84ef.gz


# 9ee93076 31-Aug-2006 chris <chris@jalakai.co.uk>

search improvements

ft_snippet()
- make utf8 algorithm default
- add workaround for utf8_substr() limitations, bug #891
- fix some indexes which missed out on conversion to utf8
character counts
- minor improvements

idx_lookup()
- minor changes to wildcard matching code to improve performance
(changes based on profiling results)

utf8
- specifically set mb_internal_encoding to utf-8 when mbstring
functions will be used.

darcs-hash:20060831003413-9b6ab-712021eda3c959ffe79d8d3fe91d2c9a8acf2b58.gz


# 19a32233 29-Aug-2006 chris <chris@jalakai.co.uk>

further update to global memory cache arrays

- remove initialisation of caches in inc/pageutils.php
- add global declaration to init.php to support init.php
being included from within a function, e.g. unit testing

;-)

- minor change to utf8_substr, remove non-essential brackets
added as part of an earlier patch

darcs-hash:20060829134806-9b6ab-ab15191344a83be664c412403dc84a24fa2253a2.gz


# bb4e0b0b 28-Aug-2006 chris <chris@jalakai.co.uk>

utf8_substr() fix, it wasn't using mb_substr results when available

darcs-hash:20060828092029-9b6ab-f76c94b76ce1ada49e2fefde11af824bb98b99c7.gz


# f50163d1 27-Aug-2006 chris <chris@jalakai.co.uk>

utf8_correctIdx bounds checking and more unittests

darcs-hash:20060827153254-9b6ab-3c76fde7cb5534ca12628e9aa6e6d59d9bb02f45.gz


# 5953e889 26-Aug-2006 chris <chris@jalakai.co.uk>

ft_snippet() update, fix utf8 problems

darcs-hash:20060826095311-9b6ab-9a6f272cc7c7532eb2bad8f7b4404c5a16b71109.gz


# 0eac1afb 26-Aug-2006 Andreas Gohr <andi@splitbrain.org>

code to remove bad UTF-8 bytes added

This adds code to remove or replace invalid UTF-8 bytes and uses it
in the ft_snippets function.

darcs-hash:20060826082919-7ad00-a94004de159ae93ff5b7270fd3e631ff467233cd.gz


# 74c0c504 09-Aug-2006 chris <chris@jalakai.co.uk>

cleanID unit tests

+ fix missing utf8 deaccent character mapping
+ set utf-8 charset for HTMLReporter (unit tests)

darcs-hash:20060809160209-9b6ab-26c80a4830643b9790536f6d3a4adee0f451e4f0.gz


# 54662a04 11-Jun-2006 Andreas Gohr <andi@splitbrain.org>

make sure UTF8 lookup tables are always global

darcs-hash:20060611173240-7ad00-9bc775163fd9fc65917ffe10f78f872a302bdbcf.gz


# eaa525a0 08-Jun-2006 Andreas Gohr <andi@splitbrain.org>

fix for utf8_strpos #827

darcs-hash:20060608200438-7ad00-05fbb18c15df64725ca4ef1ffdc0aa817a508ea4.gz


# ab77016b 03-Apr-2006 Andreas Gohr <andi@splitbrain.org>

more efficient mb_string checking in utf8.php

darcs-hash:20060403194930-7ad00-499940017f74cfe297f2aa4e65d441243f8572a1.gz


# 10f09f2a 03-Apr-2006 Andreas Gohr <andi@splitbrain.org>

better utf8_substr function

darcs-hash:20060403192537-7ad00-72b129ce494066bce491821a0396db7576873ec2.gz


# d8cb2602 03-Mar-2006 Denis Simakov <akinoame1@gmail.com>

nicer russian romanization

darcs-hash:20060303032557-3c565-36015a29e83f000f0a23d8ea039c954766c1223e.gz


# 3dbad6dc 03-Mar-2006 Denis Simakov <akinoame1@gmail.com>

hebrew romanization fix

darcs-hash:20060303031656-3c565-2458122a2481ea3acfbf772e4faae883808cbf71.gz


# 1abfaba4 21-Feb-2006 Andreas Gohr <andi@splitbrain.org>

fixes for utf-8 to/from unicode conversion

The functions utf8_to_unicode and unicode_to_utf8 didn't work correctly
with some 3 and 4 byte strings. This replaces those functions with
two more sophisticated ones. It also adds unit testing for them.

darcs-hash:20060221212605-7ad00-7bfefe8c9615d5a7f3b33c279ce79d4200d4778c.gz


# 98c86858 17-Feb-2006 Andreas Gohr <andi@splitbrain.org>

file cleanups

This patch cleans up the source code to satisfy the coding guidelines (see
http://wiki.splitbrain.org/wiki:development#coding_style)

It converts files to UNIX line endings and removes tabs and trailing
whitespace. Not all files were cleaned yet.

darcs-hash:20060217222040-7ad00-bba3d2bee3b5aa7cbb5184258abd50805cd071bf.gz


# 8a831f2b 10-Feb-2006 Andreas Gohr <andi@splitbrain.org>

romanization support in utf8 library

This patch adds basic romanization support to the utf-8 library. It
converts non-latin languages to ASCII.

The transliteration tables used were gathered from various places
on the net. I do not speak any of those languages so I can't say how
good they are. Any recommendations and fixes are welcome!

This can be enabled for ID cleaning by setting the deaccent option to 2.
It is also used in the XHTML renderer to generate section ids based
on the header titles. Leading digits and any remaining non-ASCII chars
are removed as well. This is the first step toward making section IDs
always XHTML compatible. Making sure they are unique is not implemented yet.

darcs-hash:20060210200627-7ad00-61a633563bb92a00ef4a3f699d73117139cbf367.gz
