History log of /dokuwiki/_test/tests/Parsing/ParserMode/GfmHtmlEntityTest.php (Results 1 – 2 of 2)
Revision Date Author Comments
# eb15e634 04-May-2026 Andreas Gohr <andi@splitbrain.org>

extract Helpers\HtmlEntity, wire into GfmCode and GfmLink URL slot

Numeric and named HTML entity decoding moves out of GfmHtmlEntity into
a pure helper, so capture-by-regex modes can apply the same decode
post-extraction (the inline lexer never reaches their bodies). Mirrors
the Helpers\Escape pattern.

Wired up in two slots:

- GfmCode info string: f&ouml;&ouml; now decodes to föö in the
language class. Clears spec example #330.

- GfmLink URL: GfmLink::extractUrl() decodes entities. URL pattern
extends from `[^)\n]+` to `(?:\\.|[^)\n])+` so an escaped \) no
longer terminates the URL early; the existing post-classify
Escape::unescapeBackslashes call strips the backslashes after
Link::classify has done its work. Clears #504, #506, #508.
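
A minimal sketch of the URL pattern change, for illustration only. The two
inner patterns are taken from the message above; the surrounding parentheses
and the sample input are assumptions and are not GfmLink's actual full pattern.

```php
<?php
// Only the two inner patterns come from the commit message; the wrapping
// \( ... \) and the sample input are assumptions made for this sketch.
$old = '/\(([^)\n]+)\)/';
$new = '/\(((?:\\\\.|[^)\n])+)\)/';

// A URL slot containing an escaped closing parenthesis.
$input = '(foo\)bar)';

preg_match($old, $input, $before);
preg_match($new, $input, $after);

echo $before[1], "\n"; // foo\      (capture stops at the escaped paren)
echo $after[1], "\n";  // foo\)bar  (escape pair is consumed as one unit)
```

The backslash is still present in the new capture, which is consistent with
the message's note that unescaping happens in a later step.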

Skip #328 with a self-contained title-slot reason: the URL side now
decodes correctly, but the title attribute is still discarded
(DokuWiki link instructions have no title slot).


# d2085866 04-May-2026 Andreas Gohr <andi@splitbrain.org>

extend GfmNumericEntity to HTML5 named entities, rename to GfmHtmlEntity

Numeric refs are still decoded explicitly: PHP's html_entity_decode
returns the input unchanged for U+0000, surrogates, U+10FFFF, and
BMP noncharacters where CommonMark requires U+FFFD or the literal
codepoint. Named refs delegate to html_entity_decode with ENT_HTML5,
which carries the full HTML5 named-entity table (including multi-
codepoint decodes like &ngE; -> U+2267 + U+0338).
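
The split described above might be sketched as follows. The function names
are hypothetical and this is not the actual GfmHtmlEntity code; the digit
limits (1 to 7 decimal, 1 to 6 hex) follow the CommonMark spec.

```php
<?php
// Hypothetical helpers for illustration; not DokuWiki's real API.

// Encode a code point as UTF-8 by hand, so surrogate-free edge cases such
// as U+FFFE and U+10FFFF are emitted literally.
function encodeUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Decode one numeric reference explicitly instead of via html_entity_decode.
function decodeNumericRef(string $ref): string
{
    if (preg_match('/^&#(\d{1,7});$/D', $ref, $m)) {
        $cp = (int) $m[1];
    } elseif (preg_match('/^&#[xX]([0-9a-fA-F]{1,6});$/D', $ref, $m)) {
        $cp = (int) hexdec($m[1]);
    } else {
        return $ref; // not a numeric reference: leave untouched
    }
    $isSurrogate = $cp >= 0xD800 && $cp <= 0xDFFF;
    if ($cp === 0 || $cp > 0x10FFFF || $isSurrogate) {
        return "\u{FFFD}"; // replacement character
    }
    return encodeUtf8($cp);
}

// Named references go through PHP's HTML5 table.
$named = html_entity_decode('&ouml;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
```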

Unknown names stay literal: the original &xxx; passes through as
cdata and the renderer's &-escaping turns it into &amp;xxx;.
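
As a rough illustration of the unknown-name path, with htmlspecialchars
standing in for the renderer's escaping (an assumption; the real renderer
code is not shown in this log):

```php
<?php
// An unknown entity name is returned unchanged by html_entity_decode.
$decoded = html_entity_decode('&xxx;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Stand-in for the renderer: escaping the ampersand on output.
$rendered = htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n";  // &xxx;
echo $rendered, "\n"; // &amp;xxx;
```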
