History log of /dokuwiki/inc/Parsing/ParserMode/ (Results 1 – 25 of 52)
Revision Date Author Comments
4b31eadf 04-Jun-2026 Andreas Gohr <gohr@cosmocode.de>

fix (parsing): avoid newline loss on GFM section editing

The GFM header parsing returned a byte position pointing at the newline
before the actual header, which caused the newline loss reported in
https://github.com/dokuwiki/dokuwiki/pull/4636#issuecomment-4491970909

Additionally this fixes an oddity of DW header parsing which
accidentally allowed text on the line before the opening = chars.
Whitespace is still allowed.

47a02a10 04-Jun-2026 Andreas Gohr <gohr@cosmocode.de>

Parsing: make parse syntax a per-parse value, drop ModeInterface

The active parse's syntax flavour is a per-parse question, not process-
global state: within a single request a plugin can render bundled
DokuWiki-syntax text inside an otherwise-Markdown page. Yet ModeRegistry
was a singleton that read $conf['syntax'] and the $PARSER_MODES global,
and every mode reached it through ModeRegistry::getInstance() — so the
flavour lived in shared mutable state that two parses in one request
would fight over.

Make the registry a short-lived value instead:

- ModeRegistry is constructed once per parse with an explicit $syntax
and injected into Parser, Handler and every mode. getSyntax() /
isDwPreferred() / isMdPreferred() consult $this->syntax; the
DOKU_UNITTEST-gated mode-list cache hack is gone (each registry is
fresh, nothing to invalidate).
- p_get_instructions() is now the single place in the pipeline where
$conf['syntax'] is read; from there the flavour travels as a
parameter. No code under inc/Parsing/ reads $conf['syntax'] directly
anymore — the five syntax-reading modes (Preformatted, GfmHr,
GfmEscape, Externallink, GfmQuote) route through $this->registry.

Keep the two concepts apart, as documented in the ModeRegistry and
AbstractMode docblocks: the user's configured *preference* stays in
$conf['syntax'] for UI code (toolbar, settings), while the active
parse's syntax is a parameter carried by the registry.

$PARSER_MODES is demoted to a deprecated, read-only mirror, published
during loadPluginModes() — third-party syntax plugins (columnlist,
alphalist2, phpwikify, skipentity) and the bundled info plugin read the
global directly, often from their constructors, so the taxonomy must
stay visible there. No core code reads the mirror.

Fold ModeInterface into AbstractMode while here: getSort()/handle() are
abstract, the connect callbacks carry defaults, and the public $Lexer
"FIXME should be done by setter" becomes setLexer()/getLexer() injected
by Parser::addMode() alongside the registry. Nested-content resolution
moves to the allowedCategories()/filterAllowedModes() hooks, resolved
once when the registry is attached.

Tests build their own parser/registry through ParserTestBase::setSyntax()
instead of mutating $conf and calling the removed ModeRegistry::reset().
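
The per-parse registry idea can be sketched outside PHP. The Python below is an illustration only, not DokuWiki's code: the class and function names are hypothetical stand-ins for ModeRegistry, Parser and p_get_instructions(), and the "parse" merely reports the flavour it ran under.

```python
class ModeRegistry:
    """Hypothetical stand-in: holds the syntax flavour of one parse."""

    def __init__(self, syntax):
        self._syntax = syntax

    def get_syntax(self):
        return self._syntax


class Parser:
    """Hypothetical stand-in: receives its registry by injection."""

    def __init__(self, registry):
        self.registry = registry

    def parse(self, text):
        # A real parser would emit instructions; this one only reports
        # which flavour it was constructed with.
        return (self.registry.get_syntax(), text)


def get_instructions(text, syntax):
    # The caller reads the configured preference once and passes it in;
    # each call builds a fresh registry, so two parses share no state.
    return Parser(ModeRegistry(syntax)).parse(text)


page = get_instructions("# Markdown page", "md")
bundled = get_instructions("====== Bundled text ======", "dw")
```

Both parses can run in the same request without either one changing what the other sees.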

/dokuwiki/_test/bootstrap.php
/dokuwiki/_test/tests/Parsing/HandlerTest.php
/dokuwiki/_test/tests/Parsing/Lexer/RecordingHandler.php
/dokuwiki/_test/tests/Parsing/Markdown/GfmSpecTest.php
/dokuwiki/_test/tests/Parsing/ModeRegistryTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/ExternallinkTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmBacktickDoubleTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmBacktickSingleTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmCodeTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmDeletedTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmEmphasisStrongTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmEmphasisStrongUnderscoreTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmEmphasisTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmEmphasisUnderscoreTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmEscapeTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmFileTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmHeaderTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmHrTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmLinebreakTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmLinkTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmListblockTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmMediaTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmQuoteTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmStrongUnderscoreTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/ParserTestBase.php
/dokuwiki/_test/tests/Parsing/ParserMode/PreformattedTest.php
/dokuwiki/inc/Extension/SyntaxPlugin.php
/dokuwiki/inc/Parsing/Handler.php
/dokuwiki/inc/Parsing/ModeRegistry.php
/dokuwiki/inc/Parsing/Parser.php
AbstractFormatting.php
AbstractMode.php
Base.php
Eol.php
Externallink.php
Footnote.php
GfmBacktickSingle.php
GfmCode.php
GfmEscape.php
GfmHr.php
GfmHtmlEntity.php
GfmListblock.php
GfmQuote.php
GfmTable.php
Listblock.php
Preformatted.php
Table.php
/dokuwiki/inc/deprecated.php
/dokuwiki/inc/parserutils.php
4f32c45b 26-May-2026 Andreas Gohr <gohr@cosmocode.de>

GfmLink: allow soft line break inside link text

The label character class explicitly forbade `\n`, so a CommonMark
soft line break inside link text (e.g. `[link with<EOL>more](url)`)
fell through to literal text instead of producing a link. Loosen the
class to accept a bare `\n` as long as it is not followed by a blank
line — soft breaks are spec-allowed inside link text, blank lines are
not, and refusing them also keeps `\n#`-anchored block modes (header,
hr, ...) from being swallowed by a runaway link match.

The `\n` survives into the label string and renders as a literal line
ending in HTML, which browsers display as a single space. This soft
break behavior has been checked against
https://spec.commonmark.org/dingus/

Note that this behavior differs from GitHub, where the line break is
rendered as a hard break <br>.
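
A rough Python re-creation of the loosened label class, for illustration only. The exact pattern in GfmLink is not quoted in this message, so the regex below is an assumption, including the choice to treat a line holding only spaces or tabs as blank.

```python
import re

# Label: an escaped char, any non-bracket non-newline char, or a newline
# that is not followed by a blank line (assumption: "blank" also covers
# lines holding only spaces or tabs).
LABEL = r"(?:\\.|[^\[\]\n]|\n(?![ \t]*\n))+"
URL = r"(?:\\.|[^)\n])+"
LINK = re.compile(r"\[(" + LABEL + r")\]\((" + URL + r")\)")

soft = LINK.search("[link with\nmore](url)")      # soft break inside the label
blank = LINK.search("[link with\n\nmore](url)")   # blank line inside the label
```

In this sketch the soft break stays in the captured label, while the blank-line variant produces no link at all.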

65dd2042 26-May-2026 Andreas Gohr <gohr@cosmocode.de>

GfmEscape: defer \\<EOL> to DW Linebreak in mixed-syntax modes

Both GfmEscape (sort 5) and DW Linebreak (sort 140) can claim the
two backslashes of `\\` followed by space/tab/newline. The lexer's
tie-breaker picked GfmEscape, so DW's forced linebreak silently lost
its delimiter under dw+md and md+dw. Add a negative lookahead that
declines `\\[ \t\n]` whenever DW syntax is loaded — pure md keeps
GFM-spec behavior. Mid-line `\\` (UNC paths etc.) still escapes.
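
The effect of the negative lookahead can be tried in isolation. This is a Python sketch under assumptions: `escape_pattern` and its `dw_loaded` flag are hypothetical names, and the punctuation class is a plain ASCII-punctuation range rather than the shared class from Helpers\Escape.

```python
import re

PUNCT = r"[!-/:-@\[-`{-~]"  # ASCII punctuation as four code-point ranges


def escape_pattern(dw_loaded):
    # With DW syntax loaded, refuse to start on "\\" + space/tab/newline
    # so the DW forced linebreak can claim those two backslashes.
    guard = r"(?!\\\\[ \t\n])" if dw_loaded else ""
    return re.compile(guard + r"\\" + PUNCT)


forced_break = "line one\\\\\nline two"   # two backslashes, then a newline
unc_path = "see \\\\host\\share"          # two backslashes mid-line

md_only = escape_pattern(False).search(forced_break)
mixed = escape_pattern(True).search(forced_break)
mixed_unc = escape_pattern(True).search(unc_path)
```

Only the mixed-syntax pattern declines the forced-linebreak delimiter; the mid-line pair is still claimed as an escape.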

e7dae73b 12-May-2026 Andreas Gohr <andi@splitbrain.org>

fix: apply rector and code sniffer fixes

d331a839 12-May-2026 Andreas Gohr <andi@splitbrain.org>

GFM modes: follow CATEGORY_SUBSTITION → CATEGORY_SUBSTITUTION rename

Constant was renamed on master (the typo'd 'substition' value is kept,
but the constant name spells it correctly). Update GfmTable's use of
the constant, plus stale docblock/comment references in GfmEscape,
GfmHtmlEntity, GfmLinebreak, and GfmLinebreakTest.

15429f02 12-May-2026 Andreas Gohr <andi@splitbrain.org>

Externallink: GFM autolink extension - parens and entity-ref tail

In Markdown-preferred mode, allow `(` and `)` inside URL char classes
and consume an optional trailing entity reference via the shared
HtmlEntity::PATTERN. The Markdown-only post-processing peels off
mismatched closing parens and decodes the trailing entity reference,
emitting the peeled chars as cdata after the link. Refactors handle()
to dispatch to handleAngleAutolink() and handleBareUrl(), with the new
trim logic in peelGfmTail() and the protocol-prefix step in
addProtocolPrefix(). DW-only mode behavior is unchanged.

Brings GFM spec examples #624, #625, #626 to passing.
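
The paren-peeling step might look roughly like the following Python sketch. It borrows the name peelGfmTail() from the message, but the body is a simplified guess: it only balances parentheses and leaves out the trailing entity-reference handling.

```python
def peel_gfm_tail(url):
    """Split trailing ')' characters that have no matching '(' off the URL."""
    tail = ""
    while url.endswith(")") and url.count(")") > url.count("("):
        url, tail = url[:-1], ")" + tail
    return url, tail


balanced = peel_gfm_tail("www.google.com/search?q=Markup+(business)")
unbalanced = peel_gfm_tail("www.google.com/search?q=Markup+(business)))")
```

The peeled tail is what would be emitted as cdata after the link.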

73dc0a89 06-May-2026 Andreas Gohr <andi@splitbrain.org>

fix(mail): keep '&' intact in mailto links with multiple query params

Move the email-handling helpers (obfuscate, mail_isvalid,
mail_quotedprintable_encode, mail_setup) out of the procedural
inc/mail.php into a namespaced dokuwiki\MailUtils class plus a new
Mailer::configInit(), and add a separate MailUtils::obfuscateUrl() for
the mailto-href context.

The xhtml renderer and PluginTrait now build the link label and the
href separately: the address half is run through the mailguard
obfuscation, the query string is preserved verbatim with only HTML
escaping applied. This fixes #1690 — in 'visible' mode the previous
code rawurlencoded the entire address+query, turning '?' into '%3F' and
breaking multi-parameter mailto links; in all modes the query string is
no longer mangled by the [at]/[dot] substitution.

Core call sites (Mailer, auth, LegacyApiCore, common, the xhtml
renderer, the parser, the bundled config/styling/usermanager plugins)
are migrated to MailUtils directly. The old top-level functions and
PREG_PATTERN_VALID_EMAIL constant remain as deprecated shims with
rector mappings.

Tests for obfuscate / mail_isvalid / mail_quotedprintable_encode are
consolidated into a single _test/tests/MailUtilsTest.php and extended
with regression coverage for the multi-parameter, double-escape and
URL-shape cases.

Closes #1690
Replaces #1964

/dokuwiki/_test/core/DokuWikiTest.php
/dokuwiki/_test/data/media/wiki/exif-orient-6.jpg
/dokuwiki/_test/rector.php
/dokuwiki/_test/tests/Feed/FeedCreatorOptionsTest.php
/dokuwiki/_test/tests/File/MediaFileTest.php
/dokuwiki/_test/tests/MailUtilsTest.php
/dokuwiki/_test/tests/Remote/ApiCoreTest.php
/dokuwiki/_test/tests/Search/BacklinksTest.php
/dokuwiki/_test/tests/Search/Collection/CollectionSearchTest.php
/dokuwiki/_test/tests/Search/Collection/DirectCollectionTest.php
/dokuwiki/_test/tests/Search/Collection/FrequencyCollectionTest.php
/dokuwiki/_test/tests/Search/Collection/LookupCollectionTest.php
/dokuwiki/_test/tests/Search/Collection/MockDirectCollection.php
/dokuwiki/_test/tests/Search/Collection/MockFrequencyCollection.php
/dokuwiki/_test/tests/Search/Collection/MockLookupCollection.php
/dokuwiki/_test/tests/Search/Collection/TermTest.php
/dokuwiki/_test/tests/Search/Index/AbstractIndexTestCase.php
/dokuwiki/_test/tests/Search/Index/FileIndexTest.php
/dokuwiki/_test/tests/Search/Index/LockTest.php
/dokuwiki/_test/tests/Search/Index/MemoryIndexTest.php
/dokuwiki/_test/tests/Search/Index/TupleOpsTest.php
/dokuwiki/_test/tests/Search/IndexerTest.php
/dokuwiki/_test/tests/Search/IntegrityTest.php
/dokuwiki/_test/tests/Search/MediauseTest.php
/dokuwiki/_test/tests/Search/MetadataSearchTest.php
/dokuwiki/_test/tests/Search/Query/NamespacePredicateTest.php
/dokuwiki/_test/tests/Search/Query/PageSetTest.php
/dokuwiki/_test/tests/Search/Query/QueryEvaluatorTest.php
/dokuwiki/_test/tests/Search/Query/QueryParserTest.php
/dokuwiki/_test/tests/Search/data/searchtest.txt
/dokuwiki/_test/tests/Ui/Media/DisplayTest.php
/dokuwiki/_test/tests/inc/IpTest.php
/dokuwiki/_test/tests/inc/changelog_getrelativerevision.test.php
/dokuwiki/_test/tests/inc/common_clientip.test.php
/dokuwiki/_test/tests/inc/common_saveWikiText.test.php
/dokuwiki/_test/tests/lib/exe/fetch_imagetoken.test.php
/dokuwiki/bin/indexer.php
/dokuwiki/composer.json
/dokuwiki/composer.lock
/dokuwiki/conf/dokuwiki.php
/dokuwiki/data/deleted.files
/dokuwiki/feed.php
/dokuwiki/inc/Action/Preview.php
/dokuwiki/inc/Action/Search.php
/dokuwiki/inc/Ajax.php
/dokuwiki/inc/ChangeLog/ChangeLog.php
/dokuwiki/inc/ChangeLog/MediaChangeLog.php
/dokuwiki/inc/ChangeLog/PageChangeLog.php
/dokuwiki/inc/Extension/PluginTrait.php
/dokuwiki/inc/Feed/FeedCreator.php
/dokuwiki/inc/Feed/FeedCreatorOptions.php
/dokuwiki/inc/File/MediaFile.php
/dokuwiki/inc/File/PageFile.php
/dokuwiki/inc/Ip.php
/dokuwiki/inc/MailUtils.php
/dokuwiki/inc/Mailer.class.php
/dokuwiki/inc/Parsing/Handler.php
Emaillink.php
Internallink.php
/dokuwiki/inc/Remote/ApiCore.php
/dokuwiki/inc/Remote/LegacyApiCore.php
/dokuwiki/inc/Search/Collection/AbstractCollection.php
/dokuwiki/inc/Search/Collection/CollectionSearch.php
/dokuwiki/inc/Search/Collection/DirectCollection.php
/dokuwiki/inc/Search/Collection/FrequencyCollection.php
/dokuwiki/inc/Search/Collection/LookupCollection.php
/dokuwiki/inc/Search/Collection/PageFulltextCollection.php
/dokuwiki/inc/Search/Collection/PageMetaCollection.php
/dokuwiki/inc/Search/Collection/PageTitleCollection.php
/dokuwiki/inc/Search/Collection/Term.php
/dokuwiki/inc/Search/Exception/IndexAccessException.php
/dokuwiki/inc/Search/Exception/IndexIntegrityException.php
/dokuwiki/inc/Search/Exception/IndexLockException.php
/dokuwiki/inc/Search/Exception/IndexUsageException.php
/dokuwiki/inc/Search/Exception/IndexWriteException.php
/dokuwiki/inc/Search/Exception/SearchException.php
/dokuwiki/inc/Search/FulltextSearch.php
/dokuwiki/inc/Search/Index/AbstractIndex.php
/dokuwiki/inc/Search/Index/FileIndex.php
/dokuwiki/inc/Search/Index/Lock.php
/dokuwiki/inc/Search/Index/MemoryIndex.php
/dokuwiki/inc/Search/Index/TupleOps.php
/dokuwiki/inc/Search/Indexer.php
/dokuwiki/inc/Search/MetadataSearch.php
/dokuwiki/inc/Search/Query/NamespacePredicate.php
/dokuwiki/inc/Search/Query/NegatedEntry.php
/dokuwiki/inc/Search/Query/PageSet.php
/dokuwiki/inc/Search/Query/QueryEvaluator.php
/dokuwiki/inc/Search/Query/QueryParser.php
/dokuwiki/inc/Search/Query/StackEntry.php
/dokuwiki/inc/Search/Tokenizer.php
/dokuwiki/inc/Search/concept.txt
/dokuwiki/inc/Sitemap/Mapper.php
/dokuwiki/inc/Subscriptions/BulkSubscriptionSender.php
/dokuwiki/inc/TaskRunner.php
/dokuwiki/inc/Ui/Backlinks.php
/dokuwiki/inc/Ui/Media/Display.php
/dokuwiki/inc/Ui/MediaDiff.php
/dokuwiki/inc/Ui/Search.php
/dokuwiki/inc/Ui/SearchState.php
/dokuwiki/inc/auth.php
/dokuwiki/inc/common.php
/dokuwiki/inc/deprecated.php
/dokuwiki/inc/html.php
/dokuwiki/inc/infoutils.php
/dokuwiki/inc/init.php
/dokuwiki/inc/lang/fr/lang.php
/dokuwiki/inc/lang/pl/lang.php
/dokuwiki/inc/lang/tr/lang.php
/dokuwiki/inc/lang/tr/onceexisted.txt
/dokuwiki/inc/load.php
/dokuwiki/inc/media.php
/dokuwiki/inc/parser/xhtml.php
/dokuwiki/inc/search.php
/dokuwiki/inc/template.php
/dokuwiki/lib/exe/fetch.php
/dokuwiki/lib/plugins/authldap/lang/tr/settings.php
/dokuwiki/lib/plugins/authpdo/lang/tr/lang.php
/dokuwiki/lib/plugins/authplain/lang/tr/lang.php
/dokuwiki/lib/plugins/config/core/Setting/SettingEmail.php
/dokuwiki/lib/plugins/config/lang/cs/lang.php
/dokuwiki/lib/plugins/config/lang/de/lang.php
/dokuwiki/lib/plugins/config/lang/en/lang.php
/dokuwiki/lib/plugins/config/lang/es/lang.php
/dokuwiki/lib/plugins/config/lang/fr/lang.php
/dokuwiki/lib/plugins/config/lang/hu/lang.php
/dokuwiki/lib/plugins/config/lang/it/lang.php
/dokuwiki/lib/plugins/config/lang/pl/lang.php
/dokuwiki/lib/plugins/config/lang/pt/lang.php
/dokuwiki/lib/plugins/config/lang/tr/lang.php
/dokuwiki/lib/plugins/config/settings/config.metadata.php
/dokuwiki/lib/plugins/extension/lang/tr/lang.php
/dokuwiki/lib/plugins/info/syntax.php
/dokuwiki/lib/plugins/styling/_test/general.test.php
/dokuwiki/lib/plugins/usermanager/admin.php
/dokuwiki/lib/plugins/usermanager/remote.php
/dokuwiki/lib/scripts/page.js
/dokuwiki/lib/scripts/toolbar.js
/dokuwiki/lib/tpl/dokuwiki/detail.php
/dokuwiki/vendor/composer/installed.json
/dokuwiki/vendor/composer/installed.php
/dokuwiki/vendor/phpseclib/phpseclib/phpseclib/Crypt/Common/Formats/Keys/OpenSSH.php
/dokuwiki/vendor/phpseclib/phpseclib/phpseclib/Crypt/RSA.php
/dokuwiki/vendor/phpseclib/phpseclib/phpseclib/File/ASN1.php
/dokuwiki/vendor/splitbrain/slika/README.md
/dokuwiki/vendor/splitbrain/slika/src/GdAdapter.php
/dokuwiki/vendor/splitbrain/slika/src/ImageInfo.php
56c730b5 06-May-2026 Andreas Gohr <andi@splitbrain.org>

keep historic typo in value but not in constant

We need to keep the historic typo in the value ("substition"), but there
is no reason to keep it in the constant.

0f694376 05-May-2026 Andreas Gohr <gohr@cosmocode.de>

GfmLink: accept escaped brackets inside link labels

The label slot used `[^\[\]\n]+`, which rejected `\[` / `\]` and
left labels with escaped brackets unmatched. Promote it to
`(?:\\.|[^\[\]\n])+` — the same backslash-escape trick the URL
slot already uses — so spec example 523 (`[link \[bar](/uri)`)
matches and unescapes cleanly. The image-as-label sub-pattern
gets the same upgrade.

handle() needs no change: the new class still rejects bare `]`,
so the first literal `](` in the match is still the separator;
Escape::unescapeBackslashes() was already collapsing `\[` to `[`
before the label reached the link handler.

Adds two GfmLinkTest cases for the `\[` / `\]` forms.
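
The two label classes quoted above can be compared directly. The Python below is a sketch: the surrounding link pattern and the `unescape` helper are assumptions standing in for the real GfmLink pattern and Escape::unescapeBackslashes().

```python
import re

OLD_LABEL = r"[^\[\]\n]+"
NEW_LABEL = r"(?:\\.|[^\[\]\n])+"
URL = r"(?:\\.|[^)\n])+"


def link_pattern(label):
    # Assumed overall shape: [label](url)
    return re.compile(r"\[(" + label + r")\]\((" + URL + r")\)")


def unescape(text):
    # Hypothetical stand-in for Escape::unescapeBackslashes(): drop the
    # backslash in front of ASCII punctuation.
    return re.sub(r"\\([!-/:-@\[-`{-~])", r"\1", text)


source = r"[link \[bar](/uri)"   # spec example 523
old_match = link_pattern(OLD_LABEL).match(source)
new_match = link_pattern(NEW_LABEL).match(source)
```

With the old class the link does not match at its opening bracket; with the new one the label is captured whole and unescapes to `link [bar`.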

dccbd514 05-May-2026 Andreas Gohr <andi@splitbrain.org>

GfmQuote: accept ^> line starts so quotes can follow tables and lists

GfmTable, DW Table, and DW Listblock all consume the boundary \n on
their way out. A pure-lookahead exit pattern at that boundary would
trip the lexer's no-advance safety check, because tables and lists
exit right after consuming a marker token and have no leading
unmatched content for the lookahead to attach to (unlike Preformatted,
whose body leaves code lines as UNMATCHED right before the boundary).

Fix this on the consumer side: change the first-line anchor from \n>
to (?:^|\n)>. With the lexer's m flag, ^ matches at offset 0 and at
any position immediately following a \n in the subject, including the
position right after a \n that a preceding mode just consumed.
Subsequent quote lines keep the \n> anchor.

Adds three handoff tests in GfmQuoteTest covering GfmTable, DW Table,
and DW Listblock. Resolves GFM spec example 201.
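
The anchor change can be reproduced with Python's `re` module, whose MULTILINE `^` behaves like the m flag described above. This is a sketch of the anchors only, not the GfmQuote pattern itself.

```python
import re

OLD_ENTRY = re.compile(r"\n>")
NEW_ENTRY = re.compile(r"(?:^|\n)>", re.MULTILINE)

text = "| a |\n> quote"
# Pretend the table mode has already consumed the boundary "\n".
after_newline = text.index("\n") + 1

old_hit = OLD_ENTRY.match(text, after_newline)
new_hit = NEW_ENTRY.match(text, after_newline)
```

Starting right after the consumed newline, only the `(?:^|\n)>` form still finds the quote marker.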

f9d3b7bd 05-May-2026 Andreas Gohr <andi@splitbrain.org>

Externallink: add per-scheme angle-bracket autolinks for MD syntax

Adds CommonMark §6.5 <URL> autolinks to Externallink, gated to
md/md+dw/dw+md syntax via ModeRegistry::isMdPreferred(). Per-scheme
patterns share the existing conf/scheme.conf allow-list so unknown
schemes fall through to literal cdata instead of being silently
dropped by the renderer. Internal whitespace inside the brackets
disqualifies the autolink and the whole envelope is emitted as
cdata to keep the bare-URL detector off the URL.

LinksTest gains 5 cases covering success, internal-whitespace and
leading-whitespace disqualification, unregistered scheme fallthrough,
and the dw-only no-op path. SpecCompatRenderer URL encoder is updated
to match cmark-gfm's HREF_SAFE table (square brackets and a few other
characters move from safe to encoded). skip.php loses the obsolete
#356 entry and gains #605/#606/#607/#609 explaining the unregistered-
scheme cases that the per-scheme regex naturally rejects.

f57da51c 05-May-2026 Andreas Gohr <gohr@cosmocode.de>

Preformatted: leave boundary \n in stream when next line has content

Adds a zero-width lookahead exit (?=\n[^ \t\n]) ahead of the existing
consuming \n exit. When an indented code block is followed by a non-
blank line, the boundary newline now stays available for downstream
block-level matchers (GfmHr, GfmHeader, etc.) instead of being eaten
on the way out of preformatted mode.

Concretely fixes a thematic-break-after-indented-code case (GFM spec
case 85's trailing ----): without this change, GfmHr's \n anchor failed
because preformatted had already consumed the newline, and the bare
---- fell through to Entity which converted --- to an em-dash.

The consuming branch is kept as a fall-through for the blank-line and
end-of-input cases, where a pure lookahead would trip the lexer's
no-advance safety check.

Six PreformattedTest expectations updated: trailing cdata after a
preformatted block now carries the leading \n (rendered output is
unchanged — paragraph whitespace is trimmed).
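
A small Python sketch of the two exit branches, using the lookahead quoted above. `HR_ENTRY` is a simplified assumption standing in for GfmHr's newline-anchored entry pattern.

```python
import re

EXIT_OLD = re.compile(r"\n")
EXIT_NEW = re.compile(r"(?=\n[^ \t\n])|\n")  # lookahead first, consuming fallback
HR_ENTRY = re.compile(r"\n-{4,}")            # simplified: needs the leading "\n"

text = "  code\n----"
boundary = text.index("\n")

old_end = EXIT_OLD.match(text, boundary).end()   # newline consumed
new_end = EXIT_NEW.match(text, boundary).end()   # newline left in the stream

# Blank line after the code block: the lookahead fails, the fallback consumes.
blank_text = "  code\n\nnext"
blank_end = EXIT_NEW.match(blank_text, blank_text.index("\n")).end()
```

After the zero-width exit, the rule pattern can still see its leading newline; after the consuming exit it cannot.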

eb15e634 04-May-2026 Andreas Gohr <andi@splitbrain.org>

extract Helpers\HtmlEntity, wire into GfmCode and GfmLink URL slot

Numeric and named HTML entity decoding moves out of GfmHtmlEntity into
a pure helper, so capture-by-regex modes can apply the same decode
post-extraction (the inline lexer never reaches their bodies). Mirrors
the Helpers\Escape pattern.

Wired up in two slots:

- GfmCode info string: f&ouml;&ouml; now decodes to föö in the
language class. Clears spec example #330.

- GfmLink URL: GfmLink::extractUrl() decodes entities. URL pattern
extends from `[^)\n]+` to `(?:\\.|[^)\n])+` so an escaped \) no
longer terminates the URL early; the existing post-classify
Escape::unescapeBackslashes call strips the backslashes after
Link::classify has done its work. Clears #504, #506, #508.

Skip #328 with a self-contained title-slot reason: the URL side now
decodes correctly, but the title attribute is still discarded
(DokuWiki link instructions have no title slot).

d2085866 04-May-2026 Andreas Gohr <andi@splitbrain.org>

extend GfmNumericEntity to HTML5 named entities, rename to GfmHtmlEntity

Numeric refs are still decoded explicitly: PHP's html_entity_decode
returns the input unchanged for U+0000, surrogates, U+10FFFF, and
BMP noncharacters where CommonMark requires U+FFFD or the literal
codepoint. Named refs delegate to html_entity_decode with ENT_HTML5,
which carries the full HTML5 named-entity table (including multi-
codepoint decodes like &ngE; -> U+2267 + U+0338).

Unknown names stay literal: the original &xxx; passes through as
cdata and the renderer's &-escaping turns it into &amp;xxx;.

150dc5f2 04-May-2026 Andreas Gohr <andi@splitbrain.org>

add GfmNumericEntity for CommonMark numeric character references

Decodes &#nnn; (decimal, 1-7 digits) and &#xhhh; / &#Xhhh; (hex,
1-6 digits) to the corresponding Unicode codepoint, emitted as
plain cdata. Codepoint 0, codepoints above U+10FFFF, and the
surrogate range U+D800..U+DFFF map to U+FFFD per the spec.

Distinct from the typography Entity mode, which is renderer-side
configurable via entities.conf. Numeric refs are not configurable
so decoding happens at parse time and the renderer needs no
changes.

Lexer leftmost-match consumes the run before any structural
pattern, so &#42;foo&#42; renders as literal *foo* and &#42; foo
does not start a list - matching the spec rule that numeric refs
cannot stand in for structural markers.
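
The decoding rule stated above is compact enough to sketch in Python. The function and pattern names are made up for illustration; only the digit limits and the U+FFFD mapping come from the message.

```python
import re

# &#nnn; with 1-7 decimal digits, or &#xhhh; / &#Xhhh; with 1-6 hex digits.
NUMERIC_REF = re.compile(r"&#(?:([0-9]{1,7})|[xX]([0-9a-fA-F]{1,6}));")


def _decode(match):
    decimal, hexadecimal = match.group(1), match.group(2)
    codepoint = int(decimal) if decimal is not None else int(hexadecimal, 16)
    # Codepoint 0, values above U+10FFFF and surrogates become U+FFFD.
    if codepoint == 0 or codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        return "\uFFFD"
    return chr(codepoint)


def decode_numeric_refs(text):
    return NUMERIC_REF.sub(_decode, text)
```

References outside the digit limits are left untouched, since the pattern does not match them.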

13a62f81 04-May-2026 Andreas Gohr <andi@splitbrain.org>

rename syntax flavors 'dokuwiki' / 'markdown' to 'dw' / 'md'

Symmetry with the existing 'dw+md' / 'md+dw' setting values.

c4bcbc2e 04-May-2026 Andreas Gohr <andi@splitbrain.org>

add GfmLinebreak for GFM hard line breaks

Two-or-more trailing spaces, or a single backslash, immediately before
a non-final newline render as a `<br/>`. Both delimiter forms share a
single SUBSTITION mode at sort 140, loaded under any MD-active syntax
(markdown, dw+md, md+dw); pure dokuwiki is unaffected.

Reuses the existing `linebreak` handler call and renderer; no new
instructions or renderer changes. SpecCompatRenderer overrides
linebreak() to emit the spec's `<br />` shape. Examples 662, 663
(line break inside a raw HTML tag) are skipped — raw HTML is not
passed through by default.

3e6baeff 30-Apr-2026 Andreas Gohr <andi@splitbrain.org>

replace DW Hr with unified GfmHr

Single mode covers both DokuWiki (4+ dashes) and GFM (3+ of -/*/_)
horizontal rules; pattern self-narrows on $conf['syntax']. Always
loaded across all four syntax settings, mirroring the GfmQuote
replacement pattern. Same `hr` handler call so renderers and the
call API are unchanged.

Drops DW's old [ \t]* leading-whitespace tolerance — inert in
practice past 0-1 spaces (Preformatted at sort 20 intercepts
everything ≥ 2 spaces or any tab).

Spec examples 13, 20, 26-28, 224 turn green; 17, 21-24, 29, 30, 31
go to skip.php as deliberate non-implementations (whitespace
tolerance and list-precedence cases).
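
The "self-narrowing" can be pictured as one function that picks the rule body from the syntax setting. This Python sketch is an assumption throughout: the `md_active` flag, the function names, and the choice to test an already isolated line are illustrative, and no whitespace tolerance is modelled.

```python
import re


def hr_pattern(md_active):
    # DW only: four or more dashes. With Markdown active: three or more
    # of the same character out of "-", "*" and "_".
    body = r"(?:-{3,}|\*{3,}|_{3,})" if md_active else r"-{4,}"
    return re.compile(body)


def is_hr(line, md_active):
    return hr_pattern(md_active).fullmatch(line) is not None
```

Four dashes count as a rule under either setting, which is why a single mode can serve both dialects.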

309a0852 30-Apr-2026 Andreas Gohr <andi@splitbrain.org>

replace DW Quote with unified GfmQuote

GfmQuote covers blockquote parsing for both DokuWiki and GFM dialects
in a single mode. Same quote_open/quote_close handler instructions; a
DW-preferred post-pass flattens sub-parsed paragraph wrapping into
linebreak calls so existing pages keep their <br/>-between-lines
rendering. MD-preferred keeps the <p>-wrapped spec shape.

Block content (lists, fenced code, tables) inside `>` quotes now
renders, since the body is sub-parsed. Headers stay excluded
(BASEONLY) — TOC and section-edit anchors don't compose with
<blockquote>, same rationale as GfmListblock.

Convert ModeRegistry's sub-parser cache into an acquire/release pool
to support same-key re-entrancy: a list inside a quote re-enters
gfm_quote during the list-item sub-parse, and the inner call needs
its own parser instance even though the exclusion key matches.
GfmListblock is updated to use the new acquire/release primitives.
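
An acquire/release pool of this kind could be sketched as follows in Python. `SubParserPool` and the key string are hypothetical; the point is only that a second acquire with the same key gets its own instance while the first is still in use.

```python
from collections import defaultdict


class SubParserPool:
    """Hypothetical pool: idle parsers are kept per exclusion key."""

    def __init__(self, factory):
        self._factory = factory
        self._idle = defaultdict(list)

    def acquire(self, key):
        # Reuse an idle parser for this key if there is one, else build one.
        idle = self._idle[key]
        return idle.pop() if idle else self._factory(key)

    def release(self, key, parser):
        self._idle[key].append(parser)


pool = SubParserPool(lambda key: object())
outer = pool.acquire("no-baseonly")
inner = pool.acquire("no-baseonly")   # re-entry while outer is still in use
pool.release("no-baseonly", inner)
pool.release("no-baseonly", outer)
reused = pool.acquire("no-baseonly")  # comes from the idle list, not the factory
```

A plain cache keyed the same way would have handed the inner call the parser the outer call was still using.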

74031e46 28-Apr-2026 Andreas Gohr <andi@splitbrain.org>

add GfmEscape for GFM backslash escapes

Implements GFM §6.1 backslash-escape handling. GfmEscape is a sort-5
inline mode in CATEGORY_SUBSTITION that claims `\X` for any escapable
ASCII punctuation char before competing delimiters can match. The
shared character class lives on Helpers\Escape so the lexer pattern
and the post-hoc unescape stay in lockstep.

Whole-span captures (GfmCode info string, GfmLink label/URL) bypass
the lexer; those modes call Escape::unescapeBackslashes() on the
relevant slot. GfmLink skips the unescape when the URL classifies as
a windowssharelink so the leading \\host survives intact.

GfmTable cells get a separate per-cell `\|` to `|` pass in the
rewriter to honour the tables-extension rule that pipes always
unescape, even inside code spans where standard §6.1 escapes don't
fire.

3dabe4e0 28-Apr-2026 Andreas Gohr <andi@splitbrain.org>

add GfmTable for GFM tables

Implements the GFM pipe-table extension as a CONTAINER mode at sort 55,
one below DW Table at 60. A lookahead-validated entry pattern asserts a
header line plus a `:?-+:?` delimiter row before consuming any input, so
non-table paragraphs containing pipes flow through unchanged. Cells are
inline-only per spec.

Handler\GfmTable rewrites the flat token stream into the canonical
table_open / tablethead_* / tabletbody_* / table_close sequence, deriving
per-column alignment from the delimiter row, padding short body rows
(spec 202), truncating long ones (spec 204), and falling back to a single
cdata when the column count mismatches (spec 203).

`tabletbody_open` / `tabletbody_close` are emitted for the first time;
they are part of the base renderer API but DW Table never used them.
Added to Block's blockOpen / blockClose lists alongside `tabletfoot_*`
for symmetry. SpecCompatRenderer gains minimal table-element overrides
so spec roundtrip output matches GFM's `<table><thead><tr><th>` shape
without DW's wrapper div, row/col counter classes, or align-as-class.
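
Two of the rewriter's steps, alignment from the delimiter row and row padding or truncation, are easy to sketch. The Python below is illustrative only: the function names are invented, and the delimiter row is assumed to be already validated against `:?-+:?`.

```python
def column_alignments(delimiter_row):
    """Derive per-column alignment from a GFM delimiter row."""
    cells = [c.strip() for c in delimiter_row.strip().strip("|").split("|")]
    aligns = []
    for cell in cells:
        left, right = cell.startswith(":"), cell.endswith(":")
        if left and right:
            aligns.append("center")
        elif left:
            aligns.append("left")
        elif right:
            aligns.append("right")
        else:
            aligns.append(None)  # no alignment given
    return aligns


def fit_row(cells, width):
    """Pad short body rows with empty cells, truncate long ones."""
    return (cells + [""] * width)[:width]
```

For example, `| :--- | :-: | --: | --- |` yields left, center, right and no alignment, in that order.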

685560eb 28-Apr-2026 Andreas Gohr <andi@splitbrain.org>

add GfmListblock for GFM lists

GfmListblock captures an entire list block atomically with one
addSpecialPattern match, then walks the captured text in handle()
grouping lines into items. Each item's body is dedented to its
content column and parsed by ModeRegistry::getSubParser() so
block content (paragraphs, fenced code, blockquotes, plugin
blocks) works inside items uniformly. Sub-parsed calls are wrapped
in a Nest call before they reach the outer handler, matching the
Footnote pattern: the main handler's Block rewriter treats nest
as opaque and the renderer base class unwraps it transparently,
so multi-paragraph items don't get double-wrapped in <p>.

Marker syntax: -, *, + (unordered) or 1-9 digits followed by
. or ) (ordered). Indentation is a 2-space-multiple step starting
at 0; depth = (indent / 2) + 1, odd indents round down, tabs become
two spaces. The first ordered item's number drives the start
attribute on <ol> via the listo_open $start parameter.

GfmLists subclasses AbstractListsRewriter with the GFM marker
parser; the state machine on the base class is shared with DW Lists.

GfmListblock loads only when $conf['syntax'] is markdown or md+dw.
Under those settings the DW Listblock is suppressed because the two
list models conflict — DW's mandatory 2-space indent rule vs GFM's
zero-indent top-level rule, and -/*/+ markers shared. Plugins that
relied on Listblock loading under md+dw will see it absent there.

Sub-parser exclusion set: CATEGORY_BASEONLY (no Header inside list
items) and gfm_listblock itself (defensive guard against re-entry
on pathological inputs; nested lists are handled by the outer
pattern, not by re-entry).

Tests cover marker variants, ordered start numbers, nested lists at
two and three levels, inline formatting inside items, marker-
character switches keeping one list, type switches splitting the
list, fenced code inside items, multi-paragraph (loose) items, and
two regressions on blank-line tolerance inside the captured block.
SpecCompatRenderer learns to render the list call sequence, and
spec.txt tests for digit/marker-width/lazy-continuation behavior
that GfmListblock deliberately doesn't implement are documented in
gfm-spec/skip.php with the per-bucket reasons (A-F).

Drops two now-obsolete entries from skip.php (image escapes that
land via earlier GfmLink/GfmMedia work) and inlines the Setext
explanation that previously pointed at SPEC.md. Replaces the
SPEC.md reference in GfmEmphasisTest with the inline reason.
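
The marker and depth rules from the "Marker syntax" paragraph can be worked through in a few lines of Python. This is a sketch, not the GfmLists parser: `parse_marker` is a hypothetical name, and requiring whitespace after the marker is an assumption.

```python
import re

# Indent, then -, *, + or 1-9 digits followed by . or ), then whitespace.
ITEM = re.compile(r"^([ \t]*)(?:[-*+]|([0-9]{1,9})[.)])[ \t]+")


def parse_marker(line):
    """Return (depth, ordered, start) for a list line, or None."""
    match = ITEM.match(line)
    if match is None:
        return None
    indent = len(match.group(1).replace("\t", "  "))  # a tab counts as two spaces
    depth = indent // 2 + 1                           # odd indents round down
    number = match.group(2)
    ordered = number is not None
    return depth, ordered, int(number) if ordered else None
```

A line indented by three spaces lands at depth 2, the same as one indented by two spaces or by a single tab.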

96d096f1 27-Apr-2026 Andreas Gohr <andi@splitbrain.org>

remove getLineStartMarkers registry — sort order already wins

Preformatted's entry pattern carried a `(?![\*\-])` negative
lookahead to defer to list modes on indented bullet lines.
0cecf9d50 (2005, "new parser added") introduced it hardcoded;
7958e6980 (2026, "decouple hardcoded mode names in Eol and
Preformatted") refactored that hardcoded knowledge into
register/getLineStartMarkers on ModeRegistry so each list mode
owned its marker chars. Both preserved the behavior verbatim;
neither documented why it was needed.

Tracing the lexer, it isn't. ParallelRegex merges all entry
patterns into one PCRE expression; PCRE returns the leftmost
match and breaks ties on expression order. Modes are added in
sort order via ModeRegistry::getModes(), so Listblock (sort 10)
always precedes Preformatted (sort 20) and wins the tie on
" - foo" without any lookahead. The only test that caught a
difference was testPreformattedList, which happened to register
modes in non-canonical order - that was a test bug.

This patch drops the lookahead in Preformatted::connectTo, the
registerLineStartMarkers call in Listblock::preConnect, the
register/getLineStartMarkers methods on ModeRegistry, and the
three registry-API unit tests. testPreformattedList now
registers Listblock before Preformatted.
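
The tie-break argument can be checked with any regex engine that tries alternatives left to right, Python's `re` included. The entry patterns below are simplified assumptions, not the real Listblock and Preformatted patterns.

```python
import re

# Simplified, hypothetical entry patterns for the two modes.
ENTRY = {
    "listblock": r"\n {2,}[-*]",
    "preformatted": r"\n  ",
}


def first_mode(order, text):
    """Merge the entry patterns in the given order and report which one wins."""
    combined = re.compile("|".join(f"({ENTRY[name]})" for name in order))
    match = combined.search(text)
    return order[match.lastindex - 1] if match else None


line = "\n  - foo"
canonical = first_mode(["listblock", "preformatted"], line)
swapped = first_mode(["preformatted", "listblock"], line)
```

Both patterns match at the same offset, so the winner is decided purely by registration order, which is what the mis-ordered test had been exercising.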

1e28e406 23-Apr-2026 Andreas Gohr <andi@splitbrain.org>

split Parsing\Helpers into per-domain Link / Media / Code classes
