| a7e10216 | 25-Jun-2026 |
splitbrain <86426+splitbrain@users.noreply.github.com> |
Rector and PHPCS fixes |
| 4b31eadf | 04-Jun-2026 |
Andreas Gohr <gohr@cosmocode.de> |
fix (parsing): avoid newline loss on GFM section editing
The GFM header parsing returned a byte position pointing at the newline before the actual header, resulting in the observed loss of newlines reported in https://github.com/dokuwiki/dokuwiki/pull/4636#issuecomment-4491970909
Additionally, this fixes an oddity of DW header parsing which accidentally allowed text on the line before the opening = chars. Whitespace is still allowed.
|
| 47a02a10 | 04-Jun-2026 |
Andreas Gohr <gohr@cosmocode.de> |
Parsing: make parse syntax a per-parse value, drop ModeInterface
The active parse's syntax flavour is a per-parse question, not process-global state: within a single request a plugin can render bundled DokuWiki-syntax text inside an otherwise-Markdown page. Yet ModeRegistry was a singleton that read $conf['syntax'] and the $PARSER_MODES global, and every mode reached it through ModeRegistry::getInstance(), so the flavour lived in shared mutable state that two parses in one request would fight over.
Make the registry a short-lived value instead:
- ModeRegistry is constructed once per parse with an explicit $syntax and injected into Parser, Handler and every mode. getSyntax() / isDwPreferred() / isMdPreferred() consult $this->syntax; the DOKU_UNITTEST-gated mode-list cache hack is gone (each registry is fresh, nothing to invalidate).
- p_get_instructions() is now the single place in the pipeline where $conf['syntax'] is read; from there the flavour travels as a parameter. No code under inc/Parsing/ reads $conf['syntax'] directly anymore; the five syntax-reading modes (Preformatted, GfmHr, GfmEscape, Externallink, GfmQuote) route through $this->registry.
Keep the two concepts apart, as documented in the ModeRegistry and AbstractMode docblocks: the user's configured *preference* stays in $conf['syntax'] for UI code (toolbar, settings), while the active parse's syntax is a parameter carried by the registry.
$PARSER_MODES is demoted to a deprecated, read-only mirror, published during loadPluginModes() — third-party syntax plugins (columnlist, alphalist2, phpwikify, skipentity) and the bundled info plugin read the global directly, often from their constructors, so the taxonomy must stay visible there. No core code reads the mirror.
Fold ModeInterface into AbstractMode while here: getSort()/handle() are abstract, the connect callbacks carry defaults, and the public $Lexer "FIXME should be done by setter" becomes setLexer()/getLexer() injected by Parser::addMode() alongside the registry. Nested-content resolution moves to the allowedCategories()/filterAllowedModes() hooks, resolved once when the registry is attached.
Tests build their own parser/registry through ParserTestBase::setSyntax() instead of mutating $conf and calling the removed ModeRegistry::reset().
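The per-parse registry idea can be sketched in a few lines of Python. This is only an illustration of passing the syntax flavour as a value instead of reading shared state; the class and function names below are hypothetical stand-ins and not DokuWiki's PHP API.

```python
class Registry:
    """Hypothetical stand-in: holds the syntax flavour of ONE parse."""

    def __init__(self, syntax):
        self._syntax = syntax

    def get_syntax(self):
        return self._syntax


class Parser:
    """Hypothetical stand-in: receives its registry by injection."""

    def __init__(self, registry):
        self.registry = registry

    def parse(self, text):
        # A real parser would build instructions; here we only record
        # which flavour this particular parse saw.
        return (self.registry.get_syntax(), text)


def get_instructions(text, syntax):
    # The flavour is read once by the caller and travels as a parameter,
    # so two parses in one request cannot overwrite each other's setting.
    return Parser(Registry(syntax)).parse(text)
```

Because every call builds a fresh registry, a Markdown page and an embedded DokuWiki-syntax snippet can be parsed in the same request without either one changing what the other sees.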
|
| 4f32c45b | 26-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
GfmLink: allow soft line break inside link text
The label character class explicitly forbade `\n`, so a CommonMark soft line break inside link text (e.g. `[link with<EOL>more](url)`) fell through to literal text instead of producing a link. Loosen the class to accept a bare `\n` as long as it is not followed by a blank line — soft breaks are spec-allowed inside link text, blank lines are not, and refusing them also keeps `\n#`-anchored block modes (header, hr, ...) from being swallowed by a runaway link match.
The `\n` survives into the label string and renders as a literal line ending in HTML, which browsers display as a single space. This soft break behavior has been checked against https://spec.commonmark.org/dingus/
Note that this behavior differs from GitHub, where the line break is rendered as a hard break <br>.
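A rough Python sketch of such a label class follows. The exact pattern used by GfmLink is not shown in this log, so `LABEL`, `LINK` and `find_link` are assumptions made for illustration: a newline is accepted inside the label only when the following line is not blank.

```python
import re

# Assumed label class: an escaped character, any character that is not a
# bracket or newline, or one newline that is NOT followed by a blank line.
LABEL = r"(?:\\.|[^\[\]\n]|\n(?![ \t]*\n))+"
LINK = re.compile(r"\[(" + LABEL + r")\]\(([^)\n]+)\)")


def find_link(text):
    """Return (label, url) for the first inline link, or None."""
    m = LINK.search(text)
    return (m.group(1), m.group(2)) if m else None
```

With this class, `[link with\nmore](url)` yields a link whose label still contains the newline, while a blank line inside the brackets stops the match.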
|
| 65dd2042 | 26-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
GfmEscape: defer \\<EOL> to DW Linebreak in mixed-syntax modes
Both GfmEscape (sort 5) and DW Linebreak (sort 140) can claim the two backslashes of `\\` followed by space/tab/newline. The lexer's tie-breaker picked GfmEscape, so DW's forced linebreak silently lost its delimiter under dw+md and md+dw. Add a negative lookahead that declines `\\[ \t\n]` whenever DW syntax is loaded — pure md keeps GFM-spec behavior. Mid-line `\\` (UNC paths etc.) still escapes.
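The lookahead guard can be illustrated with a small Python sketch. The real GfmEscape pattern is not quoted in this log, so `escape_pattern`, `claims` and the `dw_loaded` flag are hypothetical; only the `\\[ \t\n]` guard is taken from the message above.

```python
import re

# ASCII punctuation that a backslash may escape in GFM.
PUNCT = re.escape("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")


def escape_pattern(dw_loaded):
    """Build a backslash-escape regex (illustrative, not the PHP pattern)."""
    # When DW syntax is loaded, decline two backslashes followed by space,
    # tab or newline so a forced-linebreak mode can claim them instead.
    guard = r"(?!\\\\[ \t\n])" if dw_loaded else ""
    return re.compile(guard + r"\\[" + PUNCT + "]")


def claims(text, dw_loaded):
    """True if the escape pattern matches at the start of text."""
    return escape_pattern(dw_loaded).match(text) is not None
```

Under these assumptions a mid-line `\\` such as the start of a UNC path is still claimed as an escape, while `\\` before whitespace is left alone when DW syntax is active.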
|
| 7686f203 | 12-May-2026 |
Anna Dabrowska <dabrowska@cosmocode.de> |
Fix syntax plugin rendering
Reverse the order in which core modes and plugin modes are handled by the \dokuwiki\Parsing\Handler. Otherwise only the handle() method of the plugin is called, which is fine for core modes. Syntax plugins need to go through plugin() to actually add their own calls.
|
| e7dae73b | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
fix: apply rector and code sniffer fixes |
| d331a839 | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GFM modes: follow CATEGORY_SUBSTITION → CATEGORY_SUBSTITUTION rename
Constant was renamed on master (the typo'd 'substition' value is kept, but the constant name spells it correctly). Update GfmTable's use of the constant, plus stale docblock/comment references in GfmEscape, GfmHtmlEntity, GfmLinebreak, and GfmLinebreakTest.
|
| 15429f02 | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
Externallink: GFM autolink extension - parens and entity-ref tail
In Markdown-preferred mode, allow `(` and `)` inside URL char classes and consume an optional trailing entity reference via the shared HtmlEntity::PATTERN. The Markdown-only post-processing peels off mismatched closing parens and decodes the trailing entity reference, emitting the peeled chars as cdata after the link. Refactors handle() to dispatch to handleAngleAutolink() and handleBareUrl(), with the new trim logic in peelGfmTail() and the protocol-prefix step in addProtocolPrefix(). DW-only mode behavior is unchanged.
Brings GFM spec examples #624, #625, #626 to passing.
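The paren-peeling step of the GFM autolink extension can be sketched as below. This follows the rule in the GFM spec (trailing closing parentheses that exceed the opening ones are not part of the link); `peel_trailing_parens` is a hypothetical name and is not the body of peelGfmTail(), which also handles the entity-reference tail.

```python
def peel_trailing_parens(url):
    """Split url into (link, tail), moving unmatched trailing ')' to tail."""
    tail = ""
    # Only trailing parens are candidates; stop as soon as the counts balance.
    while url.endswith(")") and url.count(")") > url.count("("):
        url, tail = url[:-1], ")" + tail
    return url, tail
```

The returned tail corresponds to the characters that would be emitted as cdata after the link.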
|
| 73dc0a89 | 06-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
fix(mail): keep '&' intact in mailto links with multiple query params
Move the email-handling helpers (obfuscate, mail_isvalid, mail_quotedprintable_encode, mail_setup) out of the procedural inc/mail.php into a namespaced dokuwiki\MailUtils class plus a new Mailer::configInit(), and add a separate MailUtils::obfuscateUrl() for the mailto-href context.
The xhtml renderer and PluginTrait now build the link label and the href separately: the address half is run through the mailguard obfuscation, the query string is preserved verbatim with only HTML escaping applied. This fixes #1690 — in 'visible' mode the previous code rawurlencoded the entire address+query, turning '?' into '%3F' and breaking multi-parameter mailto links; in all modes the query string is no longer mangled by the [at]/[dot] substitution.
Core call sites (Mailer, auth, LegacyApiCore, common, the xhtml renderer, the parser, the bundled config/styling/usermanager plugins) are migrated to MailUtils directly. The old top-level functions and PREG_PATTERN_VALID_EMAIL constant remain as deprecated shims with rector mappings.
Tests for obfuscate / mail_isvalid / mail_quotedprintable_encode are consolidated into a single _test/tests/MailUtilsTest.php and extended with regression coverage for the multi-parameter, double-escape and URL-shape cases.
Closes #1690
Replaces #1964
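The split between address and query string might look roughly like this in Python. The encoding applied to the address here (percent-encoding) is only an assumption standing in for the mailguard obfuscation, and `build_mailto_href` is a hypothetical helper, not MailUtils::obfuscateUrl().

```python
from html import escape
from urllib.parse import quote


def build_mailto_href(target):
    """Encode only the address; keep the query verbatim apart from HTML escaping."""
    address, sep, query = target.partition("?")
    # Assumption: percent-encoding stands in for the address obfuscation.
    href = "mailto:" + quote(address, safe="")
    if sep:
        # The '?' separator and the '&' between parameters survive;
        # '&' is only escaped for the HTML attribute context.
        href += "?" + escape(query, quote=True)
    return href
```

Encoding the whole target in one go is what turned `?` into `%3F`; treating the two halves separately avoids that.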
|
| 8788dbbd | 06-May-2026 |
splitbrain <86426+splitbrain@users.noreply.github.com> |
Rector and PHPCS fixes |
| 56c730b5 | 06-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
keep historic typo in value but not in constant
We need to keep the historic typo in the value ("substition"), but there is no reason to keep it in the constant. |
| 0f694376 | 05-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
GfmLink: accept escaped brackets inside link labels
The label slot used `[^\[\]\n]+`, which rejected `\[` / `\]` and left labels with escaped brackets unmatched. Promote it to `(?:\\.|[^\[\]\n])+` — the same backslash-escape trick the URL slot already uses — so spec example 523 (`[link \[bar](/uri)`) matches and unescapes cleanly. The image-as-label sub-pattern gets the same upgrade.
handle() needs no change: the new class still rejects bare `]`, so the first literal `](` in the match is still the separator; Escape::unescapeBackslashes() was already collapsing `\[` to `[` before the label reached the link handler.
Adds two GfmLinkTest cases for the `\[` / `\]` forms.
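The difference between the two label classes quoted above can be checked directly in Python, whose `re` syntax agrees with PCRE for these patterns. The surrounding `\[ ... \]\(` scaffolding and the `unescape_brackets` helper are assumptions added for the demonstration.

```python
import re

# Label classes as quoted in the commit message, wrapped in "[...](".
OLD_LABEL = re.compile(r"\[([^\[\]\n]+)\]\(")
NEW_LABEL = re.compile(r"\[((?:\\.|[^\[\]\n])+)\]\(")


def unescape_brackets(label):
    """Hypothetical stand-in for the backslash unescaping step."""
    return re.sub(r"\\([\[\]])", r"\1", label)
```

Anchored at the opening bracket of `[link \[bar](/uri)`, the old class fails at the escaped bracket while the new one captures the whole label.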
|
| 113171bb | 05-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
Preformatted: strip leading/trailing blank-line padding from body
The lexer's `\n ` continuation pattern eats the indent off blank-but-indented source lines, leaving a `\n` in the rewriter buffer for each one. Trim those padding newlines before emitting so the preformatted body starts and ends on a non-blank line, as GFM example #87 requires. Whitespace-only blocks are still skipped entirely.
Adds two PreformattedTest cases pinning the new behavior.
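A minimal Python sketch of the trimming step, assuming the body arrives as a single string with the indent already removed; `trim_blank_padding` is a hypothetical name.

```python
def trim_blank_padding(body):
    """Drop leading and trailing blank lines, keep inner blank lines."""
    lines = body.split("\n")
    while lines and lines[0].strip() == "":
        lines.pop(0)
    while lines and lines[-1].strip() == "":
        lines.pop()
    return "\n".join(lines)
```

A whitespace-only body collapses to the empty string, which a caller could use as the signal to skip the block.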
|
| dccbd514 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmQuote: accept ^> line starts so quotes can follow tables and lists
GfmTable, DW Table, and DW Listblock all consume the boundary \n on their way out. A pure-lookahead exit pattern at that boundary would trip the lexer's no-advance safety check, because tables and lists exit right after consuming a marker token and have no leading unmatched content for the lookahead to attach to (unlike Preformatted, whose body leaves code lines as UNMATCHED right before the boundary).
Fix this on the consumer side: change the first-line anchor from \n> to (?:^|\n)>. With the lexer's m flag, ^ matches at offset 0 and at any position immediately following a \n in the subject, including the position right after a \n that a preceding mode just consumed. Subsequent quote lines keep the \n> anchor.
Adds three handoff tests in GfmQuoteTest covering GfmTable, DW Table, and DW Listblock. Resolves GFM spec example 201.
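The anchor change can be reproduced with Python's `re.MULTILINE`, which behaves like the PCRE m flag for `^`. The names below are illustrative; the sketch only shows why `\n>` fails once the preceding mode has consumed the newline, and why `(?:^|\n)>` still matches.

```python
import re

OLD_ANCHOR = re.compile(r"\n>")
NEW_ANCHOR = re.compile(r"(?:^|\n)>", re.MULTILINE)


def quote_starts_at(subject, pos, pattern):
    """True if a quote line can start at offset pos of subject."""
    return pattern.match(subject, pos) is not None
```

In `"| a |\n> quote"` a table that consumed its closing newline leaves the scanner at offset 6, directly on the `>`; only the new anchor matches there.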
|
| f9d3b7bd | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
Externallink: add per-scheme angle-bracket autolinks for MD syntax
Adds CommonMark §6.5 <URL> autolinks to Externallink, gated to md/md+dw/dw+md syntax via ModeRegistry::isMdPreferred(). Per-scheme patterns share the existing conf/scheme.conf allow-list so unknown schemes fall through to literal cdata instead of being silently dropped by the renderer. Internal whitespace inside the brackets disqualifies the autolink and the whole envelope is emitted as cdata to keep the bare-URL detector off the URL.
LinksTest gains 5 cases covering success, internal-whitespace and leading-whitespace disqualification, unregistered scheme fallthrough, and the dw-only no-op path. SpecCompatRenderer URL encoder is updated to match cmark-gfm's HREF_SAFE table (square brackets and a few other characters move from safe to encoded). skip.php loses the obsolete #356 entry and gains #605/#606/#607/#609 explaining the unregistered-scheme cases that the per-scheme regex naturally rejects.
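A per-scheme autolink pattern could be built along these lines. The scheme list stands in for conf/scheme.conf, and `autolink_pattern` / `find_autolink` are hypothetical helpers; the character rule (no whitespace, `<` or `>` inside the brackets) follows CommonMark §6.5.

```python
import re


def autolink_pattern(schemes):
    """Build an <URL> autolink regex limited to the given schemes."""
    alternatives = "|".join(re.escape(s) for s in schemes)
    return re.compile(r"<((?:" + alternatives + r"):[^<>\s]*)>")


def find_autolink(text, schemes):
    """Return the URL of the first autolink in text, or None."""
    m = autolink_pattern(schemes).search(text)
    return m.group(1) if m else None
```

Because the scheme is part of the pattern, an unregistered scheme simply never matches and the text stays literal.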
|
| c2248fda | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
Block: collapse [ \t]*\n[ \t]* to \n inside paragraphs
Strip [ \t]+ on either side of the soft-break joiner emitted for a single eol, and ltrim the first cdata of each paragraph. Without this, DokuWiki preserved leading/trailing whitespace on continuation lines verbatim, which is invisible in HTML but may be visible in plain-text and other renderers. It is also a requirement in the Markdown spec.
Re-baseline the parser-mode tests that pinned the old preserve behavior (cdata adjacent to <code>/<file>/<rss>/header/footnote).
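The collapse rule in the title translates to a one-line substitution. The Python sketch below applies it to a whole paragraph string, which is a simplification: the actual change works on the handler's call list, and `normalize_paragraph` is a hypothetical name.

```python
import re


def normalize_paragraph(text):
    """Collapse [ \\t]*\\n[ \\t]* to a bare newline and left-trim the start."""
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    return text.lstrip(" \t")
```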
|
| f57da51c | 05-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
Preformatted: leave boundary \n in stream when next line has content
Adds a zero-width lookahead exit (?=\n[^ \t\n]) ahead of the existing consuming \n exit. When an indented code block is followed by a non-blank line, the boundary newline now stays available for downstream block-level matchers (GfmHr, GfmHeader, etc.) instead of being eaten on the way out of preformatted mode.
Concretely fixes a thematic-break-after-indented-code case (GFM spec case 85's trailing ----): without this change, GfmHr's \n anchor failed because preformatted had already consumed the newline, and the bare ---- fell through to Entity which converted --- to an em-dash.
The consuming branch is kept as a fall-through for the blank-line and end-of-input cases, where a pure lookahead would trip the lexer's no-advance safety check.
Six PreformattedTest expectations updated: trailing cdata after a preformatted block now carries the leading \n (rendered output is unchanged — paragraph whitespace is trimmed).
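The two exit branches can be modelled as one alternation in Python. The lookahead is the one quoted above; combining it with the consuming `\n` in a single regex, the `HR` pattern and the helper name are assumptions made to keep the sketch short.

```python
import re

# Zero-width exit first, consuming exit as the fall-through.
EXIT = re.compile(r"(?=\n[^ \t\n])|\n")
# Illustrative newline-anchored block pattern (4+ dashes).
HR = re.compile(r"\n-{4,}")


def exit_preformatted(subject, pos):
    """Return the offset where scanning resumes after leaving the block."""
    m = EXIT.match(subject, pos)
    return m.end() if m else pos
```

When content follows, the offset does not advance and the newline is still there for `HR` to anchor on; before a blank line or at end of input the newline is consumed as before.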
|
| 95f69420 | 05-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
Lexer: dispatch zero-width EXIT events to mode handlers
invokeHandler short-circuited on empty content for every lexer state, which silently dropped EXIT events from zero-width exit patterns (lookahead-only). The mode stack would still pop, but the mode's handle() method never ran, so handler-side cleanup (restoring buffered call writers, emitting close calls) was skipped. Now empty content is only suppressed for non-EXIT states.
Also fix the parameter docblock: $is_match was annotated as boolean but is actually one of the integer DOKU_LEXER_* constants. Renamed to $state to match.
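The changed short-circuit can be pictured with a small Python model. The state constants mirror the DOKU_LEXER_* names, but their values and the `invoke_handler` signature here are assumptions for illustration, not the PHP code.

```python
# Stand-ins for the integer DOKU_LEXER_* state constants (values assumed).
ENTER, MATCHED, UNMATCHED, EXIT, SPECIAL = range(1, 6)


def invoke_handler(content, state, handler):
    """Dispatch a lexer event; empty content is dropped unless it is an EXIT."""
    if content == "" and state != EXIT:
        return True  # nothing to report for this state
    return handler(content, state)
```

An EXIT produced by a lookahead-only pattern carries empty content, so without the `state != EXIT` condition the handler would never learn that the mode closed.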
|
| eb15e634 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
extract Helpers\HtmlEntity, wire into GfmCode and GfmLink URL slot
Numeric and named HTML entity decoding moves out of GfmHtmlEntity into a pure helper, so capture-by-regex modes can apply the same decode post-extraction (the inline lexer never reaches their bodies). Mirrors the Helpers\Escape pattern.
Wired up in two slots:
- GfmCode info string: `f&ouml;&ouml;` now decodes to föö in the language class. Clears spec example #330.
- GfmLink URL: GfmLink::extractUrl() decodes entities. URL pattern extends from `[^)\n]+` to `(?:\\.|[^)\n])+` so an escaped \) no longer terminates the URL early; the existing post-classify Escape::unescapeBackslashes call strips the backslashes after Link::classify has done its work. Clears #504, #506, #508.
Skip #328 with a self-contained title-slot reason: the URL side now decodes correctly, but the title attribute is still discarded (DokuWiki link instructions have no title slot).
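The URL slot handling can be approximated in Python as follows. The URL class is the one quoted above; `extract_url` here is a hypothetical reimplementation, and `html.unescape` is more lenient than CommonMark's entity rules, so it only stands in for the HtmlEntity helper.

```python
import html
import re

# URL class from the commit message, wrapped in "](...)".
URL_SLOT = re.compile(r"\]\(((?:\\.|[^)\n])+)\)")


def extract_url(link):
    """Return the decoded URL of an inline link, or None."""
    m = URL_SLOT.search(link)
    if m is None:
        return None
    url = html.unescape(m.group(1))  # decode entity references first
    # Then strip backslashes that escape ASCII punctuation.
    return re.sub(r"\\([!-/:-@\[-`{-~])", r"\1", url)
```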
|
| d2085866 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
extend GfmNumericEntity to HTML5 named entities, rename to GfmHtmlEntity
Numeric refs are still decoded explicitly: PHP's html_entity_decode returns the input unchanged for U+0000, surrogates, U+10FFFF, and BMP noncharacters where CommonMark requires U+FFFD or the literal codepoint. Named refs delegate to html_entity_decode with ENT_HTML5, which carries the full HTML5 named-entity table (including multi-codepoint decodes like ≧̸ -> U+2267 + U+0338).
Unknown names stay literal: the original &xxx; passes through as cdata and the renderer's &-escaping turns it into &amp;xxx;.
|
| 150dc5f2 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmNumericEntity for CommonMark numeric character references
Decodes &#nnn; (decimal, 1-7 digits) and &#xhhh; / &#Xhhh; (hex, 1-6 digits) to the corresponding Unicode codepoint, emitted as plain cdata. Codepoint 0, codepoints above U+10FFFF, and the surrogate range U+D800..U+DFFF map to U+FFFD per the spec.
Distinct from the typography Entity mode, which is renderer-side configurable via entities.conf. Numeric refs are not configurable so decoding happens at parse time and the renderer needs no changes.
Lexer leftmost-match consumes the run before any structural pattern, so `&#42;foo&#42;` renders as literal *foo* and `&#42; foo` does not start a list - matching the spec rule that numeric refs cannot stand in for structural markers.
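The decoding rules listed above fit in a short Python function. The digit limits and the U+FFFD cases come from the message; the regex and the name `decode_numeric_refs` are an illustrative sketch rather than the mode's actual pattern.

```python
import re

# Decimal refs with 1-7 digits, hex refs (x or X) with 1-6 digits.
NUMERIC_REF = re.compile(r"&#(?:([0-9]{1,7})|[xX]([0-9a-fA-F]{1,6}));")


def decode_numeric_refs(text):
    """Replace numeric character references with their codepoints."""
    def replace(match):
        if match.group(1) is not None:
            codepoint = int(match.group(1))
        else:
            codepoint = int(match.group(2), 16)
        # Zero, out-of-range values and surrogates become U+FFFD.
        if codepoint == 0 or codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
            return "\uFFFD"
        return chr(codepoint)
    return NUMERIC_REF.sub(replace, text)
```

References with too many digits do not match the pattern at all and therefore stay literal.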
|
| 13a62f81 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
rename syntax flavors 'dokuwiki' / 'markdown' to 'dw' / 'md'
Symmetry with the existing 'dw+md' / 'md+dw' setting values. |
| c4bcbc2e | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmLinebreak for GFM hard line breaks
Two-or-more trailing spaces, or a single backslash, immediately before a non-final newline render as a `<br/>`. Both delimiter forms share a single SUBSTITION mode at sort 140, loaded under any MD-active syntax (markdown, dw+md, md+dw); pure dokuwiki is unaffected.
Reuses the existing `linebreak` handler call and renderer; no new instructions or renderer changes. SpecCompatRenderer overrides linebreak() to emit the spec's `<br />` shape. Examples 662, 663 (line break inside a raw HTML tag) are skipped — raw HTML is not passed through by default.
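The two delimiter forms can be expressed as one pattern. The sketch below is an assumption about how "non-final newline" might be tested (the newline must be followed by another character on the next line); the actual GfmLinebreak regex is not shown in this log.

```python
import re

# Two or more spaces, or one backslash, before a newline that is followed
# by further content (assumed reading of "non-final").
HARD_BREAK = re.compile(r"(?: {2,}|\\)\n(?=[^\n])")


def count_hard_breaks(text):
    """Count hard line breaks in a paragraph."""
    return len(HARD_BREAK.findall(text))
```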
|
| 3e6baeff | 30-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
replace DW Hr with unified GfmHr
Single mode covers both DokuWiki (4+ dashes) and GFM (3+ of -/*/_) horizontal rules; pattern self-narrows on $conf['syntax']. Always loaded across all four syntax settings, mirroring the GfmQuote replacement pattern. Same `hr` handler call so renderers and the call API are unchanged.
Drops DW's old [ \t]* leading-whitespace tolerance — inert in practice past 0-1 spaces (Preformatted at sort 20 intercepts everything ≥ 2 spaces or any tab).
Spec examples 13, 20, 26-28, 224 turn green; 17, 21-24, 29, 30, 31 go to skip.php as deliberate non-implementations (whitespace tolerance and list-precedence cases).
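The narrowing between the two rule sets can be sketched as below. The `md_active` flag is a simplification of the four syntax settings, and the patterns test a single line instead of using the lexer's newline anchors; both are assumptions for illustration.

```python
import re

DW_HR = re.compile(r"-{4,}")                 # DokuWiki: four or more dashes
GFM_HR = re.compile(r"-{3,}|\*{3,}|_{3,}")   # GFM: three or more of -, * or _


def is_hr(line, md_active):
    """True if line is a horizontal rule under the selected rule set."""
    pattern = GFM_HR if md_active else DW_HR
    return pattern.fullmatch(line) is not None
```

Leading whitespace is rejected in both cases, in line with the dropped tolerance described above.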
|