| 4b31eadf | 04-Jun-2026 |
Andreas Gohr <gohr@cosmocode.de> |
fix (parsing): avoid newline loss on GFM section editing
The GFM header parsing returned a byte position pointing at the newline before the actual header, which caused the newline loss reported in https://github.com/dokuwiki/dokuwiki/pull/4636#issuecomment-4491970909
Additionally this fixes an oddity of DW header parsing which accidentally allowed text on the line before the opening = chars. Whitespace is still allowed.
|
| 47a02a10 | 04-Jun-2026 |
Andreas Gohr <gohr@cosmocode.de> |
Parsing: make parse syntax a per-parse value, drop ModeInterface
The active parse's syntax flavour is a per-parse question, not process-global state: within a single request a plugin can render bundled DokuWiki-syntax text inside an otherwise-Markdown page. Yet ModeRegistry was a singleton that read $conf['syntax'] and the $PARSER_MODES global, and every mode reached it through ModeRegistry::getInstance(), so the flavour lived in shared mutable state that two parses in one request would fight over.
Make the registry a short-lived value instead:
- ModeRegistry is constructed once per parse with an explicit $syntax and injected into Parser, Handler and every mode. getSyntax() / isDwPreferred() / isMdPreferred() consult $this->syntax; the DOKU_UNITTEST-gated mode-list cache hack is gone (each registry is fresh, nothing to invalidate).
- p_get_instructions() is now the single place in the pipeline where $conf['syntax'] is read; from there the flavour travels as a parameter. No code under inc/Parsing/ reads $conf['syntax'] directly anymore: the five syntax-reading modes (Preformatted, GfmHr, GfmEscape, Externallink, GfmQuote) route through $this->registry.
Keep the two concepts apart, as documented in the ModeRegistry and AbstractMode docblocks: the user's configured *preference* stays in $conf['syntax'] for UI code (toolbar, settings), while the active parse's syntax is a parameter carried by the registry.
$PARSER_MODES is demoted to a deprecated, read-only mirror, published during loadPluginModes() — third-party syntax plugins (columnlist, alphalist2, phpwikify, skipentity) and the bundled info plugin read the global directly, often from their constructors, so the taxonomy must stay visible there. No core code reads the mirror.
Fold ModeInterface into AbstractMode while here: getSort()/handle() are abstract, the connect callbacks carry defaults, and the public $Lexer "FIXME should be done by setter" becomes setLexer()/getLexer() injected by Parser::addMode() alongside the registry. Nested-content resolution moves to the allowedCategories()/filterAllowedModes() hooks, resolved once when the registry is attached.
Tests build their own parser/registry through ParserTestBase::setSyntax() instead of mutating $conf and calling the removed ModeRegistry::reset().
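A rough Python sketch of the design shift described above, not DokuWiki's PHP code. The class and function names echo the commit message, but their bodies are invented for illustration: the point is only that each parse builds its own registry from an explicit syntax argument, so two parses in one request cannot interfere.

```python
class ModeRegistry:
    """Short-lived value: carries the syntax flavour of exactly one parse."""

    def __init__(self, syntax):
        self.syntax = syntax

    def get_syntax(self):
        return self.syntax


class Parser:
    def __init__(self, registry):
        # The registry is injected instead of being fetched from a singleton.
        self.registry = registry

    def parse(self, text):
        # A real parser would tokenize; this stub only tags the flavour used.
        return (self.registry.get_syntax(), text)


def get_instructions(text, syntax):
    # Hypothetical stand-in for the single entry point that decides the flavour.
    return Parser(ModeRegistry(syntax)).parse(text)


# An outer Markdown parse and a nested DokuWiki-syntax parse in one "request".
outer = get_instructions("# Markdown page", "md")
inner = get_instructions("====== DW snippet ======", "dw")
```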
|
| 4f32c45b | 26-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
GfmLink: allow soft line break inside link text
The label character class explicitly forbade `\n`, so a CommonMark soft line break inside link text (e.g. `[link with<EOL>more](url)`) fell through to literal text instead of producing a link. Loosen the class to accept a bare `\n` as long as it is not followed by a blank line — soft breaks are spec-allowed inside link text, blank lines are not, and refusing them also keeps `\n#`-anchored block modes (header, hr, ...) from being swallowed by a runaway link match.
The `\n` survives into the label string and renders as a literal line ending in HTML, which browsers display as a single space. This soft break behavior has been checked against https://spec.commonmark.org/dingus/
Note that this behavior differs from GitHub, where the line break is rendered as a hard break (`<br>`).
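The label rule above can be approximated in Python's `re` module. The patterns below are assumptions written for this illustration (the commit does not quote the final character class), showing a label that accepts a single `\n` but refuses one followed by a blank line.

```python
import re

# Assumed shapes, not the actual GfmLink patterns.
OLD_LABEL = r"[^\[\]\n]+"
NEW_LABEL = r"(?:[^\[\]\n]|\n(?![ \t]*\n))+"


def link_re(label):
    """Build a simplified inline-link pattern around the given label class."""
    return re.compile(r"\[(" + label + r")\]\(([^)\n]+)\)")


soft = "[link with\nmore](url)"      # soft line break inside the label
blank = "[link with\n\nmore](url)"   # blank line inside the label

old_soft = link_re(OLD_LABEL).search(soft)    # no match: \n is forbidden
new_soft = link_re(NEW_LABEL).search(soft)    # match: bare \n is accepted
new_blank = link_re(NEW_LABEL).search(blank)  # no match: blank line refused
```

The negative lookahead is what keeps a runaway label from crossing a paragraph boundary.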
|
| 65dd2042 | 26-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
GfmEscape: defer \\<EOL> to DW Linebreak in mixed-syntax modes
Both GfmEscape (sort 5) and DW Linebreak (sort 140) can claim the two backslashes of `\\` followed by space/tab/newline. The lexer's tie-breaker picked GfmEscape, so DW's forced linebreak silently lost its delimiter under dw+md and md+dw. Add a negative lookahead that declines `\\[ \t\n]` whenever DW syntax is loaded — pure md keeps GFM-spec behavior. Mid-line `\\` (UNC paths etc.) still escapes.
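A minimal Python sketch of the negative lookahead, assuming the escape pattern is "backslash plus one ASCII punctuation character". The exact pattern used by GfmEscape is not quoted in the commit, so both regexes here are illustrative only.

```python
import re

# ASCII punctuation ranges: ! to /, : to @, [ to `, { to ~
PUNCT = r"[!-/:-@\[-`{-~]"

# Plain escape: backslash followed by any punctuation character.
PLAIN = re.compile(r"\\" + PUNCT)

# Deferring variant: decline when the escape would be `\\` + space/tab/newline.
DEFERRING = re.compile(r"\\(?!\\[ \t\n])" + PUNCT)

eol = "forced break\\\\ \nnext line"   # text contains: \\ + space + newline
unc = "\\\\host\\share"                # text contains: \\host\share

plain_eol = PLAIN.search(eol)          # claims the two backslashes
deferring_eol = DEFERRING.search(eol)  # declines, leaving them for Linebreak
deferring_unc = DEFERRING.search(unc)  # mid-line \\ is still treated as escape
```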
|
| e7dae73b | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
fix: apply rector and code sniffer fixes |
| d331a839 | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GFM modes: follow CATEGORY_SUBSTITION → CATEGORY_SUBSTITUTION rename
Constant was renamed on master (the typo'd 'substition' value is kept, but the constant name spells it correctly). Update GfmTable's use of the constant, plus stale docblock/comment references in GfmEscape, GfmHtmlEntity, GfmLinebreak, and GfmLinebreakTest.
|
| 15429f02 | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
Externallink: GFM autolink extension - parens and entity-ref tail
In Markdown-preferred mode, allow `(` and `)` inside URL char classes and consume an optional trailing entity reference via the shared HtmlEntity::PATTERN. The Markdown-only post-processing peels off mismatched closing parens and decodes the trailing entity reference, emitting the peeled chars as cdata after the link. Refactors handle() to dispatch to handleAngleAutolink() and handleBareUrl(), with the new trim logic in peelGfmTail() and the protocol-prefix step in addProtocolPrefix(). DW-only mode behavior is unchanged.
Brings GFM spec examples #624, #625, #626 to passing.
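The paren-peeling step can be sketched as follows, based on the GFM autolink extension rule that unmatched trailing `)` characters are not part of the link. The function name is hypothetical and the entity-reference handling is left out; this is not the body of peelGfmTail().

```python
def peel_unbalanced_parens(url):
    """Split off trailing ')' characters that have no matching '('.

    Returns (link, tail), where tail would be emitted as plain text
    after the link.
    """
    tail = ""
    while url.endswith(")") and url.count(")") > url.count("("):
        url = url[:-1]
        tail = ")" + tail
    return url, tail


balanced = peel_unbalanced_parens("www.google.com/search?q=Markup+(business)")
extra = peel_unbalanced_parens("www.google.com/search?q=Markup+(business)))")
inner = peel_unbalanced_parens("www.google.com/search?q=(business))+ok")
```

Only a URL that ends in `)` is trimmed, so an unbalanced paren in the middle of the URL is left alone.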
|
| 73dc0a89 | 06-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
fix(mail): keep '&' intact in mailto links with multiple query params
Move the email-handling helpers (obfuscate, mail_isvalid, mail_quotedprintable_encode, mail_setup) out of the procedural inc/mail.php into a namespaced dokuwiki\MailUtils class plus a new Mailer::configInit(), and add a separate MailUtils::obfuscateUrl() for the mailto-href context.
The xhtml renderer and PluginTrait now build the link label and the href separately: the address half is run through the mailguard obfuscation, the query string is preserved verbatim with only HTML escaping applied. This fixes #1690 — in 'visible' mode the previous code rawurlencoded the entire address+query, turning '?' into '%3F' and breaking multi-parameter mailto links; in all modes the query string is no longer mangled by the [at]/[dot] substitution.
Core call sites (Mailer, auth, LegacyApiCore, common, the xhtml renderer, the parser, the bundled config/styling/usermanager plugins) are migrated to MailUtils directly. The old top-level functions and PREG_PATTERN_VALID_EMAIL constant remain as deprecated shims with rector mappings.
Tests for obfuscate / mail_isvalid / mail_quotedprintable_encode are consolidated into a single _test/tests/MailUtilsTest.php and extended with regression coverage for the multi-parameter, double-escape and URL-shape cases.
Closes #1690
Replaces #1964
|
| 56c730b5 | 06-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
keep historic typo in value but not in constant
We need to keep the historic typo in the value ("substition"), but there is no reason to keep it in the constant. |
| 0f694376 | 05-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
GfmLink: accept escaped brackets inside link labels
The label slot used `[^\[\]\n]+`, which rejected `\[` / `\]` and left labels with escaped brackets unmatched. Promote it to `(?:\\.|[^\[\]\n])+` — the same backslash-escape trick the URL slot already uses — so spec example 523 (`[link \[bar](/uri)`) matches and unescapes cleanly. The image-as-label sub-pattern gets the same upgrade.
handle() needs no change: the new class still rejects bare `]`, so the first literal `](` in the match is still the separator; Escape::unescapeBackslashes() was already collapsing `\[` to `[` before the label reached the link handler.
Adds two GfmLinkTest cases for the `\[` / `\]` forms.
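The two label classes quoted above can be compared directly in Python's `re`. The surrounding link pattern and the `unescape_brackets` helper are simplified assumptions for this sketch; the helper only stands in for the bracket case of Escape::unescapeBackslashes().

```python
import re

# Label classes as quoted in the commit message, wrapped in a simplified link.
OLD = re.compile(r"\[([^\[\]\n]+)\]\(([^)\n]+)\)")
NEW = re.compile(r"\[((?:\\.|[^\[\]\n])+)\]\(([^)\n]+)\)")


def unescape_brackets(label):
    """Hypothetical helper: collapse \\[ and \\] to the bare bracket."""
    return re.sub(r"\\([\[\]])", r"\1", label)


text = r"[link \[bar](/uri)"

old_match = OLD.match(text)   # the label stops at the escaped bracket
new_match = NEW.match(text)   # `\\.` consumes the backslash-bracket pair
label = unescape_brackets(new_match.group(1))
```

Because the new class still rejects a bare `]`, the first literal `](` remains the separator between label and URL.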
|
| dccbd514 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmQuote: accept ^> line starts so quotes can follow tables and lists
GfmTable, DW Table, and DW Listblock all consume the boundary \n on their way out. A pure-lookahead exit pattern at that boundary would trip the lexer's no-advance safety check, because tables and lists exit right after consuming a marker token and have no leading unmatched content for the lookahead to attach to (unlike Preformatted, whose body leaves code lines as UNMATCHED right before the boundary).
Fix this on the consumer side: change the first-line anchor from \n> to (?:^|\n)>. With the lexer's m flag, ^ matches at offset 0 and at any position immediately following a \n in the subject, including the position right after a \n that a preceding mode just consumed. Subsequent quote lines keep the \n> anchor.
Adds three handoff tests in GfmQuoteTest covering GfmTable, DW Table, and DW Listblock. Resolves GFM spec example 201.
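The anchor change can be reproduced with Python's `re`, using `re.MULTILINE` as a stand-in for the lexer's m flag. The sample text and the offset bookkeeping are assumptions made for the illustration, not the lexer's actual mechanics.

```python
import re

OLD_ENTRY = re.compile(r"\n>")
NEW_ENTRY = re.compile(r"(?:^|\n)>", re.MULTILINE)

text = "| a |\n> quote"

# Pretend the preceding table mode already consumed the boundary newline,
# so matching resumes at the character right after it.
pos = text.index("\n") + 1

old = OLD_ENTRY.match(text, pos)   # fails: no \n left in front of ">"
new = NEW_ENTRY.match(text, pos)   # succeeds: ^ matches right after a \n
```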
|
| f9d3b7bd | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
Externallink: add per-scheme angle-bracket autolinks for MD syntax
Adds CommonMark §6.5 <URL> autolinks to Externallink, gated to md/md+dw/dw+md syntax via ModeRegistry::isMdPreferred(). Per-scheme patterns share the existing conf/scheme.conf allow-list so unknown schemes fall through to literal cdata instead of being silently dropped by the renderer. Internal whitespace inside the brackets disqualifies the autolink and the whole envelope is emitted as cdata to keep the bare-URL detector off the URL.
LinksTest gains 5 cases covering success, internal-whitespace and leading-whitespace disqualification, unregistered scheme fallthrough, and the dw-only no-op path. SpecCompatRenderer URL encoder is updated to match cmark-gfm's HREF_SAFE table (square brackets and a few other characters move from safe to encoded). skip.php loses the obsolete #356 entry and gains #605/#606/#607/#609 explaining the unregistered-scheme cases that the per-scheme regex naturally rejects.
|
| f57da51c | 05-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
Preformatted: leave boundary \n in stream when next line has content
Adds a zero-width lookahead exit (?=\n[^ \t\n]) ahead of the existing consuming \n exit. When an indented code block is followed by a non-blank line, the boundary newline now stays available for downstream block-level matchers (GfmHr, GfmHeader, etc.) instead of being eaten on the way out of preformatted mode.
Concretely fixes a thematic-break-after-indented-code case (GFM spec case 85's trailing ----): without this change, GfmHr's \n anchor failed because preformatted had already consumed the newline, and the bare ---- fell through to Entity which converted --- to an em-dash.
The consuming branch is kept as a fall-through for the blank-line and end-of-input cases, where a pure lookahead would trip the lexer's no-advance safety check.
Six PreformattedTest expectations updated: trailing cdata after a preformatted block now carries the leading \n (rendered output is unchanged — paragraph whitespace is trimmed).
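A small Python illustration of why the zero-width exit matters. The lookahead is the one quoted above; the `HR_ENTRY` pattern and the sample text are simplified assumptions, not the real GfmHr pattern.

```python
import re

CONSUMING_EXIT = re.compile(r"\n")
LOOKAHEAD_EXIT = re.compile(r"(?=\n[^ \t\n])")
HR_ENTRY = re.compile(r"\n-{3,}(?=\n|$)")   # hypothetical \n-anchored rule

text = "    code\n----\n"
body_end = text.index("\n")   # offset where the preformatted body ends

# Old behaviour: the exit consumes the newline, so the rule loses its anchor.
resume_old = CONSUMING_EXIT.match(text, body_end).end()
hr_old = HR_ENTRY.match(text, resume_old)

# New behaviour: the exit matches without advancing, the newline stays put.
resume_new = LOOKAHEAD_EXIT.match(text, body_end).end()
hr_new = HR_ENTRY.match(text, resume_new)
```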
|
| eb15e634 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
extract Helpers\HtmlEntity, wire into GfmCode and GfmLink URL slot
Numeric and named HTML entity decoding moves out of GfmHtmlEntity into a pure helper, so capture-by-regex modes can apply the same decode post-extraction (the inline lexer never reaches their bodies). Mirrors the Helpers\Escape pattern.
Wired up in two slots:
- GfmCode info string: `f&ouml;&ouml;` now decodes to föö in the language class. Clears spec example #330.
- GfmLink URL: GfmLink::extractUrl() decodes entities. URL pattern extends from `[^)\n]+` to `(?:\\.|[^)\n])+` so an escaped \) no longer terminates the URL early; the existing post-classify Escape::unescapeBackslashes call strips the backslashes after Link::classify has done its work. Clears #504, #506, #508.
Skip #328 with a self-contained title-slot reason: the URL side now decodes correctly, but the title attribute is still discarded (DokuWiki link instructions have no title slot).
|
| d2085866 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
extend GfmNumericEntity to HTML5 named entities, rename to GfmHtmlEntity
Numeric refs are still decoded explicitly: PHP's html_entity_decode returns the input unchanged for U+0000, surrogates, U+10FFFF, and BMP noncharacters where CommonMark requires U+FFFD or the literal codepoint. Named refs delegate to html_entity_decode with ENT_HTML5, which carries the full HTML5 named-entity table (including multi-codepoint decodes like `&ngE;` -> U+2267 + U+0338).
Unknown names stay literal: the original `&xxx;` passes through as cdata and the renderer's &-escaping turns it into `&amp;xxx;`.
|
| 150dc5f2 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmNumericEntity for CommonMark numeric character references
Decodes &#nnn; (decimal, 1-7 digits) and &#xhhh; / &#Xhhh; (hex, 1-6 digits) to the corresponding Unicode codepoint, emitted as plain cdata. Codepoint 0, codepoints above U+10FFFF, and the surrogate range U+D800..U+DFFF map to U+FFFD per the spec.
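The decoding rule in the paragraph above can be sketched in Python as follows. The regex and function names are assumptions made for this illustration; the actual mode is written in PHP and may differ in detail.

```python
import re

# Decimal form: 1-7 digits. Hex form: x or X followed by 1-6 hex digits.
NUMERIC_REF = re.compile(r"&#(?:([0-9]{1,7})|[xX]([0-9a-fA-F]{1,6}));")


def _decode_match(match):
    if match.group(1) is not None:
        codepoint = int(match.group(1))
    else:
        codepoint = int(match.group(2), 16)
    # Codepoint 0, anything above U+10FFFF, and surrogates map to U+FFFD.
    if codepoint == 0 or codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        return "\uFFFD"
    return chr(codepoint)


def decode_numeric_refs(text):
    """Replace numeric character references with the characters they name."""
    return NUMERIC_REF.sub(_decode_match, text)
```

References that exceed the digit limits do not match the pattern at all and are left as literal text.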
Distinct from the typography Entity mode, which is renderer-side configurable via entities.conf. Numeric refs are not configurable so decoding happens at parse time and the renderer needs no changes.
Lexer leftmost-match consumes the run before any structural pattern, so `&#42;foo&#42;` renders as literal *foo* and `&#42; foo` does not start a list - matching the spec rule that numeric refs cannot stand in for structural markers.
|
| 13a62f81 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
rename syntax flavors 'dokuwiki' / 'markdown' to 'dw' / 'md'
Symmetry with the existing 'dw+md' / 'md+dw' setting values. |
| c4bcbc2e | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmLinebreak for GFM hard line breaks
Two-or-more trailing spaces, or a single backslash, immediately before a non-final newline render as a `<br/>`. Both delimiter forms share a single SUBSTITION mode at sort 140, loaded under any MD-active syntax (markdown, dw+md, md+dw); pure dokuwiki is unaffected.
Reuses the existing `linebreak` handler call and renderer; no new instructions or renderer changes. SpecCompatRenderer overrides linebreak() to emit the spec's `<br />` shape. Examples 662, 663 (line break inside a raw HTML tag) are skipped — raw HTML is not passed through by default.
|
| 3e6baeff | 30-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
replace DW Hr with unified GfmHr
Single mode covers both DokuWiki (4+ dashes) and GFM (3+ of -/*/_) horizontal rules; pattern self-narrows on $conf['syntax']. Always loaded across all four syntax settings, mirroring the GfmQuote replacement pattern. Same `hr` handler call so renderers and the call API are unchanged.
Drops DW's old [ \t]* leading-whitespace tolerance — inert in practice past 0-1 spaces (Preformatted at sort 20 intercepts everything ≥ 2 spaces or any tab).
Spec examples 13, 20, 26-28, 224 turn green; 17, 21-24, 29, 30, 31 go to skip.php as deliberate non-implementations (whitespace tolerance and list-precedence cases).
|
| 309a0852 | 30-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
replace DW Quote with unified GfmQuote
GfmQuote covers blockquote parsing for both DokuWiki and GFM dialects in a single mode. Same quote_open/quote_close handler instructions; a DW-preferred post-pass flattens sub-parsed paragraph wrapping into linebreak calls so existing pages keep their <br/>-between-lines rendering. MD-preferred keeps the <p>-wrapped spec shape.
Block content (lists, fenced code, tables) inside `>` quotes now renders, since the body is sub-parsed. Headers stay excluded (BASEONLY) — TOC and section-edit anchors don't compose with <blockquote>, same rationale as GfmListblock.
Convert ModeRegistry's sub-parser cache into an acquire/release pool to support same-key re-entrancy: a list inside a quote re-enters gfm_quote during the list-item sub-parse, and the inner call needs its own parser instance even though the exclusion key matches. GfmListblock is updated to use the new acquire/release primitives.
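One way to picture the cache-to-pool change is the Python sketch below. The class, its method names and the key format are hypothetical; the commit only states that same-key re-entrant calls must receive distinct parser instances.

```python
class SubParserPool:
    """Hands out one parser per acquire(), reusing released ones by key."""

    def __init__(self, factory):
        self._factory = factory   # callable building a parser for a key
        self._idle = {}           # key -> list of released parsers
        self.created = 0

    def acquire(self, key):
        idle = self._idle.setdefault(key, [])
        if idle:
            return idle.pop()
        self.created += 1
        return self._factory(key)

    def release(self, key, parser):
        self._idle.setdefault(key, []).append(parser)


pool = SubParserPool(lambda key: object())

# A list inside a quote: the inner call re-enters with the same key
# while the outer parser is still in use.
outer = pool.acquire("exclude:baseonly")
inner = pool.acquire("exclude:baseonly")
distinct = outer is not inner

pool.release("exclude:baseonly", inner)
pool.release("exclude:baseonly", outer)
reused = pool.acquire("exclude:baseonly")   # comes from the idle list
```

A plain key-to-instance cache would have handed the same parser to both calls, which is the re-entrancy problem the commit describes.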
|
| 74031e46 | 28-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmEscape for GFM backslash escapes
Implements GFM §6.1 backslash-escape handling. GfmEscape is a sort-5 inline mode in CATEGORY_SUBSTITION that claims `\X` for any escapable ASCII punctuation char before competing delimiters can match. The shared character class lives on Helpers\Escape so the lexer pattern and the post-hoc unescape stay in lockstep.
Whole-span captures (GfmCode info string, GfmLink label/URL) bypass the lexer; those modes call Escape::unescapeBackslashes() on the relevant slot. GfmLink skips the unescape when the URL classifies as a windowssharelink so the leading \\host survives intact.
GfmTable cells get a separate per-cell `\|` to `|` pass in the rewriter to honour the tables-extension rule that pipes always unescape, even inside code spans where standard §6.1 escapes don't fire.
|
| 3dabe4e0 | 28-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmTable for GFM tables
Implements the GFM pipe-table extension as a CONTAINER mode at sort 55, one below DW Table at 60. A lookahead-validated entry pattern asserts a header line plus a `:?-+:?` delimiter row before consuming any input, so non-table paragraphs containing pipes flow through unchanged. Cells are inline-only per spec.
Handler\GfmTable rewrites the flat token stream into the canonical table_open / tablethead_* / tabletbody_* / table_close sequence, deriving per-column alignment from the delimiter row, padding short body rows (spec 202), truncating long ones (spec 204), and falling back to a single cdata when the column count mismatches (spec 203).
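Two of the rewriter's steps, alignment derivation and row normalization, can be sketched in Python. The function names and return values are assumptions for this illustration and do not mirror Handler\GfmTable.

```python
import re

DELIM_CELL = re.compile(r":?-+:?")


def alignment(cell):
    """Derive a column alignment from one delimiter-row cell."""
    cell = cell.strip()
    if not DELIM_CELL.fullmatch(cell):
        raise ValueError(f"not a delimiter cell: {cell!r}")
    left, right = cell.startswith(":"), cell.endswith(":")
    if left and right:
        return "center"
    if left:
        return "left"
    if right:
        return "right"
    return None   # no explicit alignment


def fit_row(cells, width):
    """Pad a short body row with empty cells, truncate a long one."""
    return (cells + [""] * width)[:width]


aligns = [alignment(c) for c in [":--", ":-:", "--:", "---"]]
short = fit_row(["a"], 3)
long = fit_row(["a", "b", "c", "d"], 3)
```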
`tabletbody_open` / `tabletbody_close` are emitted for the first time; they are part of the base renderer API but DW Table never used them. Added to Block's blockOpen / blockClose lists alongside `tabletfoot_*` for symmetry. SpecCompatRenderer gains minimal table-element overrides so spec roundtrip output matches GFM's `<table><thead><tr><th>` shape without DW's wrapper div, row/col counter classes, or align-as-class.
|
| 685560eb | 28-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmListblock for GFM lists
GfmListblock captures an entire list block atomically with one addSpecialPattern match, then walks the captured text in handle() grouping lines into items. Each item's body is dedented to its content column and parsed by ModeRegistry::getSubParser() so block content (paragraphs, fenced code, blockquotes, plugin blocks) works inside items uniformly. Sub-parsed calls are wrapped in a Nest call before they reach the outer handler, matching the Footnote pattern: the main handler's Block rewriter treats nest as opaque and the renderer base class unwraps it transparently, so multi-paragraph items don't get double-wrapped in <p>.
Marker syntax: -, *, + (unordered) or 1-9 digits followed by . or ) (ordered). Indentation is a 2-space-multiple step starting at 0; depth = (indent / 2) + 1, odd indents round down, tabs become two spaces. The first ordered item's number drives the start attribute on <ol> via the listo_open $start parameter.
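The marker and depth rules in the paragraph above can be worked through in a short Python sketch. The regex, the required whitespace after the marker, and the function names are assumptions for illustration only.

```python
import re

# Leading indent, then -, *, + or 1-9 digits followed by . or )
ITEM = re.compile(r"^([ \t]*)(?:([-*+])|([0-9]{1,9})([.)]))[ \t]+")


def list_depth(leading):
    """depth = (indent / 2) + 1, odd indents round down, tab = two spaces."""
    indent = len(leading.replace("\t", "  "))
    return indent // 2 + 1


def parse_item(line):
    """Return (depth, ordered, start) for a list line, or None otherwise."""
    match = ITEM.match(line)
    if match is None:
        return None
    depth = list_depth(match.group(1))
    if match.group(2) is not None:
        return (depth, False, None)
    return (depth, True, int(match.group(3)))
```

For example, `parse_item("  3) foo")` yields depth 2 with start number 3, which is the value that would feed the start attribute on `<ol>`.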
GfmLists subclasses AbstractListsRewriter with the GFM marker parser; the state machine on the base class is shared with DW Lists.
GfmListblock loads only when $conf['syntax'] is markdown or md+dw. Under those settings the DW Listblock is suppressed because the two list models conflict — DW's mandatory 2-space indent rule vs GFM's zero-indent top-level rule, and -/*/+ markers shared. Plugins that relied on Listblock loading under md+dw will see it absent there.
Sub-parser exclusion set: CATEGORY_BASEONLY (no Header inside list items) and gfm_listblock itself (defensive guard against re-entry on pathological inputs; nested lists are handled by the outer pattern, not by re-entry).
Tests cover marker variants, ordered start numbers, nested lists at two and three levels, inline formatting inside items, marker-character switches keeping one list, type switches splitting the list, fenced code inside items, multi-paragraph (loose) items, and two regressions on blank-line tolerance inside the captured block. SpecCompatRenderer learns to render the list call sequence, and spec.txt tests for digit/marker-width/lazy-continuation behavior that GfmListblock deliberately doesn't implement are documented in gfm-spec/skip.php with the per-bucket reasons (A-F).
Drops two now-obsolete entries from skip.php (image escapes that land via earlier GfmLink/GfmMedia work) and inlines the Setext explanation that previously pointed at SPEC.md. Replaces the SPEC.md reference in GfmEmphasisTest with the inline reason.
|
| 96d096f1 | 27-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
remove getLineStartMarkers registry — sort order already wins
Preformatted's entry pattern carried a `(?![\*\-])` negative lookahead to defer to list modes on indented bullet lines. 0cecf9d50 (2005, "new parser added") introduced it hardcoded; 7958e6980 (2026, "decouple hardcoded mode names in Eol and Preformatted") refactored that hardcoded knowledge into register/getLineStartMarkers on ModeRegistry so each list mode owned its marker chars. Both preserved the behavior verbatim; neither documented why it was needed.
Tracing the lexer, it isn't. ParallelRegex merges all entry patterns into one PCRE expression; PCRE returns the leftmost match and breaks ties on expression order. Modes are added in sort order via ModeRegistry::getModes(), so Listblock (sort 10) always precedes Preformatted (sort 20) and wins the tie on " - foo" without any lookahead. The only test that caught a difference was testPreformattedList, which happened to register modes in non-canonical order - that was a test bug.
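The tie-break can be demonstrated with an ordered alternation in Python's `re`, which, like PCRE, takes the leftmost match and prefers the earlier alternative when several match at the same position. The two entry patterns below are simplified stand-ins, not the real Listblock and Preformatted patterns.

```python
import re


def first_mode(modes, text):
    """Merge (name, pattern) pairs in the given order; return the winner."""
    combined = re.compile(
        "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in modes)
    )
    match = combined.search(text)
    return match.lastgroup if match else None


LISTBLOCK = ("listblock", r"\n {2,}[-*]")
PREFORMATTED = ("preformatted", r"\n  ")

text = "\n  - foo"   # both patterns match at offset 0

canonical = first_mode([LISTBLOCK, PREFORMATTED], text)
reversed_order = first_mode([PREFORMATTED, LISTBLOCK], text)
```

With the modes merged in sort order the list pattern wins without any lookahead, while the reversed order reproduces the situation the miswritten test had created.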
This patch drops the lookahead in Preformatted::connectTo, the registerLineStartMarkers call in Listblock::preConnect, the register/getLineStartMarkers methods on ModeRegistry, and the three registry-API unit tests. testPreformattedList now registers Listblock before Preformatted.
|
| 1e28e406 | 23-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
split Parsing\Helpers into per-domain Link / Media / Code classes |