| 309a0852 | 30-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
replace DW Quote with unified GfmQuote
GfmQuote covers blockquote parsing for both DokuWiki and GFM dialects in a single mode. Same quote_open/quote_close handler instructions; a DW-preferred post-pass flattens sub-parsed paragraph wrapping into linebreak calls so existing pages keep their <br/>-between-lines rendering. MD-preferred keeps the <p>-wrapped spec shape.
Block content (lists, fenced code, tables) inside `>` quotes now renders, since the body is sub-parsed. Headers stay excluded (BASEONLY) — TOC and section-edit anchors don't compose with <blockquote>, same rationale as GfmListblock.
Convert ModeRegistry's sub-parser cache into an acquire/release pool to support same-key re-entrancy: a list inside a quote re-enters gfm_quote during the list-item sub-parse, and the inner call needs its own parser instance even though the exclusion key matches. GfmListblock is updated to use the new acquire/release primitives.
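The acquire/release idea can be sketched in a few lines. This is a minimal Python illustration, not the PHP implementation: the class name `SubParserPool`, the `make_parser` factory, and the key contents are all hypothetical. It only shows why a pool, unlike a one-instance-per-key cache, lets two nested calls with the same exclusion key hold distinct instances.

```python
from collections import defaultdict


class SubParserPool:
    """Hypothetical pool of reusable parser objects, keyed by exclusion set."""

    def __init__(self, factory):
        self._factory = factory          # builds a fresh parser for a key
        self._idle = defaultdict(list)   # key -> instances not currently in use

    def acquire(self, key):
        # Reuse an idle instance if one exists, otherwise build a new one.
        idle = self._idle[key]
        return idle.pop() if idle else self._factory(key)

    def release(self, key, parser):
        # Hand the instance back so a later acquire() can reuse it.
        self._idle[key].append(parser)


built = []


def make_parser(key):
    # Stand-in for constructing a configured Parser (assumption).
    parser = {"key": key, "id": len(built)}
    built.append(parser)
    return parser


pool = SubParserPool(make_parser)
key = frozenset({"baseonly", "gfm_quote"})

outer = pool.acquire(key)   # quote body is being sub-parsed
inner = pool.acquire(key)   # same key, re-entered from a nested list item
pool.release(key, inner)
pool.release(key, outer)
again = pool.acquire(key)   # a released instance is reused, nothing new is built
```

With a plain cache, `outer` and `inner` would be the same object and the inner parse would trample the outer one's state.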
|
| f7c6e4ac | 30-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add listo_open_start sibling method for GFM start numbers
Reverts the listo_open signature widening from 5a2118acc and instead adds a sibling method `listo_open_start($start = 1)` on the renderer hierarchy. The base default delegates to listo_open() so renderers that don't override it still produce a valid (but unnumbered) list; xhtml's override emits <ol start="N">.
The handler now emits 'listo_open_start' only for ordered lists with a non-default first number; plain ordered lists keep emitting the unchanged 'listo_open' instruction. This preserves the historical listo_open / listu_open signatures (zero-arg base, $classes-only xhtml form from 2016) so the 17 plugin renderers found via codesearch keep working without modification, while still implementing GFM's "5. foo" -> <ol start="5"> rule.
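The delegation pattern can be illustrated as follows. This is a simplified Python sketch, not the PHP renderer hierarchy: the class names are hypothetical, the real xhtml `listo_open` takes a `$classes` argument that is left out here, and `render_open` stands in for the handler's choice of instruction.

```python
class Renderer:
    """Base renderer: the sibling method falls back to the historical one."""

    def __init__(self):
        self.doc = ""

    def listo_open(self):
        pass  # the base class emits nothing

    def listo_open_start(self, start=1):
        # Default: ignore the number and delegate, so renderers that only
        # know listo_open() still open a valid list.
        self.listo_open()


class LegacyPluginRenderer(Renderer):
    """A plugin renderer written before listo_open_start existed."""

    def listo_open(self):
        self.doc += "<ol>"


class XhtmlLikeRenderer(Renderer):
    def listo_open(self):
        self.doc += "<ol>"

    def listo_open_start(self, start=1):
        self.doc += f'<ol start="{start}">'


def render_open(renderer, start):
    # Handler side: only a non-default first number uses the new instruction.
    if start == 1:
        renderer.listo_open()
    else:
        renderer.listo_open_start(start)
    return renderer.doc
```

The legacy renderer never sees a changed signature; it simply loses the start number.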
|
| 74031e46 | 28-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmEscape for GFM backslash escapes
Implements GFM §6.1 backslash-escape handling. GfmEscape is a sort-5 inline mode in CATEGORY_SUBSTITION that claims `\X` for any escapable ASCII punctuation char before competing delimiters can match. The shared character class lives on Helpers\Escape so the lexer pattern and the post-hoc unescape stay in lockstep.
Whole-span captures (GfmCode info string, GfmLink label/URL) bypass the lexer; those modes call Escape::unescapeBackslashes() on the relevant slot. GfmLink skips the unescape when the URL classifies as a windowssharelink so the leading \\host survives intact.
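A post-hoc unescape of this kind might look like the sketch below. It is written in Python for illustration and assumes the escapable set is the full ASCII punctuation range from the GFM spec; the actual character class on Helpers\Escape is not shown in this log, and `unescape_backslashes` is a hypothetical stand-in for `Escape::unescapeBackslashes()`.

```python
import re

# ASCII punctuation as four ranges: ! to /, : to @, [ to `, { to ~
# (assumption: this mirrors the shared character class).
ESCAPABLE = r"!-/:-@\[-`{-~"
ESCAPED_CHAR = re.compile(r"\\([" + ESCAPABLE + r"])")


def unescape_backslashes(text):
    # Drop the backslash in front of any escapable punctuation character;
    # a backslash before anything else stays literal.
    return ESCAPED_CHAR.sub(r"\1", text)
```

Building the lexer pattern and this replacement from the same `ESCAPABLE` string is what keeps the two in lockstep.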
GfmTable cells get a separate per-cell `\|` to `|` pass in the rewriter to honour the tables-extension rule that pipes always unescape, even inside code spans where standard §6.1 escapes don't fire.
|
| 3dabe4e0 | 28-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmTable for GFM tables
Implements the GFM pipe-table extension as a CONTAINER mode at sort 55, one below DW Table at 60. A lookahead-validated entry pattern asserts a header line plus a `:?-+:?` delimiter row before consuming any input, so non-table paragraphs containing pipes flow through unchanged. Cells are inline-only per spec.
Handler\GfmTable rewrites the flat token stream into the canonical table_open / tablethead_* / tabletbody_* / table_close sequence, deriving per-column alignment from the delimiter row, padding short body rows (spec 202), truncating long ones (spec 204), and falling back to a single cdata when the column count mismatches (spec 203).
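The alignment and row-fitting rules can be sketched as below. This is an illustrative Python fragment, not the Handler\GfmTable code: `split_row`, `alignments`, and `fit_row` are hypothetical helpers, and the naive pipe splitting ignores escaped pipes and empty edge cells.

```python
import re

DELIM_CELL = re.compile(r"^(:?)-+(:?)$")


def split_row(line):
    # Strip the outer pipes, then split on the inner ones (simplified).
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def alignments(delimiter_row):
    # A colon on the left, right, or both sides selects the alignment.
    result = []
    for cell in split_row(delimiter_row):
        left, right = DELIM_CELL.match(cell).groups()
        if left and right:
            result.append("center")
        elif left:
            result.append("left")
        elif right:
            result.append("right")
        else:
            result.append(None)
    return result


def fit_row(cells, width):
    # Pad short body rows with empty cells, truncate long ones.
    return (cells + [""] * width)[:width]
```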
`tabletbody_open` / `tabletbody_close` are emitted for the first time; they are part of the base renderer API but DW Table never used them. Added to Block's blockOpen / blockClose lists alongside `tabletfoot_*` for symmetry. SpecCompatRenderer gains minimal table-element overrides so spec roundtrip output matches GFM's `<table><thead><tr><th>` shape without DW's wrapper div, row/col counter classes, or align-as-class.
|
| 685560eb | 28-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmListblock for GFM lists
GfmListblock captures an entire list block atomically with one addSpecialPattern match, then walks the captured text in handle() grouping lines into items. Each item's body is dedented to its content column and parsed by ModeRegistry::getSubParser() so block content (paragraphs, fenced code, blockquotes, plugin blocks) works inside items uniformly. Sub-parsed calls are wrapped in a Nest call before they reach the outer handler, matching the Footnote pattern: the main handler's Block rewriter treats nest as opaque and the renderer base class unwraps it transparently, so multi-paragraph items don't get double-wrapped in <p>.
Marker syntax: -, *, + (unordered) or 1-9 digits followed by . or ) (ordered). Indentation is a 2-space-multiple step starting at 0; depth = (indent / 2) + 1, odd indents round down, tabs become two spaces. The first ordered item's number drives the start attribute on <ol> via the listo_open $start parameter.
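The marker and depth rules above can be sketched like this. It is a Python illustration with a hypothetical `interpret_marker` helper; requiring whitespace or end of line after the marker is an assumption, and the returned shape merely mirrors the depth/type/start keys used by the list rewriters.

```python
import re

# Indent, then either a bullet or 1-9 digits followed by "." or ")".
MARKER = re.compile(r"^([ \t]*)(?:([-*+])|([0-9]{1,9})([.)]))(?=[ \t]|$)")


def interpret_marker(line):
    m = MARKER.match(line)
    if m is None:
        return None
    indent = len(m.group(1).replace("\t", "  "))  # a tab counts as two spaces
    depth = indent // 2 + 1                       # odd indents round down
    if m.group(2):
        return {"depth": depth, "type": "u"}
    return {"depth": depth, "type": "o", "start": int(m.group(3))}
```

For example, three leading spaces give `3 // 2 + 1 = 2`, the same depth as two spaces.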
GfmLists subclasses AbstractListsRewriter with the GFM marker parser; the state machine on the base class is shared with DW Lists.
GfmListblock loads only when $conf['syntax'] is markdown or md+dw. Under those settings the DW Listblock is suppressed because the two list models conflict — DW's mandatory 2-space indent rule vs GFM's zero-indent top-level rule, and -/*/+ markers shared. Plugins that relied on Listblock loading under md+dw will see it absent there.
Sub-parser exclusion set: CATEGORY_BASEONLY (no Header inside list items) and gfm_listblock itself (defensive guard against re-entry on pathological inputs; nested lists are handled by the outer pattern, not by re-entry).
Tests cover marker variants, ordered start numbers, nested lists at two and three levels, inline formatting inside items, marker-character switches keeping one list, type switches splitting the list, fenced code inside items, multi-paragraph (loose) items, and two regressions on blank-line tolerance inside the captured block. SpecCompatRenderer learns to render the list call sequence, and spec.txt tests for digit/marker-width/lazy-continuation behavior that GfmListblock deliberately doesn't implement are documented in gfm-spec/skip.php with the per-bucket reasons (A-F).
Drops two now-obsolete entries from skip.php (image escapes that land via earlier GfmLink/GfmMedia work) and inlines the Setext explanation that previously pointed at SPEC.md. Replaces the SPEC.md reference in GfmEmphasisTest with the inline reason.
|
| 9172eccf | 28-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add sub-parser support to Handler / Parser / ModeRegistry
A block mode that wants to parse the body of one of its captured matches needs a second Parser instance configured with the active modes minus whatever would re-enter the outer mode. Doing this by hand is verbose and easy to get wrong — modes hold a $Lexer slot that addMode() overwrites, so the same mode object can't be shared between the main parser and a sub-parser.
Three small additions:
Handler::reset() — clears calls, status, currentModeName, and installs a fresh CallWriter. Lets one Handler instance be parsed against repeatedly without state bleed.
Parser::getHandler() — accessor; sub-parser callers need it to reach the handler for reset() and for harvesting the produced call list.
ModeRegistry::getSubParser($excludeCategories, $excludeModes) — returns a cached Parser preconfigured with every active mode except those excluded. Mode objects are cloned before being attached so connectTo()'s assignment to $Lexer does not clobber the main parser's references. Cache key is the exclusion-set; default exclusion is CATEGORY_BASEONLY (no Header inside the sub-parsed content).
Tests cover Handler::reset's full clear, sub-parser caching, default and custom exclusions, registry-reset propagation, and the clone-not-share invariant for $Lexer.
|
| bf6e4f0d | 28-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
extract AbstractListsRewriter from Lists
The list-block CallWriter rewriter mixed two concerns: a shared state machine that turns flat list_open / list_item / list_close calls into the nested listu_open / listo_open / listitem / listcontent shape the renderers expect, and a syntax-specific marker parser that maps the captured indent + marker text to depth/type.
Hoist the state machine onto a new abstract base class AbstractListsRewriter; Lists keeps only its DokuWiki marker parser (`*` unordered, `-` ordered, 2-space-per-level indent). The upcoming GfmLists will share the same base class with its own GFM marker parser.
The interpretSyntax contract changes shape:
    protected function interpretSyntax($match, &$type): int
to
    abstract protected function interpretSyntax(string $match): array; // returns ['depth' => int, 'type' => 'u'|'o', 'start'? => int]
The optional `start` key carries the first ordered item's number for syntaxes that support it (GFM); DokuWiki omits it and gets the default of 1. Plugins subclassing Lists that override interpretSyntax need to update — known affected: creole, markdowku, mediasyntax (per codesearch.dokuwiki.org). The migration is mechanical: replace the by-ref $type assignment and int return with an associative-array return.
The protected listStart / listOpen / listEnd dispatch methods on the old Lists are gone (renamed handleListOpen / handleListItem / handleListClose on the base class), but no plugin in the ecosystem overrides those, only interpretSyntax.
|
| 96d096f1 | 27-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
remove getLineStartMarkers registry — sort order already wins
Preformatted's entry pattern carried a `(?![\*\-])` negative lookahead to defer to list modes on indented bullet lines. 0cecf9d50 (2005, "new parser added") introduced it hardcoded; 7958e6980 (2026, "decouple hardcoded mode names in Eol and Preformatted") refactored that hardcoded knowledge into register/getLineStartMarkers on ModeRegistry so each list mode owned its marker chars. Both preserved the behavior verbatim; neither documented why it was needed.
Tracing the lexer, it isn't. ParallelRegex merges all entry patterns into one PCRE expression; PCRE returns the leftmost match and breaks ties on expression order. Modes are added in sort order via ModeRegistry::getModes(), so Listblock (sort 10) always precedes Preformatted (sort 20) and wins the tie on " - foo" without any lookahead. The only test that caught a difference was testPreformattedList, which happened to register modes in non-canonical order - that was a test bug.
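The tie-break argument can be reproduced with any backtracking regex engine. The Python sketch below merges two simplified entry patterns into one alternation; the pattern strings and the `winner` helper are assumptions for illustration and are not the real Listblock or Preformatted patterns.

```python
import re

LISTBLOCK = r"\n {2,}[-*]"   # simplified: indented bullet line
PREFORMATTED = r"\n  "       # simplified: two-space indent, no lookahead


def winner(patterns, subject):
    # Merge all patterns into one alternation, one capture group each.
    combined = re.compile("|".join(f"({p})" for _, p in patterns))
    m = combined.search(subject)
    # lastindex is the group that matched, i.e. the winning pattern.
    return patterns[m.lastindex - 1][0] if m else None


subject = "\n  - foo"
canonical = [("listblock", LISTBLOCK), ("preformatted", PREFORMATTED)]
non_canonical = canonical[::-1]
```

Both patterns match at the same leftmost position, so whichever alternative comes first wins. That is why registration order alone decides the outcome.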
This patch drops the lookahead in Preformatted::connectTo, the registerLineStartMarkers call in Listblock::preConnect, the register/getLineStartMarkers methods on ModeRegistry, and the three registry-API unit tests. testPreformattedList now registers Listblock before Preformatted.
|
| 1e28e406 | 23-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
split Parsing\Helpers into per-domain Link / Media / Code classes |
| 781f5c71 | 23-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
gate monospace, unformatted, file on DokuWiki syntax
These DokuWiki specific modes should only be loaded when DokuWiki syntax is still wanted, not in Markdown-only mode. Expands the ModeRegistryTest data provider to cover the full always-loaded and DW-always sets.
|
| b1c59bed | 23-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmCode / GfmFile for fenced code blocks
GfmCode (backticks) emits the `code` handler instruction; GfmFile (tildes) emits `file`. Column-0 fences only, no length pairing between opener and closer, and unclosed fences stay literal — matching DokuWiki's `<code>` tag convention. The info string accepts DW's full attribute vocabulary (language, filename, [options]) through a new shared `Helpers::parseCodeAttributes` that `Code` also uses, with `html` aliased to `html4strict` and `-` meaning "no language".
Preformatted's indent threshold is now preference-gated: 2 spaces in DW-preferred settings, 4 spaces in MD-preferred, matching GFM's indented code block rule. A single tab is a trigger in both.
|
| 3440a8c0 | 22-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmMedia and extend GfmLink with image-as-label form
- New GfmMedia parses the GFM image form `![alt](url)` with the full DokuWiki media-parameter vocabulary in the URL slot (?100x200, ?right, ?nolink, ?recache, …). Adds `?left`/`?right`/`?center` align keywords shared with DW `{{…}}`, which gives pure-Markdown users a way to align inline images.
- GfmLink now also matches the image-as-label form `[![alt](img)](target)`, the GFM equivalent of `[[target|{{img}}]]`. Detection is post-entry, mirroring Internallink's `^{{…}}$` check; one mode covers the whole family.
- LinkDispatch trait replaced by Helpers::classifyLink and Helpers::parseMediaParameters: two pure static methods, shared by DW and GFM counterparts.
- Entry patterns for GfmLink / GfmMedia simplified (permissive URL slot, handle-time parsing), following DW's Internallink style.
- GfmSpecTest drives a test-only SpecCompatRenderer that emits bare <img> / <a> instead of DW's wiki-wrapped HTML, recovering 13 spec tests that previously failed or were skipped only because of renderer shape.
|
| e89aeebd | 22-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmLink for GFM inline links `[text](url)`
Extracts the URL-classification ladder from Internallink into a LinkDispatch trait so both modes route identically across all six DokuWiki link flavors (internal, external, interwiki, email, windowsshare, local anchor). GfmLink parses the `[text](url)` form with optional `"title"` / `'title'` and hands the URL to the trait. The GFM title attribute is discarded — DokuWiki link instructions have no slot for it.
|
| 8719732d | 22-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmHeader for ATX headings (`# text` through `###### text`)
Opener must sit at column 0. GFM tolerates 0-3 spaces before the `#` but that collides with DokuWiki's 2-space-indent preformatted block, so the tolerance is dropped rather than plumbed across modes.
Widen the XHTML renderer's section-node tracker from 5 slots to 6 so h6 doesn't hit "Undefined array key 5". Extend GfmSpecTest's HTML normalizer to strip DokuWiki's section-div wrappers, section-edit comments, and header id/class attributes so heading spec examples can validate semantic correctness.
|
| 8ed75a23 | 22-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmBacktickSingle / GfmBacktickDouble for GFM inline code spans
Two new inline formatting modes covering GFM code spans in their n=1 and n=2 forms:
GfmBacktickSingle `text` → <code>text</code>
GfmBacktickDouble ``text`` → <code>text</code>
Both emit monospace_open and monospace_close around an unformatted() call (the same instruction shape as DokuWiki's two-single-quote pair wrapping a nowiki span), so renderers that distinguish verbatim text from plain cdata — metadata, indexer, non-XHTML backends — treat the body as literal.
GfmBacktickDouble extends GfmBacktickSingle to reuse handle() and the body-normalization helper; only the delimiter length and the body character class differ. Both share sort 165 and gate on Markdown being loaded.
Design notes:
* The lexer has no backreferences, so each length is its own mode. Length-boundary guards (?<!`)...(?!`) on every opener and closer ensure a run of two-or-more backticks is never read as an n=1 delimiter and a run of three-or-more is never read as n=2. The two modes never steal each other's input regardless of registration order — sort can't reach this kind of cross-position constraint.
* Edge-whitespace handling and newline normalization live in handle(), not in the regex. On DOKU_LEXER_UNMATCHED the body is normalized:
  1. CR/LF and LF become single spaces (GFM line-ending rule).
  2. If the body starts and ends with a space and is not entirely whitespace, one space is stripped from each end.
  That produces the right GFM output for the tricky cases without special-casing the entry pattern:
    ` ` → <code> </code> (all-whitespace, no strip)
    ` a` → <code> a</code> (asymmetric, no strip)
    ` `` ` → <code>``</code> (interior run-of-2 + strip)
    ``foo`bar`` → <code>foo`bar</code>
* Body character classes admit exactly the runs that cannot be valid closers for this mode's length: n=1 allows `[^`] | ``+`, n=2 allows `[^`] | `(?!`)`. That is what lets a single-backtick span contain a pair and a double-backtick span contain a lone backtick.
* allowedModes is empty — no other inline parsing runs inside a span.
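The two normalization steps from the design notes can be written out directly. This is a Python sketch of the rule as described, with a hypothetical function name; it is not the PHP helper shared by the two backtick modes.

```python
import re


def normalize_code_span(body):
    # Step 1: CR/LF and LF become single spaces.
    body = re.sub(r"\r?\n", " ", body)
    # Step 2: strip one space from each end, but only if both ends have one
    # and the body is not entirely whitespace.
    if (
        len(body) >= 2
        and body[0] == " "
        and body[-1] == " "
        and body.strip() != ""
    ):
        body = body[1:-1]
    return body
```

Only one space is removed per side, so a body with two leading and two trailing spaces keeps one of each.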
Deliberately not implemented, with skip.php entries explaining why:
351: code-span precedence over emphasis (*foo`*` expected to render as *foo<code>*</code>). Cross-positional: the single-pass lexer matches leftmost-first and cannot reject an earlier emphasis opener because a later backtick span would consume its closer. A proper fix would need a pre-scan pass; sort values only break ties at the same position.
353: the trailing " outside the code span gets converted to a curly quote by DokuWiki typography, diverging from spec HTML.
354: raw HTML tag pass-through; DokuWiki does not render raw HTML by default.
356: GFM angle-bracket autolink <http://…> is not implemented.
Per-mode unit tests cover basic matching, flanking via the length-boundary guards, interior-run support in the body, edge-space stripping, newline normalization, all-whitespace bodies, paragraph-boundary rejection, content-is-literal, and sort values. ModeRegistryTest's gating data provider picks up both modes.
Net effect on GfmSpecTest: twelve previously-red code-span examples now pass (339, 340, 341, 342, 344, 345, 346, 347, 349, 350, 357, 359: the simple pairs, edge-space, interior-run, newline-normalization, and mismatched-run cases). Four skipped. Three remain pending outside the code-span scope (emphasis interactions that need GfmLink once that lands).
|
| 864d6c6d | 21-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
fix Lexer so exit-pattern lookbehinds see chars consumed by prior tokens
Lexer::reduce used to hand PCRE a shrinking tail of the subject — each matched token was chopped off the front of $raw and the next preg_match ran on what remained. Once a token was consumed, the bytes before the cursor were gone, and any lookbehind assertion in a subsequent pattern silently failed.
The bug was latent for DokuWiki's entire history because literal exit patterns like `\*\*`, `</file>`, or `%%` don't care what's behind them. It surfaced with c3755410a ("require non-whitespace adjacency for inline formatting delimiters"), which added `(?<=[^\s])` to Strong, Emphasis, Underline, Monospace, Subscript, Superscript and Deleted at once. After that commit, `**[[link]]**` stopped closing — the `]` that would satisfy the lookbehind had just been consumed by the link match, so Strong stayed open until end-of-section and swallowed everything after it (list items, headings, the lot).
Fix:
* Lexer::parse and Lexer::reduce track a byte offset into $raw instead of mutating $raw. $initialLength and the shrinking-length arithmetic for absolute match positions are replaced by straight offset increments; the no-progress guard and the trailing-unmatched dispatch both shift to the same cursor.
* ParallelRegex::split takes an optional $offset and passes it to preg_match together with PREG_OFFSET_CAPTURE. PCRE scans from the offset forward but still sees the whole subject, so lookbehinds work across already-consumed tokens. The secondary preg_split call used to carve out pre/post is no longer needed — PREG_OFFSET_CAPTURE gives the match start for free, saving one regex operation per reduce() step.
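The difference between matching on a chopped-off tail and matching from an offset can be shown with a small experiment. The sketch below uses Python's `re`, whose `search(string, pos)` behaves like the offset form in that lookbehinds still see text before `pos`; the `CLOSER` pattern is a simplified stand-in for Strong's exit pattern, not the real one.

```python
import re

# Closing "**" must be preceded by a non-whitespace character.
CLOSER = re.compile(r"(?<=\S)\*\*")

subject = "**[[link]]**"
cursor = subject.index("]]") + 2   # position just after the consumed link token

# Old approach: hand the engine only the remaining tail.
sliced = CLOSER.search(subject[cursor:])

# New approach: keep the whole subject and start scanning at the cursor.
offset = CLOSER.search(subject, cursor)
```

On the tail, the `]` that would satisfy the lookbehind is gone, so `sliced` is `None`. With the offset form the closer is found right at the cursor.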
Regression tests at all three layers:
* ParallelRegexTest: offset plumbing and pre/match accounting.
* LexerTest::testIndexLookbehindAcrossConsumedToken: exit-pattern lookbehind targeting the `/>` of a self-closing `<a/>` that was consumed as a SPECIAL token on the previous step. Fails under the old Lexer.
* FormattingTest: `**[[link]]**` and `**foo//bar//**` round-trip with correct open/close instructions through the full pipeline.
Also updates ListsTest::testUnorderedListStrong, whose expectations documented the pre-fix buggy behaviour ("formatting able to spread across list items"). With the fix, bold correctly stays within a single list item; the expected call sequence and the comment are updated to match.
|
| 0244be5c | 21-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmDeleted mode for GFM strikethrough (`~~text~~`)
Shares the deleted_open/deleted_close instructions with DW's <del> mode. Entry/exit anchors `(?<!~)` / `(?!~)` reject runs of three or more tildes so fenced-code markers remain untouched. Also trim redundant class-level docblocks on sibling Gfm test files.
|
| 2bb62bca | 20-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GFM em-wrapping-strong modes for `***foo***` / `___foo___`
Two new inline formatting modes that render triple-delimiter runs as em wrapping strong:
GfmEmphasisStrong `***text***` → <em><strong>text</strong></em>
GfmEmphasisStrongUnderscore `___text___` → same (MD-preferred only)
Only the exact 3+3 symmetric case is handled. The other long-run and asymmetric variants (4+4, 5+5, `***foo**`, etc.) require CommonMark's stack-based delimiter-pairing algorithm with its flanking and multiple-of-3 rules, which is explicitly out of scope; those examples stay skipped in gfm-spec/skip.php.
Implementation notes:
* Patterns enforce exact 3+3 via `(?<!\*)` / `(?<!_)` lookbehinds (preventing entry at the second `*` of a `****...` run) and `(?!\*)` / `(?!_)` lookaheads after the closing triple (rejecting `***foo****` etc.). Combined with the existing non-whitespace adjacency lookaheads, all asymmetric cases cleanly fall through to other modes or stay literal.
* GfmEmphasisStrong overrides handle() to emit two instructions on entry (emphasis_open + strong_open) and two on exit (strong_close + emphasis_close). GfmEmphasisStrongUnderscore inherits that handler — only delimiters and word-boundary rules differ.
* Sort 65 — below Strong (70) and GfmEmphasis (80) so the em+strong modes win the lexer race for `***`/`___` runs. Underscore variant is MD-preferred-only, matching the existing gating of GfmEmphasisUnderscore and GfmStrongUnderscore.
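The exact 3+3 guard and the doubled instructions can be sketched in one pattern. This Python fragment is illustrative only: the real modes use separate entry and exit patterns in the lexer, the body class here is simplified to a single line without asterisks, and `instructions` is a hypothetical helper that ignores any text around the match.

```python
import re

TRIPLE_STAR = re.compile(
    r"(?<!\*)\*\*\*(?=\S)"    # opener: not inside a longer run, then non-space
    r"(?P<body>[^*\n]+?)"     # simplified body: no asterisks, one line
    r"(?<=\S)\*\*\*(?!\*)"    # closer: after non-space, not followed by "*"
)


def instructions(text):
    m = TRIPLE_STAR.search(text)
    if m is None:
        return [("cdata", text)]
    # Two instructions on entry and two on exit, nested em > strong.
    return [
        ("emphasis_open",),
        ("strong_open",),
        ("cdata", m.group("body")),
        ("strong_close",),
        ("emphasis_close",),
    ]
```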
Per-mode unit tests cover basic matching, single-char bodies, whitespace flanking rejection, paragraph-boundary rejection, longer-run rejection, asymmetric rejection, multibyte intraword protection, and sort values. ModeRegistryTest's gating data provider picks up the two new rules.
Net effect on GfmSpecTest: example #476 (`***foo***`) now passes; 473/474/475/477 remain skipped as documented in skip.php.
|
| bcefb8ae | 20-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GFM emphasis and underscore-delimited strong modes
Three new inline formatting modes for GitHub Flavored Markdown:
GfmEmphasis `*text*` → <em>
GfmEmphasisUnderscore `_text_` → <em> (MD-preferred only)
GfmStrongUnderscore `__text__` → <strong> (MD-preferred only)
All three emit the same handler instructions as DokuWiki's Emphasis / Strong, so existing renderers need no changes.
Design notes:
* Lexer mode names use snake_case (gfm_emphasis, gfm_emphasis_underscore, gfm_strong_underscore) to keep PascalCase readable at the class level. The asterisk variant emits `emphasis_open`/`emphasis_close` via the getInstructionName() hook, so DW's Emphasis (`//...//`) and GfmEmphasis (`*...*`) can coexist in mixed modes without a lexer state collision while still producing the same <em> output.
* Underscore variants gate on Markdown-preferred syntax (`markdown`, `md+dw`) because `__` otherwise means DW underline. GfmStrongUnderscore sorts at 70 (matching Strong) — below Underline at 90 — so when loaded it wins the lexer race for `__` runs. Underline is already gated out of MD-preferred modes in the previous commit.
* Entry patterns enforce the simplified CommonMark flanking rules already shared across DW inline modes (non-whitespace adjacency, no paragraph-boundary crossing) plus the word-boundary check for underscore variants using NO_WORD_BEFORE / NO_WORD_AFTER. The positive non-word-char enumeration makes them multibyte-safe without requiring the `u` flag: `für_etwas` and `пристаням_стремятся_` correctly stay literal.
Per-mode unit tests cover basic matching, single-char bodies, leading/trailing-whitespace rejection, empty-delimiter rejection, paragraph-boundary rejection, multibyte intraword protection, and sort values. ModeRegistryTest's gating data provider picks up the three new rules.
|
| 35f91432 | 20-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
gate Underline on DokuWiki-preferred syntax; tidy registry plumbing
Three related changes to ModeRegistry, prep work for the Markdown modes that follow.
1. Underline (`__text__`) is moved out of loadAlwaysModes() and into loadDokuWikiModes(), gated on a new `$dwPreferred` check that evaluates true for 'dokuwiki' and 'dw+md'. In MD-preferred settings ('markdown' and 'md+dw') `__` will mean GFM strong, so loading Underline there would conflict at the lexer level. Underline is unchanged in the default 'dokuwiki' setting.
2. resolveModeClass() now PascalCases every `_`-separated segment of the mode name, so `gfm_emphasis_underscore` resolves to `GfmEmphasisUnderscore`. Existing lowercase-compound names like `internallink` still resolve to `Internallink` (one segment, ucfirst-ed) — no behaviour change for current modes. This prepares the registry to load Gfm mode classes whose PascalCase filenames preserve word boundaries for readability.
3. ModeRegistryTest's multiple near-identical per-mode gating tests are consolidated into a single data-provider-driven testModeLoadingBySyntax, fed by a `$rules` table that lists each mode against its four-setting expected load state. Adding a new gated mode now means one line in the provider. Currently only Underline is listed; upcoming Gfm-mode commits will add theirs.
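The name mapping in item 2 amounts to upper-casing the first letter of each `_`-separated segment. A minimal Python sketch, with a hypothetical function name standing in for resolveModeClass():

```python
def resolve_mode_class(mode_name):
    # Upper-case only the first character of each segment and leave the
    # rest untouched, like PHP's ucfirst() applied per segment.
    return "".join(seg[:1].upper() + seg[1:] for seg in mode_name.split("_"))
```

A name without underscores is a single segment, so existing modes resolve exactly as before.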
|
| 6b33ca93 | 20-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add regex-primitive constants and getInstructionName() hook
Preparatory refactor for the upcoming GFM parser modes. No behaviour change for any existing mode: CONTENT_UNTIL_PARA still evaluates to the same regex (now factored through NOT_AT_PARA_BREAK), and getInstructionName() defaults to getModeName() so all current AbstractFormatting subclasses emit the same handler instructions as before.
AbstractMode gains four new shared regex constants:
NOT_AT_PARA_BREAK — zero-width assertion: current position is not the start of a paragraph break (blank line). Extracted from CONTENT_UNTIL_PARA for reuse in patterns that need a custom body char class.
NON_WORD_CHAR — char class: ASCII whitespace or ASCII punctuation except `_`. Multibyte-safe by construction: UTF-8 continuation bytes are >= 0x80 and thus fall outside every ASCII class, so checking positively that the surrounding context IS a non-word char correctly treats multibyte letters as word-like. No `u` flag required.
NO_WORD_BEFORE — zero-width: preceded by NON_WORD_CHAR or at start-of-input/line. For intraword-aware openers.
NO_WORD_AFTER — zero-width: followed by NON_WORD_CHAR or at end-of-input. Complement of NO_WORD_BEFORE.
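The multibyte-safety claim can be checked with a byte-level regex. The Python sketch below rebuilds the three constants as byte patterns under stated assumptions: the exact character ranges, the use of `^` with MULTILINE for start-of-line, and the `UNDERSCORE_EM` pattern are illustrative and not copied from AbstractMode.

```python
import re

# ASCII whitespace, or ASCII punctuation except "_" (0x5f).
NON_WORD_CHAR = rb"[\s\x21-\x2f\x3a-\x40\x5b-\x5e\x60\x7b-\x7e]"
NO_WORD_BEFORE = rb"(?:^|(?<=" + NON_WORD_CHAR + rb"))"
NO_WORD_AFTER = rb"(?=" + NON_WORD_CHAR + rb"|\Z)"

# Simplified underscore-emphasis pattern built from the primitives.
UNDERSCORE_EM = re.compile(
    NO_WORD_BEFORE + rb"_(?=\S)([^_\n]+?)(?<=\S)_" + NO_WORD_AFTER,
    re.MULTILINE,
)


def find_emphasis(text):
    # Match on raw UTF-8 bytes, as a regex without the "u" flag would.
    m = UNDERSCORE_EM.search(text.encode("utf-8"))
    return m.group(1).decode("utf-8") if m else None
```

Bytes of multibyte letters are all 0x80 or above, so they never fall into `NON_WORD_CHAR`, and an underscore next to them is treated as intraword.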
AbstractFormatting gains a getInstructionName() hook that defaults to getModeName(). Subclasses that want to emit handler instructions under a different name than their lexer mode name (so a Gfm mode can share DW's `emphasis_open`/`strong_open` instructions while registering its own lexer state) override this method.
|
| c3755410 | 20-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
require non-whitespace adjacency for inline formatting delimiters
An opening delimiter must now be followed by a non-whitespace character, and a closing delimiter must be preceded by one. Empty delimiter pairs (****, ____, '''', <sub></sub>, <sup></sup>, <del></del>) no longer match and stay literal.
Rationale: this matches Markdown's flanking-delimiter rules and eliminates accidental bolding of sequences like `** note**` at the start of a sentence. Well-formed uses (**bold**, //italic//, __underline__) are unchanged.
Affected modes: Strong, Emphasis, Underline, Monospace, Subscript, Superscript, Deleted.
BREAKING: content that was already malformed but previously rendered as formatted (e.g. `**foo bar **`) now stays literal.
|
| 10fb3d65 | 20-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
prevent inline formatting from matching across paragraph boundaries
The Lexer compiles all patterns with the `s` (DOTALL) flag via ParallelRegex::getPerlMatchingFlags(), which makes `.` match newlines. Inline formatting modes use lookaheads like `\*\*(?=.*\*\*)` to verify a closing delimiter exists, so with DOTALL a lone `**` happily matched its "closer" many paragraphs later, swallowing blank lines into a single <strong> run.
Add CONTENT_UNTIL_PARA on AbstractMode — a regex snippet matching any character unless it would start a paragraph break (blank line, possibly with horizontal whitespace). Update all inline formatting entry patterns (Strong, Emphasis, Underline, Monospace, Subscript, Superscript, Deleted) to use it in their closing-delimiter lookaheads.
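The effect of bounding the lookahead can be seen in a small comparison. This Python sketch uses a tempered-dot snippet as a plausible shape for CONTENT_UNTIL_PARA; the actual constant's text is not shown in this log, so the pattern here is an assumption.

```python
import re

# Any character, unless a blank line (optionally with spaces/tabs) starts here.
UNTIL_PARA = r"(?:(?!\n[ \t]*\n).)*"

NAIVE = re.compile(r"\*\*(?=.*\*\*)", re.DOTALL)
BOUNDED = re.compile(r"\*\*(?=" + UNTIL_PARA + r"\*\*)", re.DOTALL)

text = "a lone ** here\n\nnext paragraph **bold**"

naive_hit = NAIVE.search(text)      # accepts the stray "**" in paragraph one
bounded_hit = BOUNDED.search(text)  # skips it, opens at "**bold**" instead
```

A single newline does not trigger the negative lookahead, so multi-line formatting inside one paragraph still matches.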
Emphasis also gets a real closing-`//` check; its previous lookahead just verified "content exists with a non-colon char" without requiring the closing delimiter at all.
Single newlines inside a delimiter pair still match (multi-line formatting); only blank lines end it.
BREAKING: This means you can no longer mark multiple paragraphs as bold or strike them out. On the other hand, it prevents accidentally breaking the page layout by missing a closing delimiter (as reported many times over the years), e.g. #1025 #3588 #1056
|
| 17c6179b | 20-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add $conf['syntax'] setting and conditional mode loading in ModeRegistry
Introduce a new 'syntax' configuration setting (dokuwiki, markdown, dw+md, md+dw) that controls which parser modes are loaded. Built-in modes are split into always-loaded (no Markdown equivalent), DW-only, and MD-only groups. Refactor getModes() into focused sub-methods for each group.
No Gfm mode classes exist yet, so only 'dokuwiki' is functional. The change is a strict no-op for existing behavior.
|
| 04045fea | 18-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
remove unused rewriteBlocks property from Handler
This flag was added in b7c441b9 (2005) for planned wiki syntax converters but was never set to false anywhere. Remove the dead conditional and always run Block processing.
|