| 1beb7450 | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: skip deferred-feature spec cases
Heading-inline syntax (#36, #46) is deferred; the existing header instruction and downstream renderers would need rework to process inline modes inside heading content.
Bare URL autolinking without angle brackets (#619) is a deliberate DokuWiki feature in Externallink, not a feature we'll remove to match the strict CommonMark §6.8 rule.
The GFM bare-email autolink extension (#629-631) is out of scope - DokuWiki's Email mode only recognises emails inside angle brackets.
|
| 451f2842 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
clean up markdown spec test skips
Ordered by example number, same format, single line.
Some skips now actually pass - removed. A couple others should pass but don't yet - also removed. Code fix follows.
|
| 506762f4 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: skip indented-code §4.4 family and Disallowed Raw HTML #652
The 4-space indent trigger fires on paragraph-continuation lines and exits on any blank line, both of which collide with CommonMark §4.4 — fixing either would require paragraph-open state the single-pass lexer cannot carry. List-interior cases additionally need the column arithmetic documented as out of scope for the §2.2 tabs family.
#652 (Disallowed Raw HTML) is a filter on top of raw HTML pass-through, which DokuWiki escapes by policy (see #118-160), so it has no input.
|
| f9d3b7bd | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
Externallink: add per-scheme angle-bracket autolinks for MD syntax
Adds CommonMark §6.5 <URL> autolinks to Externallink, gated to md/md+dw/dw+md syntax via ModeRegistry::isMdPreferred(). Per-scheme patterns share the existing conf/scheme.conf allow-list so unknown schemes fall through to literal cdata instead of being silently dropped by the renderer. Internal whitespace inside the brackets disqualifies the autolink and the whole envelope is emitted as cdata to keep the bare-URL detector off the URL.
LinksTest gains 5 cases covering success, internal-whitespace and leading-whitespace disqualification, unregistered scheme fallthrough, and the dw-only no-op path. SpecCompatRenderer URL encoder is updated to match cmark-gfm's HREF_SAFE table (square brackets and a few other characters move from safe to encoded). skip.php loses the obsolete #356 entry and gains #605/#606/#607/#609 explaining the unregistered- scheme cases that the per-scheme regex naturally rejects.
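The per-scheme gating described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Externallink code: the scheme list, the `://` separator, and the names `build_autolink_patterns` and `find_autolink` are assumptions made for the example.

```python
import re

# Hypothetical allow-list; DokuWiki reads the real one from conf/scheme.conf.
ALLOWED_SCHEMES = ["http", "https", "ftp"]


def build_autolink_patterns(schemes):
    """Return one compiled <scheme://...> pattern per allowed scheme."""
    return [
        # Whitespace, '<' and '>' are not allowed between the brackets,
        # so a URL with internal whitespace never matches.
        re.compile(r"<(" + re.escape(scheme) + r"://[^\s<>]+)>")
        for scheme in schemes
    ]


def find_autolink(text, patterns):
    """Return the URL matched by the first pattern that hits, else None."""
    for pattern in patterns:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None


PATTERNS = build_autolink_patterns(ALLOWED_SCHEMES)
```

Because every pattern names its scheme explicitly, an unregistered scheme simply never matches and the text stays literal, which is the fallthrough behavior the commit describes.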
|
| d379b737 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: neutralize DW typography for spec roundtrip
Force $conf[typography] = 0 in renderMarkdown() so the Quotes and MultiplyEntity modes are not loaded, override entity() in SpecCompatRenderer to emit the original match instead of the typographic glyph, and switch _xmlEntities() from ENT_QUOTES to ENT_COMPAT so `'` stays literal in body text while `"` is still escaped to `&quot;`. Drops three skip entries (#308, #310, #353) that existed only to paper over the same divergence and unblocks #16, #25 and #670.
|
| 6359e7fd | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
percent-encode URLs in SpecCompatRenderer to match spec output
CommonMark's reference renderer percent-encodes URL bytes outside the RFC 3986 unreserved/reserved set (and existing %XX sequences pass through unchanged). DokuWiki's XHTML renderer leaves UTF-8 and backslashes literal in href, which is fine for live wiki output but diverges byte-for-byte from spec.
Adds specEncodeUrl() to the spec-compat renderer and applies it in specLink(). Same shape as the earlier `→`->`\t` substitution: a test-harness alignment with spec convention, no production behavior change.
Unskips #510 (backslash in URL) and #511 (entity / percent-encoding in URL); both now match spec output with the parser-side decoding from the previous commit and the renderer-side encoding here.
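A minimal Python sketch of the encoding rule described in this commit, under stated assumptions: the safe set is taken as the RFC 3986 unreserved plus reserved characters, a lone `%` that does not start a `%XX` sequence is encoded as `%25`, and the name `spec_encode_url` is hypothetical. The actual specEncodeUrl() in the test harness may differ in its exact table.

```python
import re

# Assumption: RFC 3986 unreserved + reserved characters stay literal.
SAFE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~:/?#[]@!$&'()*+,;="
)
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def spec_encode_url(url: str) -> str:
    """Percent-encode everything outside SAFE, keeping existing %XX intact."""
    out = []
    i = 0
    while i < len(url):
        ch = url[i]
        if ch in SAFE:
            out.append(ch)
            i += 1
        elif PERCENT_ESCAPE.match(url, i):
            # An existing escape sequence passes through unchanged.
            out.append(url[i:i + 3])
            i += 3
        else:
            # Encode each UTF-8 byte of the character as %XX.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
            i += 1
    return "".join(out)
```

For example, a backslash in a URL becomes `%5C`, which is the shape spec example #510 expects.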
|
| eb15e634 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
extract Helpers\HtmlEntity, wire into GfmCode and GfmLink URL slot
Numeric and named HTML entity decoding moves out of GfmHtmlEntity into a pure helper, so capture-by-regex modes can apply the same decode post-extraction (the inline lexer never reaches their bodies). Mirrors the Helpers\Escape pattern.
Wired up in two slots:
- GfmCode info string: `f&ouml;&ouml;` now decodes to föö in the language class. Clears spec example #330.
- GfmLink URL: GfmLink::extractUrl() decodes entities. URL pattern extends from `[^)\n]+` to `(?:\\.|[^)\n])+` so an escaped \) no longer terminates the URL early; the existing post-classify Escape::unescapeBackslashes call strips the backslashes after Link::classify has done its work. Clears #504, #506, #508.
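The effect of widening the URL pattern can be illustrated with a small Python sketch. Only the two URL-slot character classes come from the commit message; the surrounding `[label](...)` regex, the punctuation class used for unescaping, and the name `extract_url` are assumptions for the example.

```python
import re

# Only the URL-slot sub-patterns are taken from the commit message;
# the label part of the regex is a simplified assumption.
OLD_LINK = re.compile(r"\[([^\]\n]*)\]\(([^)\n]+)\)")
NEW_LINK = re.compile(r"\[([^\]\n]*)\]\(((?:\\.|[^)\n])+)\)")

# Backslash followed by one ASCII punctuation character.
ESCAPED_PUNCT = re.compile(r"\\([!-/:-@\[-`{-~])")


def extract_url(pattern, text):
    """Return the URL slot with backslash escapes removed, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    return ESCAPED_PUNCT.sub(r"\1", match.group(2))
```

With the input `[link](foo\)bar)`, the old pattern stops at the escaped parenthesis and yields `foo\`, while the new one consumes `\)` as a unit and yields `foo)bar` after unescaping.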
Skip #328 with a self-contained title-slot reason: the URL side now decodes correctly, but the title attribute is still discarded (DokuWiki link instructions have no title slot).
|
| c4bcbc2e | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmLinebreak for GFM hard line breaks
Two-or-more trailing spaces, or a single backslash, immediately before a non-final newline render as a `<br/>`. Both delimiter forms share a single SUBSTITION mode at sort 140, loaded under any MD-active syntax (markdown, dw+md, md+dw); pure dokuwiki is unaffected.
Reuses the existing `linebreak` handler call and renderer; no new instructions or renderer changes. SpecCompatRenderer overrides linebreak() to emit the spec's `<br />` shape. Examples 662, 663 (line break inside a raw HTML tag) are skipped — raw HTML is not passed through by default.
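The two delimiter forms can be captured by a single pattern, roughly as in this Python sketch. It is an illustration only: "non-final newline" is interpreted here as a newline followed by at least one more non-newline character, and `render_hard_breaks` is a hypothetical name.

```python
import re

# Two or more spaces, or one backslash, directly before a newline that is
# followed by more text (assumed reading of "non-final newline").
HARD_BREAK = re.compile(r"(?: {2,}|\\)\n(?=[^\n])")


def render_hard_breaks(text: str) -> str:
    """Replace each hard line break delimiter with a <br/> tag."""
    return HARD_BREAK.sub("<br/>\n", text)
```

A single trailing space, or a delimiter on the last line of the input, leaves the text untouched.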
|
| 3e6baeff | 30-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
replace DW Hr with unified GfmHr
Single mode covers both DokuWiki (4+ dashes) and GFM (3+ of -/*/_) horizontal rules; pattern self-narrows on $conf['syntax']. Always loaded across all four syntax settings, mirroring the GfmQuote replacement pattern. Same `hr` handler call so renderers and the call API are unchanged.
Drops DW's old [ \t]* leading-whitespace tolerance — inert in practice past 0-1 spaces (Preformatted at sort 20 intercepts everything ≥ 2 spaces or any tab).
Spec examples 13, 20, 26-28, 224 turn green; 17, 21-24, 29, 30, 31 go to skip.php as deliberate non-implementations (whitespace tolerance and list-precedence cases).
|
| 309a0852 | 30-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
replace DW Quote with unified GfmQuote
GfmQuote covers blockquote parsing for both DokuWiki and GFM dialects in a single mode. Same quote_open/quote_close handler instructions; a DW-preferred post-pass flattens sub-parsed paragraph wrapping into linebreak calls so existing pages keep their <br/>-between-lines rendering. MD-preferred keeps the <p>-wrapped spec shape.
Block content (lists, fenced code, tables) inside `>` quotes now renders, since the body is sub-parsed. Headers stay excluded (BASEONLY) — TOC and section-edit anchors don't compose with <blockquote>, same rationale as GfmListblock.
Convert ModeRegistry's sub-parser cache into an acquire/release pool to support same-key re-entrancy: a list inside a quote re-enters gfm_quote during the list-item sub-parse, and the inner call needs its own parser instance even though the exclusion key matches. GfmListblock is updated to use the new acquire/release primitives.
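The difference between a plain cache and an acquire/release pool can be sketched in Python as below. This is a generic illustration of the idea, not ModeRegistry's code; `SubParserPool` and its method signatures are hypothetical.

```python
from collections import defaultdict


class SubParserPool:
    """Keyed pool: each acquire() hands out an instance nobody else holds."""

    def __init__(self, factory):
        self._factory = factory          # builds a new parser for a key
        self._idle = defaultdict(list)   # key -> parsers not currently in use

    def acquire(self, key):
        idle = self._idle[key]
        # Reuse an idle parser if one exists, otherwise build a fresh one.
        return idle.pop() if idle else self._factory(key)

    def release(self, key, parser):
        # Hand the parser back so a later acquire() can reuse it.
        self._idle[key].append(parser)
```

A plain cache would return the same instance for the same key, so a list inside a quote would re-enter a parser that is still mid-parse. With the pool, the inner acquire gets its own instance and both are returned for reuse afterwards.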
|
| 685560eb | 28-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmListblock for GFM lists
GfmListblock captures an entire list block atomically with one addSpecialPattern match, then walks the captured text in handle() grouping lines into items. Each item's body is dedented to its content column and parsed by ModeRegistry::getSubParser() so block content (paragraphs, fenced code, blockquotes, plugin blocks) works inside items uniformly. Sub-parsed calls are wrapped in a Nest call before they reach the outer handler, matching the Footnote pattern: the main handler's Block rewriter treats nest as opaque and the renderer base class unwraps it transparently, so multi-paragraph items don't get double-wrapped in <p>.
Marker syntax: -, *, + (unordered) or 1-9 digits followed by . or ) (ordered). Indentation is a 2-space-multiple step starting at 0; depth = (indent / 2) + 1, odd indents round down, tabs become two spaces. The first ordered item's number drives the start attribute on <ol> via the listo_open $start parameter.
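The marker and depth rules above can be sketched in Python as follows. This is an illustration under assumptions, not the GfmLists parser: the requirement of whitespace after the marker, the return shape, and the name `parse_marker` are made up for the example.

```python
import re

# Leading indent, then an unordered marker or 1-9 digits plus '.' or ')'.
# Assumption: at least one space or tab must follow the marker.
MARKER = re.compile(r"^([ \t]*)(?:([-*+])|([0-9]{1,9})([.)]))[ \t]+")


def parse_marker(line):
    """Return list type, depth and start number for a marker line, or None."""
    match = MARKER.match(line)
    if match is None:
        return None
    # Tabs count as two spaces; odd indents round down via floor division.
    indent = len(match.group(1).replace("\t", "  "))
    depth = indent // 2 + 1
    if match.group(2):
        return {"type": "u", "depth": depth, "start": None}
    return {"type": "o", "depth": depth, "start": int(match.group(3))}
```

For instance, a line indented by three spaces lands at depth 2, the same as one indented by two spaces or by a single tab.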
GfmLists subclasses AbstractListsRewriter with the GFM marker parser; the state machine on the base class is shared with DW Lists.
GfmListblock loads only when $conf['syntax'] is markdown or md+dw. Under those settings the DW Listblock is suppressed because the two list models conflict — DW's mandatory 2-space indent rule vs GFM's zero-indent top-level rule, and -/*/+ markers shared. Plugins that relied on Listblock loading under md+dw will see it absent there.
Sub-parser exclusion set: CATEGORY_BASEONLY (no Header inside list items) and gfm_listblock itself (defensive guard against re-entry on pathological inputs; nested lists are handled by the outer pattern, not by re-entry).
Tests cover marker variants, ordered start numbers, nested lists at two and three levels, inline formatting inside items, marker- character switches keeping one list, type switches splitting the list, fenced code inside items, multi-paragraph (loose) items, and two regressions on blank-line tolerance inside the captured block. SpecCompatRenderer learns to render the list call sequence, and spec.txt tests for digit/marker-width/lazy-continuation behavior that GfmListblock deliberately doesn't implement are documented in gfm-spec/skip.php with the per-bucket reasons (A-F).
Drops two now-obsolete entries from skip.php (image escapes that land via earlier GfmLink/GfmMedia work) and inlines the Setext explanation that previously pointed at SPEC.md. Replaces the SPEC.md reference in GfmEmphasisTest with the inline reason.
|
| b1c59bed | 23-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmCode / GfmFile for fenced code blocks
GfmCode (backticks) emits the `code` handler instruction; GfmFile (tildes) emits `file`. Column-0 fences only, no length pairing between opener and closer, and unclosed fences stay literal — matching DokuWiki's `<code>` tag convention. The info string accepts DW's full attribute vocabulary (language, filename, [options]) through a new shared `Helpers::parseCodeAttributes` that `Code` also uses, with `html` aliased to `html4strict` and `-` meaning "no language".
Preformatted's indent threshold is now preference-gated: 2 spaces in DW-preferred settings, 4 spaces in MD-preferred, matching GFM's indented code block rule. A single tab is a trigger in both.
|
| 3440a8c0 | 22-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmMedia and extend GfmLink with image-as-label form
- New GfmMedia parses `![alt](url)` with the full DokuWiki media-parameter vocabulary in the URL slot (?100x200, ?right, ?nolink, ?recache, …). Adds `?left`/`?right`/`?center` align keywords shared with DW `{{…}}`, which gives pure-Markdown users a way to align inline images.
- GfmLink now also matches `[![alt](img)](target)`, the GFM equivalent of `[[target|{{img}}]]`. Detection is post-entry, mirroring Internallink's `^{{…}}$` check; one mode covers the whole family.
- LinkDispatch trait replaced by Helpers::classifyLink and Helpers::parseMediaParameters: two pure static methods, shared by DW and GFM counterparts.
- Entry patterns for GfmLink / GfmMedia simplified (permissive URL slot, handle-time parsing), following DW's Internallink style.
- GfmSpecTest drives a test-only SpecCompatRenderer that emits bare <img> / <a> instead of DW's wiki-wrapped HTML, recovering 13 spec tests that previously failed/skipped only because of renderer shape.
|
| e89aeebd | 22-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmLink for GFM inline links `[text](url)`
Extracts the URL-classification ladder from Internallink into a LinkDispatch trait so both modes route identically across all six DokuWiki link flavors (internal, external, interwiki, email, windowsshare, local anchor). GfmLink parses the `[text](url)` form with optional `"title"` / `'title'` and hands the URL to the trait. The GFM title attribute is discarded — DokuWiki link instructions have no slot for it.
|
| 8719732d | 22-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmHeader for ATX headings (`# text` through `###### text`)
Opener must sit at column 0. GFM tolerates 0-3 spaces before the `#` but that collides with DokuWiki's 2-space-indent preformatted block, so the tolerance is dropped rather than plumbed across modes.
Widen the XHTML renderer's section-node tracker from 5 slots to 6 so h6 doesn't hit "Undefined array key 5". Extend GfmSpecTest's HTML normalizer to strip DokuWiki's section-div wrappers, section-edit comments, and header id/class attributes so heading spec examples can validate semantic correctness.
|
| 8ed75a23 | 22-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmBacktickSingle / GfmBacktickDouble for GFM inline code spans
Two new inline formatting modes covering GFM code spans in their n=1 and n=2 forms:
GfmBacktickSingle `text` → <code>text</code>
GfmBacktickDouble ``text`` → <code>text</code>
Both emit monospace_open and monospace_close around an unformatted() call (the same instruction shape as DokuWiki's two-single-quote pair wrapping a nowiki span), so renderers that distinguish verbatim text from plain cdata — metadata, indexer, non-XHTML backends — treat the body as literal.
GfmBacktickDouble extends GfmBacktickSingle to reuse handle() and the body-normalization helper; only the delimiter length and the body character class differ. Both share sort 165 and gate on Markdown being loaded.
Design notes:
* The lexer has no backreferences, so each length is its own mode. Length-boundary guards (?<!`)...(?!`) on every opener and closer ensure a run of two-or-more backticks is never read as an n=1 delimiter and a run of three-or-more is never read as n=2. The two modes never steal each other's input regardless of registration order — sort can't reach this kind of cross-position constraint.
* Edge-whitespace handling and newline normalization live in handle(), not in the regex. On DOKU_LEXER_UNMATCHED the body is normalized:
  1. CR/LF and LF become single spaces (GFM line-ending rule).
  2. If the body starts and ends with a space and is not entirely whitespace, one space is stripped from each end.
  That produces the right GFM output for the tricky cases without special-casing the entry pattern:
    ` ` → <code> </code> (all-whitespace, no strip)
    ` a` → <code> a</code> (asymmetric, no strip)
    ` `` ` → <code>``</code> (interior run-of-2 + strip)
    ``foo`bar`` → <code>foo`bar</code>
* Body character classes admit exactly the runs that cannot be valid closers for this mode's length: n=1 allows `[^`] | ``+`, n=2 allows `[^`] | `(?!`)`. That is what lets a single-backtick span contain a pair and a double-backtick span contain a lone backtick.
* allowedModes is empty — no other inline parsing runs inside a span.
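The design notes above can be approximated in a short Python sketch. It is an illustration, not the DokuWiki lexer: the patterns are assembled from the guards and body classes quoted in the notes, Python's `re` stands in for the lexer, and `normalize_body` and `code_span` are hypothetical names.

```python
import re

# Length-boundary guards (?<!`)...(?!`) on opener and closer, with the
# body classes quoted in the design notes for n=1 and n=2.
SINGLE = re.compile(r"(?<!`)`(?!`)((?:[^`]|``+)+)(?<!`)`(?!`)")
DOUBLE = re.compile(r"(?<!`)``(?!`)((?:[^`]|`(?!`))+)(?<!`)``(?!`)")


def normalize_body(body: str) -> str:
    """Apply the two normalization steps from the design notes."""
    # 1. CR/LF and LF become single spaces.
    body = body.replace("\r\n", " ").replace("\n", " ")
    # 2. Strip one space from each end unless the body is all whitespace.
    if (len(body) >= 2 and body[0] == " " and body[-1] == " "
            and body.strip() != ""):
        body = body[1:-1]
    return body


def code_span(text: str):
    """Return the rendered first code span found by either pattern, or None."""
    for pattern in (DOUBLE, SINGLE):
        match = pattern.search(text)
        if match:
            return "<code>" + normalize_body(match.group(1)) + "</code>"
    return None
```

Because of the guards, the order in which the two patterns are tried does not change which delimiter length matches a given run of backticks; a mismatched pair such as ``` ``a` ``` matches neither.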
Deliberately not implemented, with skip.php entries explaining why:
351: code-span precedence over emphasis (*foo`*` expected to render as *foo<code>*</code>). Cross-positional: the single-pass lexer matches leftmost-first and cannot reject an earlier emphasis opener because a later backtick span would consume its closer. A proper fix would need a pre-scan pass; sort values only break ties at the same position.
353: the trailing " outside the code span gets converted to a curly quote by DokuWiki typography, diverging from spec HTML.
354: raw HTML tag pass-through; DokuWiki does not render raw HTML by default.
356: GFM angle-bracket autolink <http://…>: not implemented.
Per-mode unit tests cover basic matching, flanking via the length- boundary guards, interior-run support in the body, edge-space stripping, newline normalization, all-whitespace bodies, paragraph- boundary rejection, content-is-literal, and sort values. ModeRegistryTest's gating data provider picks up both modes.
Net effect on GfmSpecTest: twelve previously-red code-span examples now pass (339, 340, 341, 342, 344, 345, 346, 347, 349, 350, 357, 359: the simple pairs, edge-space, interior-run, newline-normalization, and mismatched-run cases). Four skipped. Three remain pending outside the code-span scope (emphasis interactions that need GfmLink once that lands).
|