| 47a02a10 | 04-Jun-2026 |
Andreas Gohr <gohr@cosmocode.de> |
Parsing: make parse syntax a per-parse value, drop ModeInterface
The active parse's syntax flavour is a per-parse question, not process-global state: within a single request a plugin can render bundled DokuWiki-syntax text inside an otherwise-Markdown page. Yet ModeRegistry was a singleton that read $conf['syntax'] and the $PARSER_MODES global, and every mode reached it through ModeRegistry::getInstance(), so the flavour lived in shared mutable state that two parses in one request would fight over.
Make the registry a short-lived value instead:
- ModeRegistry is constructed once per parse with an explicit $syntax and injected into Parser, Handler and every mode. getSyntax() / isDwPreferred() / isMdPreferred() consult $this->syntax; the DOKU_UNITTEST-gated mode-list cache hack is gone (each registry is fresh, nothing to invalidate).
- p_get_instructions() is now the single place in the pipeline where $conf['syntax'] is read; from there the flavour travels as a parameter. No code under inc/Parsing/ reads $conf['syntax'] directly anymore: the five syntax-reading modes (Preformatted, GfmHr, GfmEscape, Externallink, GfmQuote) route through $this->registry.
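The per-parse idea can be sketched in a few lines of Python. This is only an illustration of the design, not the PHP code from the commit: `Registry`, `parse` and `render_page` are hypothetical names, and the "instructions" are reduced to tagged tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registry:
    """Hypothetical per-parse value: immutable once constructed."""
    syntax: str


def parse(text: str, syntax: str) -> list:
    # A fresh registry is built for this parse only; nothing global is read.
    registry = Registry(syntax)
    return [(registry.syntax, text)]


def render_page(page_text: str, bundled_text: str, configured_syntax: str) -> list:
    # The outer parse uses the configured flavour, while a nested parse of
    # bundled text picks its own flavour without disturbing the outer one.
    outer = parse(page_text, configured_syntax)
    inner = parse(bundled_text, "dw")
    return outer + inner
```

Because each parse owns its registry, the nested DokuWiki-syntax parse cannot change what the surrounding Markdown parse sees.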
Keep the two concepts apart, as documented in the ModeRegistry and AbstractMode docblocks: the user's configured *preference* stays in $conf['syntax'] for UI code (toolbar, settings), while the active parse's syntax is a parameter carried by the registry.
$PARSER_MODES is demoted to a deprecated, read-only mirror, published during loadPluginModes() — third-party syntax plugins (columnlist, alphalist2, phpwikify, skipentity) and the bundled info plugin read the global directly, often from their constructors, so the taxonomy must stay visible there. No core code reads the mirror.
Fold ModeInterface into AbstractMode while here: getSort()/handle() are abstract, the connect callbacks carry defaults, and the public $Lexer "FIXME should be done by setter" becomes setLexer()/getLexer() injected by Parser::addMode() alongside the registry. Nested-content resolution moves to the allowedCategories()/filterAllowedModes() hooks, resolved once when the registry is attached.
Tests build their own parser/registry through ParserTestBase::setSyntax() instead of mutating $conf and calling the removed ModeRegistry::reset().
|
| 8a34b0d8 | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
remove comment about failing tests now that the work is complete |
| aa346d4b | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: clear acronym table in spec renderer
The default conf/acronyms.conf entries (notably FTP) get wrapped in <abbr> by the XHTML renderer's acronym() call, which the spec output never has. Clearing the renderer's acronym table makes acronym() fall through to literal text, mirroring how typography substitutions are already neutralized via SpecCompatRenderer. Brings example #628 to passing without touching production wiki rendering.
|
| 1beb7450 | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: skip deferred-feature spec cases
Heading-inline syntax (#36, #46) is deferred; the existing header instruction and downstream renderers would need rework to process inline modes inside heading content.
Bare URL autolinking without angle brackets (#619) is a deliberate DokuWiki feature in Externallink, not a feature we'll remove to match the strict CommonMark §6.8 rule.
The GFM bare-email autolink extension (#629-631) is out of scope - DokuWiki's Email mode only recognises emails inside angle brackets.
|
| 451f2842 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
clean up markdown spec test skips
Ordered by example number, same format, single line.
Some skips now actually pass - removed. A couple others should pass but don't yet - also removed. Code fix follows.
|
| 198d33e8 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: skip task list items extension (#279, #280)
GFM task list items (`- [ ] foo` / `- [x] foo`) are not implemented; the literal marker stays as the first content of the list item. |
| 506762f4 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: skip indented-code §4.4 family and Disallowed Raw HTML #652
The 4-space indent trigger fires on paragraph-continuation lines and exits on any blank line, both of which collide with CommonMark §4.4 — fixing either would require paragraph-open state the single-pass lexer cannot carry. List-interior cases additionally need the column arithmetic documented as out of scope for the §2.2 tabs family.
#652 (Disallowed Raw HTML) is a filter on top of raw HTML pass-through, which DokuWiki escapes by policy (see #118-160), so it has no input.
|
| f9d3b7bd | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
Externallink: add per-scheme angle-bracket autolinks for MD syntax
Adds CommonMark §6.5 <URL> autolinks to Externallink, gated to md/md+dw/dw+md syntax via ModeRegistry::isMdPreferred(). Per-scheme patterns share the existing conf/scheme.conf allow-list so unknown schemes fall through to literal cdata instead of being silently dropped by the renderer. Internal whitespace inside the brackets disqualifies the autolink and the whole envelope is emitted as cdata to keep the bare-URL detector off the URL.
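A rough Python sketch of the per-scheme gating described above. The scheme tuple stands in for conf/scheme.conf and is an assumption, as are the `scheme://` shape of the pattern and the `classify_envelope` name; the real patterns live in the PHP Externallink mode.

```python
import re

# Hypothetical allow-list standing in for conf/scheme.conf.
SCHEMES = ("http", "https", "ftp")

# One alternation over the allowed schemes; whitespace and angle brackets
# are not allowed inside the envelope.
AUTOLINK = re.compile(
    r"<((?:%s)://[^\s<>]+)>" % "|".join(re.escape(s) for s in SCHEMES)
)


def classify_envelope(envelope: str) -> tuple:
    """Return ('externallink', url) for a valid autolink, else ('cdata', text)."""
    match = AUTOLINK.fullmatch(envelope)
    if match:
        return ("externallink", match.group(1))
    # Unknown scheme or internal whitespace: keep the whole envelope literal.
    return ("cdata", envelope)
```

Returning the full envelope as cdata in the failure case mirrors the commit's point that nothing should be silently dropped.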
LinksTest gains 5 cases covering success, internal-whitespace and leading-whitespace disqualification, unregistered scheme fallthrough, and the dw-only no-op path. SpecCompatRenderer URL encoder is updated to match cmark-gfm's HREF_SAFE table (square brackets and a few other characters move from safe to encoded). skip.php loses the obsolete #356 entry and gains #605/#606/#607/#609 explaining the unregistered-scheme cases that the per-scheme regex naturally rejects.
|
| d379b737 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: neutralize DW typography for spec roundtrip
Force $conf[typography] = 0 in renderMarkdown() so the Quotes and MultiplyEntity modes are not loaded, override entity() in SpecCompatRenderer to emit the original match instead of the typographic glyph, and switch _xmlEntities() from ENT_QUOTES to ENT_COMPAT so `'` stays literal in body text while `"` is still escaped to `&quot;`. Drops three skip entries (#308, #310, #353) that existed only to paper over the same divergence and unblocks #16, #25 and #670.
|
| b37c6ef7 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
more test skips |
| 6359e7fd | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
percent-encode URLs in SpecCompatRenderer to match spec output
CommonMark's reference renderer percent-encodes URL bytes outside the RFC 3986 unreserved/reserved set (and existing %XX sequences pass through unchanged). DokuWiki's XHTML renderer leaves UTF-8 and backslashes literal in href, which is fine for live wiki output but diverges byte-for-byte from spec.
Adds specEncodeUrl() to the spec-compat renderer and applies it in specLink(). Same shape as the earlier `→`->`\t` substitution: a test-harness alignment with spec convention, no production behavior change.
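The encoding rule can be approximated as follows. This is a Python sketch under stated assumptions, not the PHP specEncodeUrl(): the safe set is taken to be exactly the RFC 3986 unreserved and reserved characters, and `spec_encode_url` is a hypothetical name.

```python
import re

# Assumed safe set: RFC 3986 unreserved plus reserved characters.
UNRESERVED = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
RESERVED = set(b":/?#[]@!$&'()*+,;=")
SAFE = UNRESERVED | RESERVED

PERCENT_SEQUENCE = re.compile(rb"%[0-9A-Fa-f]{2}")


def spec_encode_url(url: str) -> str:
    raw = url.encode("utf-8")
    out = []
    i = 0
    while i < len(raw):
        existing = PERCENT_SEQUENCE.match(raw, i)
        if existing:
            # An existing %XX sequence passes through unchanged.
            out.append(existing.group().decode("ascii"))
            i = existing.end()
        elif raw[i] in SAFE:
            out.append(chr(raw[i]))
            i += 1
        else:
            # Everything else, including UTF-8 bytes and backslashes, is encoded.
            out.append("%%%02X" % raw[i])
            i += 1
    return "".join(out)
```

Working on UTF-8 bytes rather than characters is what makes multi-byte characters come out as one %XX per byte.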
Unskips #510 (backslash in URL) and #511 (entity / percent-encoding in URL); both now match spec output with the parser-side decoding from the previous commit and the renderer-side encoding here.
|
| eb15e634 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
extract Helpers\HtmlEntity, wire into GfmCode and GfmLink URL slot
Numeric and named HTML entity decoding moves out of GfmHtmlEntity into a pure helper, so capture-by-regex modes can apply the same decode post-extraction (the inline lexer never reaches their bodies). Mirrors the Helpers\Escape pattern.
Wired up in two slots:
- GfmCode info string: `f&ouml;&ouml;` now decodes to föö in the language class. Clears spec example #330.
- GfmLink URL: GfmLink::extractUrl() decodes entities. URL pattern extends from `[^)\n]+` to `(?:\\.|[^)\n])+` so an escaped \) no longer terminates the URL early; the existing post-classify Escape::unescapeBackslashes call strips the backslashes after Link::classify has done its work. Clears #504, #506, #508.
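The effect of the URL pattern change can be checked with Python's `re` module, whose syntax is close enough to PCRE for these two patterns. The sample input is made up for illustration.

```python
import re

OLD_URL = re.compile(r"[^)\n]+")            # stops at the first ')'
NEW_URL = re.compile(r"(?:\\.|[^)\n])+")    # a backslash consumes the next char

# Text as it would follow the opening '(' of a link target.
source = r"foo\)bar) trailing"

old_capture = OLD_URL.match(source).group()
new_capture = NEW_URL.match(source).group()
```

With the old pattern the capture ends at the backslash, because the escaped parenthesis is seen as the closing one; the new alternation consumes `\)` as a pair and continues to the real closer.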
Skip #328 with a self-contained title-slot reason: the URL side now decodes correctly, but the title attribute is still discarded (DokuWiki link instructions have no title slot).
|
| 09f34c31 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
apply spec convention: → represents a tab in GfmSpecTest
CommonMark spec.txt uses U+2192 RIGHTWARDS ARROW to visually mark literal tab characters in examples (see spec.txt, "About this document"). Replace → with \t in both markdown input and expected HTML so the corpus exercises real tab handling.
Surfaced by GfmNumericEntity: example #336 (`&#9;foo`) now decodes the entity to a tab and produces correct output, but the harness was comparing against literal → in the expected HTML.
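A minimal Python sketch of the harness-side substitution, assuming each spec example arrives as a markdown/HTML string pair; `load_example` is a hypothetical name, not the test's real helper.

```python
ARROW = "\u2192"  # U+2192 RIGHTWARDS ARROW, the spec's visual tab marker


def load_example(markdown: str, expected_html: str) -> tuple:
    # Both sides get the same replacement so comparisons stay byte-for-byte.
    return markdown.replace(ARROW, "\t"), expected_html.replace(ARROW, "\t")
```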
|
| b414dba2 | 04-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
skip a few more spec tests
Those are all deliberately not supported cases |
| 13a62f81 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
rename syntax flavors 'dokuwiki' / 'markdown' to 'dw' / 'md'
Symmetry with the existing 'dw+md' / 'md+dw' setting values. |
| c4bcbc2e | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmLinebreak for GFM hard line breaks
Two-or-more trailing spaces, or a single backslash, immediately before a non-final newline render as a `<br/>`. Both delimiter forms share a single SUBSTITION mode at sort 140, loaded under any MD-active syntax (markdown, dw+md, md+dw); pure dokuwiki is unaffected.
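The two delimiter forms fit in one pattern. The Python sketch below is an approximation: it reads "non-final newline" as "a newline followed by at least one more non-newline character", which is an assumption, and it rewrites text directly instead of emitting a `linebreak` instruction.

```python
import re

# Two or more spaces, or a single backslash, directly before a newline
# that is followed by more content.
HARD_BREAK = re.compile(r"(?: {2,}|\\)\n(?=[^\n])")


def mark_hard_breaks(text: str) -> str:
    return HARD_BREAK.sub("<br/>\n", text)
```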
Reuses the existing `linebreak` handler call and renderer; no new instructions or renderer changes. SpecCompatRenderer overrides linebreak() to emit the spec's `<br />` shape. Examples 662, 663 (line break inside a raw HTML tag) are skipped — raw HTML is not passed through by default.
|
| 3e6baeff | 30-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
replace DW Hr with unified GfmHr
Single mode covers both DokuWiki (4+ dashes) and GFM (3+ of -/*/_) horizontal rules; pattern self-narrows on $conf['syntax']. Always loaded across all four syntax settings, mirroring the GfmQuote replacement pattern. Same `hr` handler call so renderers and the call API are unchanged.
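One way to picture the self-narrowing pattern, as a Python sketch. It assumes that any MD-active setting uses the wider GFM rule (which already includes 4+ dashes) and that only bare marker runs at the start of a line count; `is_hr` and its `md_active` flag are hypothetical.

```python
import re

DW_HR = re.compile(r"-{4,}")                 # DokuWiki: four or more dashes
GFM_HR = re.compile(r"-{3,}|\*{3,}|_{3,}")   # GFM: three or more of one marker


def is_hr(line: str, md_active: bool) -> bool:
    pattern = GFM_HR if md_active else DW_HR
    return pattern.fullmatch(line) is not None
```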
Drops DW's old [ \t]* leading-whitespace tolerance — inert in practice past 0-1 spaces (Preformatted at sort 20 intercepts everything ≥ 2 spaces or any tab).
Spec examples 13, 20, 26-28, 224 turn green; 17, 21-24, 29, 30, 31 go to skip.php as deliberate non-implementations (whitespace tolerance and list-precedence cases).
|
| 309a0852 | 30-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
replace DW Quote with unified GfmQuote
GfmQuote covers blockquote parsing for both DokuWiki and GFM dialects in a single mode. Same quote_open/quote_close handler instructions; a DW-preferred post-pass flattens sub-parsed paragraph wrapping into linebreak calls so existing pages keep their <br/>-between-lines rendering. MD-preferred keeps the <p>-wrapped spec shape.
Block content (lists, fenced code, tables) inside `>` quotes now renders, since the body is sub-parsed. Headers stay excluded (BASEONLY) — TOC and section-edit anchors don't compose with <blockquote>, same rationale as GfmListblock.
Convert ModeRegistry's sub-parser cache into an acquire/release pool to support same-key re-entrancy: a list inside a quote re-enters gfm_quote during the list-item sub-parse, and the inner call needs its own parser instance even though the exclusion key matches. GfmListblock is updated to use the new acquire/release primitives.
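The difference between a cache and an acquire/release pool is easy to show in a small Python sketch. `SubParserPool` and its factory argument are hypothetical; the commit's PHP primitives may be shaped differently.

```python
from collections import defaultdict


class SubParserPool:
    """Keyed pool: a parser is exclusive to its holder until released."""

    def __init__(self, factory):
        self._factory = factory
        self._idle = defaultdict(list)

    def acquire(self, key):
        idle = self._idle[key]
        # Reuse an idle parser if one exists, otherwise build a new one.
        return idle.pop() if idle else self._factory(key)

    def release(self, key, parser):
        self._idle[key].append(parser)
```

A plain cache would hand the same instance to the nested call; with the pool, a second `acquire` for the same key while the first parser is still in use yields a distinct instance, and released parsers are still reused later.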
|
| f7c6e4ac | 30-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add listo_open_start sibling method for GFM start numbers
Reverts the listo_open signature widening from 5a2118acc and instead adds a sibling method `listo_open_start($start = 1)` on the renderer hierarchy. The base default delegates to listo_open() so renderers that don't override it still produce a valid (but unnumbered) list; xhtml's override emits <ol start="N">.
The handler now emits 'listo_open_start' only for ordered lists with a non-default first number; plain ordered lists keep emitting the unchanged 'listo_open' instruction. This preserves the historical listo_open / listu_open signatures (zero-arg base, $classes-only xhtml form from 2016) so the 17 plugin renderers found via codesearch keep working without modification, while still implementing GFM's "5. foo" -> <ol start="5"> rule.
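The compatibility argument can be sketched in Python. The class names and returned HTML strings are illustrative stand-ins for the PHP renderer hierarchy; only the method names `listo_open` and `listo_open_start` come from the commit message.

```python
class Renderer:
    def listo_open(self):
        return "<ol>"

    def listo_open_start(self, start=1):
        # Base default: ignore the number and fall back to the old method.
        return self.listo_open()


class XhtmlRenderer(Renderer):
    def listo_open_start(self, start=1):
        return '<ol start="%d">' % start


class LegacyPluginRenderer(Renderer):
    # Written before listo_open_start existed; overrides only the old method.
    def listo_open(self):
        return '<ol class="plugin">'


def open_ordered_list(renderer, first_number):
    # The new instruction is emitted only for a non-default first number.
    if first_number != 1:
        return renderer.listo_open_start(first_number)
    return renderer.listo_open()
```

The legacy renderer keeps working unmodified because the inherited default routes back into its own `listo_open` override.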
|
| 74031e46 | 28-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmEscape for GFM backslash escapes
Implements GFM §6.1 backslash-escape handling. GfmEscape is a sort-5 inline mode in CATEGORY_SUBSTITION that claims `\X` for any escapable ASCII punctuation char before competing delimiters can match. The shared character class lives on Helpers\Escape so the lexer pattern and the post-hoc unescape stay in lockstep.
Whole-span captures (GfmCode info string, GfmLink label/URL) bypass the lexer; those modes call Escape::unescapeBackslashes() on the relevant slot. GfmLink skips the unescape when the URL classifies as a windowssharelink so the leading \\host survives intact.
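A Python approximation of the post-hoc unescape. The character class covers the ASCII punctuation ranges that GFM §6.1 treats as escapable; `unescape_backslashes` is a stand-in name for Escape::unescapeBackslashes(), whose exact implementation is not shown in this log.

```python
import re

# Backslash followed by one ASCII punctuation character
# (ranges ! to /, : to @, [ to `, and { to ~).
ESCAPED_PUNCTUATION = re.compile(r"\\([!-/:-@\[-`{-~])")


def unescape_backslashes(text: str) -> str:
    return ESCAPED_PUNCTUATION.sub(r"\1", text)
```

The last test case below shows why a windowssharelink target has to skip this step: an escaped backslash collapses, so a leading `\\host` would lose one of its backslashes.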
GfmTable cells get a separate per-cell `\|` to `|` pass in the rewriter to honour the tables-extension rule that pipes always unescape, even inside code spans where standard §6.1 escapes don't fire.
|
| 3dabe4e0 | 28-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmTable for GFM tables
Implements the GFM pipe-table extension as a CONTAINER mode at sort 55, one below DW Table at 60. A lookahead-validated entry pattern asserts a header line plus a `:?-+:?` delimiter row before consuming any input, so non-table paragraphs containing pipes flow through unchanged. Cells are inline-only per spec.
Handler\GfmTable rewrites the flat token stream into the canonical table_open / tablethead_* / tabletbody_* / table_close sequence, deriving per-column alignment from the delimiter row, padding short body rows (spec 202), truncating long ones (spec 204), and falling back to a single cdata when the column count mismatches (spec 203).
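Two of the rewriter's steps, alignment derivation and row fitting, can be sketched in Python. The helper names are hypothetical, and the delimiter row is assumed to have already passed the `:?-+:?` validation.

```python
def column_alignments(delimiter_row: str) -> list:
    """Derive per-column alignment from a row such as '|:--|:-:|--:|---|'."""
    cells = [c.strip() for c in delimiter_row.strip().strip("|").split("|")]
    aligns = []
    for cell in cells:
        left, right = cell.startswith(":"), cell.endswith(":")
        if left and right:
            aligns.append("center")
        elif left:
            aligns.append("left")
        elif right:
            aligns.append("right")
        else:
            aligns.append(None)
    return aligns


def fit_row(cells: list, width: int) -> list:
    # Short body rows are padded with empty cells, long ones are truncated.
    return (cells + [""] * width)[:width]
```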
`tabletbody_open` / `tabletbody_close` are emitted for the first time; they are part of the base renderer API but DW Table never used them. Added to Block's blockOpen / blockClose lists alongside `tabletfoot_*` for symmetry. SpecCompatRenderer gains minimal table-element overrides so spec roundtrip output matches GFM's `<table><thead><tr><th>` shape without DW's wrapper div, row/col counter classes, or align-as-class.
|
| 685560eb | 28-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmListblock for GFM lists
GfmListblock captures an entire list block atomically with one addSpecialPattern match, then walks the captured text in handle() grouping lines into items. Each item's body is dedented to its content column and parsed by ModeRegistry::getSubParser() so block content (paragraphs, fenced code, blockquotes, plugin blocks) works inside items uniformly. Sub-parsed calls are wrapped in a Nest call before they reach the outer handler, matching the Footnote pattern: the main handler's Block rewriter treats nest as opaque and the renderer base class unwraps it transparently, so multi-paragraph items don't get double-wrapped in <p>.
Marker syntax: -, *, + (unordered) or 1-9 digits followed by . or ) (ordered). Indentation is a 2-space-multiple step starting at 0; depth = (indent / 2) + 1, odd indents round down, tabs become two spaces. The first ordered item's number drives the start attribute on <ol> via the listo_open $start parameter.
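The depth rule is plain arithmetic and can be written out as a Python sketch (`list_depth` is a hypothetical helper, not a function from the commit):

```python
def list_depth(line: str) -> int:
    # Tabs become two spaces before the indent is measured.
    expanded = line.replace("\t", "  ")
    indent = len(expanded) - len(expanded.lstrip(" "))
    # Integer division makes odd indents round down.
    return indent // 2 + 1
```

For example, an item indented by three spaces gets 3 // 2 + 1 = 2, the same depth as one indented by two spaces.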
GfmLists subclasses AbstractListsRewriter with the GFM marker parser; the state machine on the base class is shared with DW Lists.
GfmListblock loads only when $conf['syntax'] is markdown or md+dw. Under those settings the DW Listblock is suppressed because the two list models conflict — DW's mandatory 2-space indent rule vs GFM's zero-indent top-level rule, and -/*/+ markers shared. Plugins that relied on Listblock loading under md+dw will see it absent there.
Sub-parser exclusion set: CATEGORY_BASEONLY (no Header inside list items) and gfm_listblock itself (defensive guard against re-entry on pathological inputs; nested lists are handled by the outer pattern, not by re-entry).
Tests cover marker variants, ordered start numbers, nested lists at two and three levels, inline formatting inside items, marker-character switches keeping one list, type switches splitting the list, fenced code inside items, multi-paragraph (loose) items, and two regressions on blank-line tolerance inside the captured block. SpecCompatRenderer learns to render the list call sequence, and spec.txt tests for digit/marker-width/lazy-continuation behavior that GfmListblock deliberately doesn't implement are documented in gfm-spec/skip.php with the per-bucket reasons (A-F).
Drops two now-obsolete entries from skip.php (image escapes that land via earlier GfmLink/GfmMedia work) and inlines the Setext explanation that previously pointed at SPEC.md. Replaces the SPEC.md reference in GfmEmphasisTest with the inline reason.
|
| b1c59bed | 23-Apr-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmCode / GfmFile for fenced code blocks
GfmCode (backticks) emits the `code` handler instruction; GfmFile (tildes) emits `file`. Column-0 fences only, no length pairing between opener and closer, and unclosed fences stay literal — matching DokuWiki's `<code>` tag convention. The info string accepts DW's full attribute vocabulary (language, filename, [options]) through a new shared `Helpers::parseCodeAttributes` that `Code` also uses, with `html` aliased to `html4strict` and `-` meaning "no language".
Preformatted's indent threshold is now preference-gated: 2 spaces in DW-preferred settings, 4 spaces in MD-preferred, matching GFM's indented code block rule. A single tab is a trigger in both.
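The preference-gated threshold reduces to a small predicate. In this Python sketch, `is_preformatted` and the boolean `md_preferred` flag are hypothetical stand-ins for the mode's pattern and the registry query.

```python
def is_preformatted(line: str, md_preferred: bool) -> bool:
    # 4 leading spaces under MD-preferred settings, 2 under DW-preferred.
    threshold = 4 if md_preferred else 2
    # A single leading tab triggers in both cases.
    return line.startswith("\t") or line.startswith(" " * threshold)
```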
|
| 3440a8c0 | 22-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmMedia and extend GfmLink with image-as-label form
- New GfmMedia parses `![alt](url)` with the full DokuWiki media-parameter vocabulary in the URL slot (?100x200, ?right, ?nolink, ?recache, …). Adds `?left`/`?right`/`?center` align keywords shared with DW `{{…}}`, which gives pure-Markdown users a way to align inline images.
- GfmLink now also matches `[![alt](img)](target)`, the GFM equivalent of `[[target|{{img}}]]`. Detection is post-entry, mirroring Internallink's `^{{…}}$` check; one mode covers the whole family.
- LinkDispatch trait replaced by Helpers::classifyLink and Helpers::parseMediaParameters, two pure static methods shared by DW and GFM counterparts.
- Entry patterns for GfmLink / GfmMedia simplified (permissive URL slot, handle-time parsing), following DW's Internallink style.
- GfmSpecTest drives a test-only SpecCompatRenderer that emits bare <img> / <a> instead of DW's wiki-wrapped HTML, recovering 13 spec tests that previously failed/skipped only because of renderer shape.
|
| e89aeebd | 22-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmLink for GFM inline links `[text](url)`
Extracts the URL-classification ladder from Internallink into a LinkDispatch trait so both modes route identically across all six DokuWiki link flavors (internal, external, interwiki, email, windowsshare, local anchor). GfmLink parses the `[text](url)` form with optional `"title"` / `'title'` and hands the URL to the trait. The GFM title attribute is discarded — DokuWiki link instructions have no slot for it.
|