47a02a10 | 04-Jun-2026 | Andreas Gohr <gohr@cosmocode.de>
Parsing: make parse syntax a per-parse value, drop ModeInterface
The active parse's syntax flavour is a per-parse question, not process-global state: within a single request a plugin can render bundled DokuWiki-syntax text inside an otherwise-Markdown page. Yet ModeRegistry was a singleton that read $conf['syntax'] and the $PARSER_MODES global, and every mode reached it through ModeRegistry::getInstance(). The flavour therefore lived in shared mutable state that two parses in the same request would fight over.
Make the registry a short-lived value instead:
- ModeRegistry is constructed once per parse with an explicit $syntax and injected into Parser, Handler and every mode. getSyntax() / isDwPreferred() / isMdPreferred() consult $this->syntax; the DOKU_UNITTEST-gated mode-list cache hack is gone (each registry is fresh, nothing to invalidate).
- p_get_instructions() is now the single place in the pipeline where $conf['syntax'] is read; from there the flavour travels as a parameter. No code under inc/Parsing/ reads $conf['syntax'] directly anymore: the five syntax-reading modes (Preformatted, GfmHr, GfmEscape, Externallink, GfmQuote) route through $this->registry.
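As a rough illustration of the "registry as a per-parse value" idea, here is a Python sketch. The class names loosely mirror the PHP ones. `build_parser` is a hypothetical stand-in for `p_get_instructions()`, and the `"dw"`/`"md"` flavour values are assumptions. The point is that each parser carries its own registry, so two parses with different flavours can coexist in one process.

```python
from typing import List, Optional


class ModeRegistry:
    """Per-parse value carrying the active syntax flavour (hypothetical analogue)."""

    def __init__(self, syntax: str) -> None:
        self.syntax = syntax  # assumed flavour ids: "dw" or "md"

    def get_syntax(self) -> str:
        return self.syntax

    def is_dw_preferred(self) -> bool:
        return self.syntax == "dw"

    def is_md_preferred(self) -> bool:
        return self.syntax == "md"


class Preformatted:
    """A mode that asks its injected registry, never a global, for the flavour."""

    def __init__(self) -> None:
        self.registry: Optional[ModeRegistry] = None

    def set_registry(self, registry: ModeRegistry) -> None:
        self.registry = registry

    def indent_threshold(self) -> int:
        if self.registry is None:
            raise RuntimeError("registry not attached")
        return 4 if self.registry.is_md_preferred() else 2


class Parser:
    def __init__(self, registry: ModeRegistry) -> None:
        self.registry = registry
        self.modes: List[Preformatted] = []

    def add_mode(self, mode: Preformatted) -> None:
        mode.set_registry(self.registry)  # injected, not fetched from a singleton
        self.modes.append(mode)


def build_parser(conf: dict) -> Parser:
    # The one place that reads the configured preference; afterwards the
    # flavour travels only inside the registry.
    parser = Parser(ModeRegistry(conf["syntax"]))
    parser.add_mode(Preformatted())
    return parser
```

Building one parser from `{"syntax": "md"}` and another from `{"syntax": "dw"}` yields modes with different thresholds, and neither parser affects the other.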
Keep the two concepts apart, as documented in the ModeRegistry and AbstractMode docblocks: the user's configured *preference* stays in $conf['syntax'] for UI code (toolbar, settings), while the active parse's syntax is a parameter carried by the registry.
$PARSER_MODES is demoted to a deprecated, read-only mirror, published during loadPluginModes(). Third-party syntax plugins (columnlist, alphalist2, phpwikify, skipentity) and the bundled info plugin read the global directly, often from their constructors, so the taxonomy must stay visible there. No core code reads the mirror.
Fold ModeInterface into AbstractMode while here: getSort()/handle() are abstract, the connect callbacks carry defaults, and the public $Lexer property (flagged "FIXME should be done by setter") is replaced by setLexer()/getLexer(), injected by Parser::addMode() alongside the registry. Nested-content resolution moves to the allowedCategories()/filterAllowedModes() hooks, resolved once when the registry is attached.
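The shape of "interface folded into an abstract base" can be sketched with Python's `abc` module. This is a hypothetical analogue, not the PHP class. The snake_case method names, the `attach_registry` signature taking a category-to-modes map, and the category names are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class AbstractMode(ABC):
    """Hypothetical analogue of an interface folded into an abstract base."""

    def __init__(self) -> None:
        self._lexer: Optional[object] = None
        self.allowed_modes: List[str] = []

    # Every concrete mode must implement these two.
    @abstractmethod
    def get_sort(self) -> int: ...

    @abstractmethod
    def handle(self, match: str, state: int, pos: int) -> bool: ...

    # Connect callbacks are optional: no-op defaults.
    def connect_to(self, mode: str) -> None:
        pass

    def post_connect(self) -> None:
        pass

    # Accessors replacing a public lexer property.
    def set_lexer(self, lexer: object) -> None:
        self._lexer = lexer

    def get_lexer(self) -> Optional[object]:
        return self._lexer

    # Nested-content hooks, resolved once when the registry is attached.
    def allowed_categories(self) -> List[str]:
        return []

    def filter_allowed_modes(self, modes: List[str]) -> List[str]:
        return modes

    def attach_registry(self, categories: Dict[str, List[str]]) -> None:
        names = [m for cat in self.allowed_categories() for m in categories.get(cat, [])]
        self.allowed_modes = self.filter_allowed_modes(names)


class Strong(AbstractMode):
    def get_sort(self) -> int:
        return 70

    def handle(self, match: str, state: int, pos: int) -> bool:
        return True

    def allowed_categories(self) -> List[str]:
        return ["formatting", "substition"]

    def filter_allowed_modes(self, modes: List[str]) -> List[str]:
        return [m for m in modes if m != "strong"]  # no nesting inside itself
```

Subclasses only override what they need. Instantiating the base directly fails because the abstract methods are unimplemented.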
Tests build their own parser/registry through ParserTestBase::setSyntax() instead of mutating $conf and calling the removed ModeRegistry::reset().

eb15e634 | 04-May-2026 | Andreas Gohr <andi@splitbrain.org>
extract Helpers\HtmlEntity, wire into GfmCode and GfmLink URL slot
Numeric and named HTML entity decoding moves out of GfmHtmlEntity into a pure helper, so capture-by-regex modes can apply the same decode post-extraction (the inline lexer never reaches their bodies). Mirrors the Helpers\Escape pattern.
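A pure decoding helper of this kind might look like the following Python sketch. It assumes CommonMark's rules: named references must end in `;` and match the HTML5 entity table exactly, and numeric references that are zero or invalid become U+FFFD. It uses the standard library's `html.entities.html5` table. `decode_entities` is a hypothetical name, not the PHP helper's API.

```python
import re
from html.entities import html5

# Named (&name;), decimal (&#1234;) and hex (&#x1F;) references; ';' is mandatory.
_ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]{1,31}|#[0-9]{1,7}|#[xX][0-9A-Fa-f]{1,6});")


def _decode_one(match: "re.Match[str]") -> str:
    ref = match.group(0)
    if ref[1] == "#":
        digits = ref[2:-1]
        code = int(digits[1:], 16) if digits[0] in "xX" else int(digits)
        # Zero, surrogates and out-of-range code points become U+FFFD.
        if code == 0 or 0xD800 <= code <= 0xDFFF or code > 0x10FFFF:
            return "\ufffd"
        return chr(code)
    # html5 keys include the trailing ';'; unknown names stay literal.
    return html5.get(ref[1:], ref)


def decode_entities(text: str) -> str:
    """Decode entity references in a slot captured by regex, e.g. an info string."""
    return _ENTITY.sub(_decode_one, text)
```

Because the helper is a plain string-to-string function, any mode can call it on a captured slot after extraction, independent of the inline lexer.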
Wired up in two slots:
- GfmCode info string: `f&ouml;&ouml;` now decodes to `föö` in the language class. Clears spec example #330.
- GfmLink URL: GfmLink::extractUrl() decodes entities. The URL pattern is extended from `[^)\n]+` to `(?:\\.|[^)\n])+` so that an escaped `\)` no longer terminates the URL early; the existing post-classify Escape::unescapeBackslashes call strips the backslashes after Link::classify has done its work. Clears #504, #506, #508.
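The effect of the URL pattern change can be checked with Python's `re` module. In this sketch the surrounding `[label](...)` framing, `extract_url` and `unescape_backslashes` are illustrative assumptions. Entity decoding and link classification are left out.

```python
import re
from typing import Optional

# Before: the URL ran up to the first ')', even an escaped one.
OLD_URL = re.compile(r"\[[^\]]*\]\(([^)\n]+)\)")
# After: a backslash consumes the next character, so '\)' stays inside the URL.
NEW_URL = re.compile(r"\[[^\]]*\]\(((?:\\.|[^)\n])+)\)")

# ASCII punctuation that a backslash may escape (GFM backslash escapes).
_PUNCT = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"


def unescape_backslashes(s: str) -> str:
    """Drop the backslash in front of any escapable ASCII punctuation."""
    return re.sub(r"\\([" + re.escape(_PUNCT) + r"])", r"\1", s)


def extract_url(markup: str) -> Optional[str]:
    m = NEW_URL.search(markup)
    if m is None:
        return None
    raw = m.group(1)
    # Link classification would inspect `raw` here, before unescaping.
    return unescape_backslashes(raw)
```

With the input `[a](http://x.test/foo\)bar)`, the old pattern stops at `foo\`. The new one captures the whole `foo\)bar`, which unescapes to `foo)bar`.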
Skip #328 with a self-contained skip reason about the title slot: the URL side now decodes correctly, but the title attribute is still discarded (DokuWiki link instructions have no title slot).

74031e46 | 28-Apr-2026 | Andreas Gohr <andi@splitbrain.org>
add GfmEscape for GFM backslash escapes
Implements GFM §6.1 backslash-escape handling. GfmEscape is a sort-5 inline mode in CATEGORY_SUBSTITION that claims `\X` for any escapable ASCII punctuation char before competing delimiters can match. The shared character class lives on Helpers\Escape so the lexer pattern and the post-hoc unescape stay in lockstep.
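Keeping one character class for both the lexer pattern and the post-hoc unescape might look like this Python sketch. `ESCAPABLE`, `LEXER_PATTERN`, `lex` and `unescape_backslashes` are hypothetical names. The tokenizer is a toy that only knows about escapes, not a model of DokuWiki's lexer.

```python
import re
from typing import List, Tuple

# Shared definition: the ASCII punctuation GFM allows after a backslash.
ESCAPABLE = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
ESCAPABLE_CLASS = "[" + re.escape(ESCAPABLE) + "]"

# Lexer side: the pattern an inline escape mode would register.
LEXER_PATTERN = re.compile(r"\\" + ESCAPABLE_CLASS)


def unescape_backslashes(s: str) -> str:
    """Post-hoc side, for slots the lexer never sees; same class as the lexer."""
    return re.sub(r"\\(" + ESCAPABLE_CLASS + ")", r"\1", s)


def lex(text: str) -> List[Tuple[str, str]]:
    """Split text into ('text', s) and ('escape', char) tokens."""
    out: List[Tuple[str, str]] = []
    pos = 0
    for m in LEXER_PATTERN.finditer(text):
        if m.start() > pos:
            out.append(("text", text[pos:m.start()]))
        out.append(("escape", m.group(0)[1]))  # keep only the escaped character
        pos = m.end()
    if pos < len(text):
        out.append(("text", text[pos:]))
    return out
```

Since both functions derive from `ESCAPABLE_CLASS`, changing the set of escapable characters updates the lexer and the unescape pass together. A backslash before a non-punctuation character such as `\a` is left alone by both.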
Whole-span captures (GfmCode info string, GfmLink label/URL) bypass the lexer; those modes call Escape::unescapeBackslashes() on the relevant slot. GfmLink skips the unescape when the URL classifies as a windowssharelink so the leading \\host survives intact.
GfmTable cells get a separate per-cell `\|` to `|` pass in the rewriter to honour the tables-extension rule that pipes always unescape, even inside code spans where standard §6.1 escapes don't fire.
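A rough Python sketch of such a per-cell pass, assuming a simplified row splitter. `split_row` is hypothetical, and it does not handle an escaped backslash directly before a pipe. The key point is that `\|` becomes `|` unconditionally, even inside a code span.

```python
import re
from typing import List


def split_row(row: str) -> List[str]:
    """Split a table row on pipes that are not backslash-escaped."""
    row = row.strip()
    if row.startswith("|"):
        row = row[1:]
    if row.endswith("|") and not row.endswith("\\|"):
        row = row[:-1]
    # Simplification: a preceding backslash always marks the pipe as escaped.
    cells = re.split(r"(?<!\\)\|", row)
    # Tables-extension rule: '\|' always becomes '|', even inside `code`.
    return [c.strip().replace("\\|", "|") for c in cells]
```

For the row ``| `a\|b` | c |`` this yields the cells `` `a|b` `` and `c`. The pipe inside the code span is unescaped, although ordinary backslash escapes would not apply there.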

b1c59bed | 23-Apr-2026 | Andreas Gohr <andi@splitbrain.org>
add GfmCode / GfmFile for fenced code blocks
GfmCode (backticks) emits the `code` handler instruction; GfmFile (tildes) emits `file`. Only column-0 fences are recognised, there is no length pairing between opener and closer, and unclosed fences stay literal, matching DokuWiki's `<code>` tag convention. The info string accepts DW's full attribute vocabulary (language, filename, [options]) through a new shared `Helpers::parseCodeAttributes` that `Code` also uses, with `html` aliased to `html4strict` and `-` meaning "no language".
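Here is a Python sketch of an info-string parser covering the vocabulary described above: language, filename, and a trailing `[options]` group. The `CodeAttributes` shape and `parse_code_attributes` name are hypothetical, since the commit does not describe the PHP helper's return value. Options are kept as raw text rather than parsed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeAttributes:
    language: Optional[str]
    filename: Optional[str]
    options: Optional[str]  # raw text inside [...], left unparsed in this sketch


def parse_code_attributes(info: str) -> CodeAttributes:
    info = info.strip()
    options = None
    m = re.search(r"\[(.*)\]\s*$", info)
    if m:
        options = m.group(1)
        info = info[:m.start()].strip()
    parts = info.split(None, 1)
    language = parts[0] if parts else None
    filename = parts[1].strip() if len(parts) > 1 else None
    if language == "-":
        language = None  # "-" means: no highlighting language
    elif language == "html":
        language = "html4strict"  # alias applied before highlighting
    return CodeAttributes(language, filename, options)
```

An info string such as `- notes.txt` sets a filename without a language. A bare `html` becomes `html4strict`.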
Preformatted's indent threshold is now preference-gated: 2 spaces in DW-preferred settings, 4 spaces in MD-preferred, matching GFM's indented code block rule. A single tab is a trigger in both.
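The gating rule reduces to a small predicate. This Python sketch is simplified: `starts_preformatted` and its `md_preferred` flag are hypothetical, and it ignores any further conditions the real lexer pattern may apply.

```python
def starts_preformatted(line: str, md_preferred: bool) -> bool:
    """True if the line's indentation triggers a preformatted block."""
    if line.startswith("\t"):
        return True  # a single tab triggers in both flavours
    threshold = 4 if md_preferred else 2  # GFM indented code needs 4 spaces
    return line.startswith(" " * threshold)
```

A line indented by two spaces is preformatted only when DokuWiki syntax is preferred. Four spaces or a tab trigger it either way.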