| #
47a02a10 |
| 04-Jun-2026 |
Andreas Gohr <gohr@cosmocode.de> |
Parsing: make parse syntax a per-parse value, drop ModeInterface
The active parse's syntax flavour is a per-parse question, not process-global state: within a single request a plugin can render bundled DokuWiki-syntax text inside an otherwise-Markdown page. Yet ModeRegistry was a singleton that read $conf['syntax'] and the $PARSER_MODES global, and every mode reached it through ModeRegistry::getInstance(), so the flavour lived in shared mutable state that two parses in one request would fight over.
Make the registry a short-lived value instead:
- ModeRegistry is constructed once per parse with an explicit $syntax and injected into Parser, Handler and every mode. getSyntax() / isDwPreferred() / isMdPreferred() consult $this->syntax; the DOKU_UNITTEST-gated mode-list cache hack is gone (each registry is fresh, nothing to invalidate).
- p_get_instructions() is now the single place in the pipeline where $conf['syntax'] is read; from there the flavour travels as a parameter. No code under inc/Parsing/ reads $conf['syntax'] directly anymore: the five syntax-reading modes (Preformatted, GfmHr, GfmEscape, Externallink, GfmQuote) route through $this->registry.
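The shape of this change can be illustrated with a minimal sketch. It is written in Python rather than DokuWiki's PHP, and every name in it (the classes, `get_instructions`, the `"markdown"` / `"dokuwiki"` values) is a hypothetical stand-in, not the actual API. It only shows the idea of building one registry per parse and handing it to the modes, so two parses in one request cannot disturb each other.

```python
class ModeRegistry:
    """Hypothetical per-parse registry: the syntax is fixed at construction."""

    def __init__(self, syntax: str) -> None:
        self._syntax = syntax

    def get_syntax(self) -> str:
        return self._syntax

    def is_md_preferred(self) -> bool:
        # Assumed value name; the real configuration values may differ.
        return self._syntax == "markdown"


class PreformattedMode:
    """Hypothetical mode that asks its injected registry, never a global."""

    def __init__(self) -> None:
        self.registry = None

    def describe(self) -> str:
        return "preformatted under " + self.registry.get_syntax()


class Parser:
    def __init__(self, registry: ModeRegistry) -> None:
        self.registry = registry
        self.modes = []

    def add_mode(self, mode) -> None:
        # Injection happens here, so every mode sees this parse's registry.
        mode.registry = self.registry
        self.modes.append(mode)


def get_instructions(text: str, syntax: str) -> str:
    """Single entry point: the syntax arrives as a parameter."""
    parser = Parser(ModeRegistry(syntax))  # fresh registry for this parse
    mode = PreformattedMode()
    parser.add_mode(mode)
    return mode.describe()
```

Because nothing is shared, a nested call such as `get_instructions(snippet, "dokuwiki")` made while a `"markdown"` parse is in progress leaves the outer parse untouched.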
Keep the two concepts apart, as documented in the ModeRegistry and AbstractMode docblocks: the user's configured *preference* stays in $conf['syntax'] for UI code (toolbar, settings), while the active parse's syntax is a parameter carried by the registry.
$PARSER_MODES is demoted to a deprecated, read-only mirror, published during loadPluginModes() — third-party syntax plugins (columnlist, alphalist2, phpwikify, skipentity) and the bundled info plugin read the global directly, often from their constructors, so the taxonomy must stay visible there. No core code reads the mirror.
Fold ModeInterface into AbstractMode while here: getSort()/handle() are abstract, the connect callbacks carry defaults, and the public $Lexer "FIXME should be done by setter" becomes setLexer()/getLexer() injected by Parser::addMode() alongside the registry. Nested-content resolution moves to the allowedCategories()/filterAllowedModes() hooks, resolved once when the registry is attached.
Tests build their own parser/registry through ParserTestBase::setSyntax() instead of mutating $conf and calling the removed ModeRegistry::reset().
|
| #
8ed75a23 |
| 22-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmBacktickSingle / GfmBacktickDouble for GFM inline code spans
Two new inline formatting modes covering GFM code spans in their n=1 and n=2 forms:
GfmBacktickSingle `text` → <code>text</code>
GfmBacktickDouble ``text`` → <code>text</code>
Both emit monospace_open and monospace_close around an unformatted() call (the same instruction shape as DokuWiki's two-single-quote pair wrapping a nowiki span), so renderers that distinguish verbatim text from plain cdata — metadata, indexer, non-XHTML backends — treat the body as literal.
GfmBacktickDouble extends GfmBacktickSingle to reuse handle() and the body-normalization helper; only the delimiter length and the body character class differ. Both share sort 165 and gate on Markdown being loaded.
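The inheritance relationship can be sketched as follows. This is a Python illustration under stated assumptions, not the PHP classes from the commit: the class names mirror the commit message, while the attribute and method names, the tuple-based instruction format, and the `normalize_body` helper are hypothetical. The normalization follows the two rules listed in this commit's design notes; the differing body character class is left out to keep the sketch short.

```python
import re


class GfmBacktickSingle:
    """Hypothetical Python mirror of the n=1 code-span mode."""

    delimiter = "`"  # the only attribute the subclass overrides in this sketch
    sort = 165       # shared sort value, taken from the commit message

    @staticmethod
    def normalize_body(body: str) -> str:
        # Rule 1: CR/LF and LF become single spaces.
        body = re.sub(r"\r\n|\n", " ", body)
        # Rule 2: strip one space from each end, unless the body is
        # entirely whitespace or the spaces are not on both ends.
        if (len(body) >= 2 and body.startswith(" ")
                and body.endswith(" ") and body.strip() != ""):
            body = body[1:-1]
        return body

    def handle(self, body: str) -> list:
        # Same instruction shape as described: open, unformatted, close.
        return [
            ("monospace_open",),
            ("unformatted", self.normalize_body(body)),
            ("monospace_close",),
        ]


class GfmBacktickDouble(GfmBacktickSingle):
    """n=2 variant: reuses handle() and normalize_body() unchanged."""

    delimiter = "``"
```

Keeping the normalization in one helper means a fix to the edge-space rule applies to both span lengths at once.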
Design notes:
* The lexer has no backreferences, so each length is its own mode. Length-boundary guards (?<!`)...(?!`) on every opener and closer ensure a run of two-or-more backticks is never read as an n=1 delimiter and a run of three-or-more is never read as n=2. The two modes never steal each other's input regardless of registration order — sort can't reach this kind of cross-position constraint.
* Edge-whitespace handling and newline normalization live in handle(), not in the regex. On DOKU_LEXER_UNMATCHED the body is normalized:
  1. CR/LF and LF become single spaces (GFM line-ending rule).
  2. If the body starts and ends with a space and is not entirely whitespace, one space is stripped from each end.
  That produces the right GFM output for the tricky cases without special-casing the entry pattern:
  ` ` → <code> </code> (all-whitespace, no strip)
  ` a` → <code> a</code> (asymmetric, no strip)
  ` `` ` → <code>``</code> (interior run-of-2 + strip)
  ``foo`bar`` → <code>foo`bar</code>
* Body character classes admit exactly the runs that cannot be valid closers for this mode's length: n=1 allows `[^`] | ``+`, n=2 allows `[^`] | `(?!`)`. That is what lets a single-backtick span contain a pair and a double-backtick span contain a lone backtick.
* allowedModes is empty — no other inline parsing runs inside a span.
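The guard and body-class notes above can be tried out with a small sketch. It uses Python's `re` module instead of the DokuWiki lexer, and it joins opener, body and closer into one pattern, which is an assumption made for illustration (the real modes register entry and exit patterns separately). The helper names `span_pattern` and `find_body` are hypothetical. Paragraph-boundary rejection is not modelled.

```python
import re

# Length-boundary guards from the design notes: no backtick directly
# before or after the delimiter run.
GUARD_BEFORE = r"(?<!`)"
GUARD_AFTER = r"(?!`)"


def span_pattern(delimiter: str, body_class: str):
    """Build guard + delimiter + lazy body + guard + delimiter."""
    guarded = GUARD_BEFORE + delimiter + GUARD_AFTER
    return re.compile(guarded + "(" + body_class + "+?)" + guarded)


# n=1 body: any non-backtick, or a run of two or more backticks.
SINGLE = span_pattern("`", r"(?:[^`]|``+)")
# n=2 body: any non-backtick, or a lone backtick.
DOUBLE = span_pattern("``", r"(?:[^`]|`(?!`))")


def find_body(pattern, text: str):
    """Return the raw (not yet normalized) body of the first span, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None
```

With these patterns, `find_body(SINGLE, "``foo``")` and `find_body(DOUBLE, "`foo`")` both return `None`, which is the "never steal each other's input" property, while `find_body(SINGLE, "` `` `")` returns the body with the interior pair intact.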
Deliberately not implemented, with skip.php entries explaining why:
351: code-span precedence over emphasis (*foo`*` expected to render as *foo<code>*</code>). Cross-positional: the single-pass lexer matches leftmost-first and cannot reject an earlier emphasis opener because a later backtick span would consume its closer. A proper fix would need a pre-scan pass; sort values only break ties at the same position.
353: the trailing " outside the code span gets converted to a curly quote by DokuWiki typography, diverging from spec HTML.
354: raw HTML tag pass-through; DokuWiki does not render raw HTML by default.
356: GFM angle-bracket autolink <http://…>: not implemented.
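Why example 351 is out of reach for sort values can be seen in a toy model. The sketch below is not the DokuWiki lexer: the two entry patterns, the emphasis sort value of 70, and the assumption that a lower sort wins a tie are all hypothetical. It only models the rule stated above, that the earliest match position wins and sort is consulted only when positions are equal.

```python
import re

# name -> (sort, entry pattern); both patterns are illustrative only.
MODES = {
    # Emphasis opener: a '*' with another '*' later on the same line.
    "emphasis": (70, re.compile(r"\*(?=[^\n]*\*)")),
    # Single-backtick opener with the length-boundary guards.
    "code_single": (165, re.compile(r"(?<!`)`(?!`)")),
}


def next_entry(text: str, pos: int = 0):
    """Pick the leftmost entry match; sort only breaks positional ties."""
    candidates = []
    for name, (sort, pattern) in MODES.items():
        match = pattern.search(text, pos)
        if match:
            candidates.append((match.start(), sort, name))
    if not candidates:
        return None
    start, _sort, name = min(candidates)  # position first, then sort
    return (name, start)
```

For the input `` *foo`*` `` the emphasis opener matches at position 0 and the backtick at position 4, so emphasis is entered first no matter which sort values are assigned. Rejecting that opener would require knowing in advance that the later code span consumes the closing `*`.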
Per-mode unit tests cover basic matching, flanking via the length-boundary guards, interior-run support in the body, edge-space stripping, newline normalization, all-whitespace bodies, paragraph-boundary rejection, content-is-literal, and sort values. ModeRegistryTest's gating data provider picks up both modes.
Net effect on GfmSpecTest: twelve previously-red code-span examples now pass (339, 340, 341, 342, 344, 345, 346, 347, 349, 350, 357, 359: the simple pairs, edge-space, interior-run, newline-normalization, and mismatched-run cases). Four skipped. Three remain pending outside the code-span scope (emphasis interactions that need GfmLink once that lands).
|