| #
8ed75a23 |
| 22-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
add GfmBacktickSingle / GfmBacktickDouble for GFM inline code spans
Two new inline formatting modes covering GFM code spans in their n=1 and n=2 forms:
GfmBacktickSingle   `text`     → <code>text</code>
GfmBacktickDouble   ``text``   → <code>text</code>
Both emit monospace_open and monospace_close around an unformatted() call (the same instruction shape as DokuWiki's two-single-quote pair wrapping a nowiki span), so renderers that distinguish verbatim text from plain cdata — metadata, indexer, non-XHTML backends — treat the body as literal.
GfmBacktickDouble extends GfmBacktickSingle to reuse handle() and the body-normalization helper; only the delimiter length and the body character class differ. Both share sort 165 and gate on Markdown being loaded.
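The instruction shape described above can be sketched in a few lines. This is an illustrative Python model, not the DokuWiki PHP: the function names `code_span_instructions` and `render_xhtml` are hypothetical, and the toy renderer only assumes that an XHTML backend wraps the body in <code> and escapes it rather than parsing it.

```python
import html

def code_span_instructions(body):
    """Return the three-instruction sequence both modes are described as emitting."""
    return [
        ("monospace_open", []),
        ("unformatted", [body]),   # body is carried as literal text, not cdata
        ("monospace_close", []),
    ]

def render_xhtml(instructions):
    """Toy renderer (an assumption, not DokuWiki's renderer): escape the literal body."""
    out = []
    for name, args in instructions:
        if name == "monospace_open":
            out.append("<code>")
        elif name == "monospace_close":
            out.append("</code>")
        elif name == "unformatted":
            # Escape &, < and > so markup inside the span stays visible as text.
            out.append(html.escape(args[0], quote=False))
    return "".join(out)
```

A backend that treats `unformatted` differently from plain cdata (an indexer, for example) would branch on the same instruction name.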
Design notes:
* The lexer has no backreferences, so each length is its own mode. Length-boundary guards (?<!`)...(?!`) on every opener and closer ensure a run of two-or-more backticks is never read as an n=1 delimiter and a run of three-or-more is never read as n=2. The two modes never steal each other's input regardless of registration order — sort can't reach this kind of cross-position constraint.
* Edge-whitespace handling and newline normalization live in handle(), not in the regex. On DOKU_LEXER_UNMATCHED the body is normalized:
  1. CR/LF and LF become single spaces (GFM line-ending rule).
  2. If the body starts and ends with a space and is not entirely whitespace, one space is stripped from each end.
  That produces the right GFM output for the tricky cases without special-casing the entry pattern:
    ` `           → <code> </code>        (all-whitespace, no strip)
    ` a`          → <code> a</code>       (asymmetric, no strip)
    ` `` `        → <code>``</code>       (interior run-of-2 + strip)
    ``foo`bar``   → <code>foo`bar</code>
* Body character classes admit exactly the runs that cannot be valid closers for this mode's length: n=1 allows `[^`] | ``+`, n=2 allows `[^`] | `(?!`)`. That is what lets a single-backtick span contain a pair and a double-backtick span contain a lone backtick.
* allowedModes is empty — no other inline parsing runs inside a span.
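The guards, body classes, and normalization steps in the design notes can be tried out together in a small Python sketch. It is only a model under stated assumptions: the real modes are PHP classes driven by DokuWiki's lexer, the names `SINGLE`, `DOUBLE`, `normalize_body`, and `first_code_span` are hypothetical, each pattern matches a whole span in one regex instead of separate entry and exit patterns, and paragraph-boundary rejection is left out.

```python
import re
from typing import Optional

# (?<!`) and (?!`) are the length-boundary guards on every opener and closer.
SINGLE = re.compile(
    r"(?<!`)`(?!`)"          # opener: exactly one backtick
    r"((?:[^`]|``+)+?)"      # body: non-backticks, or runs of two or more
    r"(?<!`)`(?!`)"          # closer: exactly one backtick
)
DOUBLE = re.compile(
    r"(?<!`)``(?!`)"         # opener: exactly two backticks
    r"((?:[^`]|`(?!`))+?)"   # body: non-backticks, or a lone backtick
    r"(?<!`)``(?!`)"         # closer: exactly two backticks
)

def normalize_body(body: str) -> str:
    """Apply the two normalization steps described for handle()."""
    # 1. CR/LF and LF become single spaces.
    body = body.replace("\r\n", " ").replace("\n", " ")
    # 2. Strip one space from each end, unless the body is entirely whitespace.
    if body.startswith(" ") and body.endswith(" ") and body.strip() != "":
        body = body[1:-1]
    return body

def first_code_span(text: str) -> Optional[str]:
    """Return the normalized body of the leftmost span, or None if there is none."""
    matches = [m for m in (SINGLE.search(text), DOUBLE.search(text)) if m]
    if not matches:
        return None
    leftmost = min(matches, key=lambda m: m.start())
    return normalize_body(leftmost.group(1))
```

Because of the guards, the two patterns cannot both match at the same start position, so picking the leftmost match does not depend on the order in which they are tried.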
Deliberately not implemented, with skip.php entries explaining why:
351: code-span precedence over emphasis (*foo`*` expected to render as *foo<code>*</code>). Cross-positional: the single-pass lexer matches leftmost-first and cannot reject an earlier emphasis opener because a later backtick span would consume its closer. A proper fix would need a pre-scan pass; sort values only break ties at the same position.
353: the trailing " outside the code span gets converted to a curly quote by DokuWiki typography, diverging from spec HTML.
354: raw HTML tag pass-through; DokuWiki does not render raw HTML by default.
356: GFM angle-bracket autolink <http://…>: not implemented.
Per-mode unit tests cover basic matching, flanking via the length-boundary guards, interior-run support in the body, edge-space stripping, newline normalization, all-whitespace bodies, paragraph-boundary rejection, content-is-literal, and sort values. ModeRegistryTest's gating data provider picks up both modes.
Net effect on GfmSpecTest: twelve previously-red code-span examples now pass (339, 340, 341, 342, 344, 345, 346, 347, 349, 350, 357, 359: the simple pairs, edge-space, interior-run, newline-normalization, and mismatched-run cases). Four are skipped. Three remain pending outside the code-span scope (emphasis interactions that need GfmLink once that lands).
|