History log of /dokuwiki/inc/Parsing/ (Results 26 – 50 of 87)
Revision Date Author Comments
309a0852 30-Apr-2026 Andreas Gohr <andi@splitbrain.org>

replace DW Quote with unified GfmQuote

GfmQuote covers blockquote parsing for both DokuWiki and GFM dialects
in a single mode. Same quote_open/quote_close handler instructions; a
DW-preferred post-pass flattens sub-parsed paragraph wrapping into
linebreak calls so existing pages keep their <br/>-between-lines
rendering. MD-preferred keeps the <p>-wrapped spec shape.

Block content (lists, fenced code, tables) inside `>` quotes now
renders, since the body is sub-parsed. Headers stay excluded
(BASEONLY) — TOC and section-edit anchors don't compose with
<blockquote>, same rationale as GfmListblock.

Convert ModeRegistry's sub-parser cache into an acquire/release pool
to support same-key re-entrancy: a list inside a quote re-enters
gfm_quote during the list-item sub-parse, and the inner call needs
its own parser instance even though the exclusion key matches.
GfmListblock is updated to use the new acquire/release primitives.
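
A minimal Python sketch of the acquire/release idea described above, not the project's PHP code. The class name `SubParserPool` and the factory callback are hypothetical; the point is only that two acquires with the same exclusion key yield two distinct parser instances while both are in use.

```python
from collections import defaultdict


class SubParserPool:
    """Hypothetical pool keyed by exclusion set; not DokuWiki's actual class."""

    def __init__(self, factory):
        self._factory = factory          # builds a fresh parser for a key
        self._idle = defaultdict(list)   # key -> parsers waiting for reuse

    def acquire(self, key):
        # Reuse an idle parser for this key, or build a new one when every
        # parser for the key is currently in use (same-key re-entrancy).
        idle = self._idle[key]
        return idle.pop() if idle else self._factory(key)

    def release(self, key, parser):
        # Hand the parser back so a later acquire() can reuse it.
        self._idle[key].append(parser)


pool = SubParserPool(lambda key: object())
key = frozenset({"baseonly"})
outer = pool.acquire(key)      # e.g. the quote body
inner = pool.acquire(key)      # e.g. a list item inside the quote, same key
pool.release(key, inner)
pool.release(key, outer)
```

A plain one-instance-per-key cache would hand the inner call the parser that the outer call is still using, which is the situation the pool avoids.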

f7c6e4ac 30-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

add listo_open_start sibling method for GFM start numbers

Reverts the listo_open signature widening from 5a2118acc and instead
adds a sibling method `listo_open_start($start = 1)` on the renderer
hierarchy. The base default delegates to listo_open() so renderers
that don't override it still produce a valid (but unnumbered) list;
xhtml's override emits <ol start="N">.

The handler now emits 'listo_open_start' only for ordered lists with
a non-default first number; plain ordered lists keep emitting the
unchanged 'listo_open' instruction. This preserves the historical
listo_open / listu_open signatures (zero-arg base, $classes-only
xhtml form from 2016) so the 17 plugin renderers found via
codesearch keep working without modification, while still
implementing GFM's "5. foo" -> <ol start="5"> rule.
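
The sibling-method arrangement can be sketched in Python as follows. The class names and the `render` helper are stand-ins invented for illustration; only the method names `listo_open` and `listo_open_start` come from the message above.

```python
class Renderer:
    """Stand-in for a base renderer."""

    def __init__(self):
        self.doc = ""

    def listo_open(self):
        self.doc += "<ol>"

    def listo_open_start(self, start=1):
        # Base default: ignore the start number and fall back to listo_open().
        self.listo_open()


class XhtmlRenderer(Renderer):
    def listo_open_start(self, start=1):
        self.doc += f'<ol start="{start}">'


class LegacyPluginRenderer(Renderer):
    # Overrides only the historical zero-argument method.
    def listo_open(self):
        self.doc += '<ol class="plugin">'


def render(renderer, start):
    # Handler side: pick the instruction by whether the start is non-default.
    if start == 1:
        renderer.listo_open()
    else:
        renderer.listo_open_start(start)
    return renderer.doc
```

A renderer that predates the new method still produces a valid list for "5. foo", just without the start number.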

74031e46 28-Apr-2026 Andreas Gohr <andi@splitbrain.org>

add GfmEscape for GFM backslash escapes

Implements GFM §6.1 backslash-escape handling. GfmEscape is a sort-5
inline mode in CATEGORY_SUBSTITION that claims `\X` for any escapable
ASCII punctuation char before competing delimiters can match. The
shared character class lives on Helpers\Escape so the lexer pattern
and the post-hoc unescape stay in lockstep.

Whole-span captures (GfmCode info string, GfmLink label/URL) bypass
the lexer; those modes call Escape::unescapeBackslashes() on the
relevant slot. GfmLink skips the unescape when the URL classifies as
a windowssharelink so the leading \\host survives intact.

GfmTable cells get a separate per-cell `\|` to `|` pass in the
rewriter to honour the tables-extension rule that pipes always
unescape, even inside code spans where standard §6.1 escapes don't
fire.
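
A rough Python sketch of the two unescape passes, assuming the escapable set is exactly ASCII punctuation (as in the GFM spec). The function names are illustrative and are not the signatures of Helpers\Escape.

```python
import re
import string

# ASCII punctuation: the characters GFM allows after a backslash.
ESCAPABLE = re.escape(string.punctuation)
ESCAPE_RE = re.compile(rf"\\([{ESCAPABLE}])")


def unescape_backslashes(text):
    # Drop the backslash in front of any escapable punctuation character.
    return ESCAPE_RE.sub(r"\1", text)


def unescape_table_pipes(cell):
    # Tables-extension pass: only backslash-pipe is rewritten, everywhere.
    return cell.replace(r"\|", "|")
```

Running `unescape_backslashes` on a UNC-style path collapses the leading double backslash, which is why a windowsshare URL would need to skip this step.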

3dabe4e0 28-Apr-2026 Andreas Gohr <andi@splitbrain.org>

add GfmTable for GFM tables

Implements the GFM pipe-table extension as a CONTAINER mode at sort 55,
one below DW Table at 60. A lookahead-validated entry pattern asserts a
header line plus a `:?-+:?` delimiter row before consuming any input, so
non-table paragraphs containing pipes flow through unchanged. Cells are
inline-only per spec.

Handler\GfmTable rewrites the flat token stream into the canonical
table_open / tablethead_* / tabletbody_* / table_close sequence, deriving
per-column alignment from the delimiter row, padding short body rows
(spec 202), truncating long ones (spec 204), and falling back to a single
cdata when the column count mismatches (spec 203).

`tabletbody_open` / `tabletbody_close` are emitted for the first time;
they are part of the base renderer API but DW Table never used them.
Added to Block's blockOpen / blockClose lists alongside `tabletfoot_*`
for symmetry. SpecCompatRenderer gains minimal table-element overrides
so spec roundtrip output matches GFM's `<table><thead><tr><th>` shape
without DW's wrapper div, row/col counter classes, or align-as-class.
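
The alignment derivation and row fitting described above might look like this in Python. Both helper names are hypothetical, and the sketch ignores escaped pipes and the column-count mismatch fallback.

```python
def column_alignments(delimiter_row):
    """Map each `:?-+:?` delimiter cell to 'left', 'right', 'center' or None."""
    aligns = []
    for cell in delimiter_row.strip().strip("|").split("|"):
        cell = cell.strip()
        left, right = cell.startswith(":"), cell.endswith(":")
        if left and right:
            aligns.append("center")
        elif left:
            aligns.append("left")
        elif right:
            aligns.append("right")
        else:
            aligns.append(None)
    return aligns


def fit_row(cells, width):
    """Pad short body rows with empty cells and truncate long ones."""
    return (cells + [""] * width)[:width]
```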

685560eb 28-Apr-2026 Andreas Gohr <andi@splitbrain.org>

add GfmListblock for GFM lists

GfmListblock captures an entire list block atomically with one
addSpecialPattern match, then walks the captured text in handle()
grouping lines into items. Each item's body is dedented to its
content column and parsed by ModeRegistry::getSubParser() so
block content (paragraphs, fenced code, blockquotes, plugin
blocks) works inside items uniformly. Sub-parsed calls are wrapped
in a Nest call before they reach the outer handler, matching the
Footnote pattern: the main handler's Block rewriter treats nest
as opaque and the renderer base class unwraps it transparently,
so multi-paragraph items don't get double-wrapped in <p>.

Marker syntax: -, *, + (unordered) or 1-9 digits followed by
. or ) (ordered). Indentation is a 2-space-multiple step starting
at 0; depth = (indent / 2) + 1, odd indents round down, tabs become
two spaces. The first ordered item's number drives the start
attribute on <ol> via the listo_open $start parameter.
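
As an illustration of the marker and depth rules in this paragraph, a small Python sketch follows. The regex and the function name are assumptions of this sketch (including the requirement of whitespace after the marker); unlike the commit, it reports `start` for every ordered item rather than only the first.

```python
import re

MARKER_RE = re.compile(r"^([ \t]*)(?:([-*+])|(\d{1,9})([.)]))[ \t]")


def interpret_marker(line):
    """Return depth/type (and start for ordered items), or None if no marker."""
    m = MARKER_RE.match(line)
    if m is None:
        return None
    indent = len(m.group(1).replace("\t", "  "))   # a tab counts as two spaces
    result = {"depth": indent // 2 + 1,             # odd indents round down
              "type": "u" if m.group(2) else "o"}
    if m.group(3):
        result["start"] = int(m.group(3))
    return result
```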

GfmLists subclasses AbstractListsRewriter with the GFM marker
parser; the state machine on the base class is shared with DW Lists.

GfmListblock loads only when $conf['syntax'] is markdown or md+dw.
Under those settings the DW Listblock is suppressed because the two
list models conflict — DW's mandatory 2-space indent rule vs GFM's
zero-indent top-level rule, and -/*/+ markers shared. Plugins that
relied on Listblock loading under md+dw will see it absent there.

Sub-parser exclusion set: CATEGORY_BASEONLY (no Header inside list
items) and gfm_listblock itself (defensive guard against re-entry
on pathological inputs; nested lists are handled by the outer
pattern, not by re-entry).

Tests cover marker variants, ordered start numbers, nested lists at
two and three levels, inline formatting inside items, marker-
character switches keeping one list, type switches splitting the
list, fenced code inside items, multi-paragraph (loose) items, and
two regressions on blank-line tolerance inside the captured block.
SpecCompatRenderer learns to render the list call sequence, and
spec.txt tests for digit/marker-width/lazy-continuation behavior
that GfmListblock deliberately doesn't implement are documented in
gfm-spec/skip.php with the per-bucket reasons (A-F).

Drops two now-obsolete entries from skip.php (image escapes that
land via earlier GfmLink/GfmMedia work) and inlines the Setext
explanation that previously pointed at SPEC.md. Replaces the
SPEC.md reference in GfmEmphasisTest with the inline reason.

9172eccf 28-Apr-2026 Andreas Gohr <andi@splitbrain.org>

add sub-parser support to Handler / Parser / ModeRegistry

A block mode that wants to parse the body of one of its captured
matches needs a second Parser instance configured with the active
modes minus whatever would re-enter the outer mode. Doing this by
hand is verbose and easy to get wrong — modes hold a $Lexer slot
that addMode() overwrites, so the same mode object can't be shared
between the main parser and a sub-parser.

Three small additions:

Handler::reset() — clears calls, status, currentModeName, and
installs a fresh CallWriter. Lets one Handler instance be parsed
against repeatedly without state bleed.

Parser::getHandler() — accessor; sub-parser callers need it to
reach the handler for reset() and for harvesting the produced
call list.

ModeRegistry::getSubParser($excludeCategories, $excludeModes) —
returns a cached Parser preconfigured with every active mode
except those excluded. Mode objects are cloned before being
attached so connectTo()'s assignment to $Lexer does not clobber
the main parser's references. Cache key is the exclusion-set;
default exclusion is CATEGORY_BASEONLY (no Header inside the
sub-parsed content).

Tests cover Handler::reset's full clear, sub-parser caching,
default and custom exclusions, registry-reset propagation, and
the clone-not-share invariant for $Lexer.

bf6e4f0d 28-Apr-2026 Andreas Gohr <andi@splitbrain.org>

extract AbstractListsRewriter from Lists

The list-block CallWriter rewriter mixed two concerns: a shared state
machine that turns flat list_open / list_item / list_close calls into
the nested listu_open / listo_open / listitem / listcontent shape
the renderers expect, and a syntax-specific marker parser that maps
the captured indent + marker text to depth/type.

Hoist the state machine onto a new abstract base class
AbstractListsRewriter; Lists keeps only its DokuWiki marker parser
(`*` unordered, `-` ordered, 2-space-per-level indent). The upcoming
GfmLists will share the same base class with its own GFM marker
parser.

The interpretSyntax contract changes shape:

    protected function interpretSyntax($match, &$type): int
to
    abstract protected function interpretSyntax(string $match): array;
    // returns ['depth' => int, 'type' => 'u'|'o', 'start'? => int]

The optional `start` key carries the first ordered item's number for
syntaxes that support it (GFM); DokuWiki omits it and gets the
default of 1. Plugins subclassing Lists that override interpretSyntax
need to update — known affected: creole, markdowku, mediasyntax (per
codesearch.dokuwiki.org). The migration is mechanical: replace the
by-ref $type assignment and int return with an associative-array
return.

The protected listStart / listOpen / listEnd dispatch methods on the
old Lists are gone (renamed handleListOpen / handleListItem /
handleListClose on the base class), but no plugin in the ecosystem
overrides those, only interpretSyntax.
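
To make the new return shape concrete, here is a Python analogue of a DokuWiki-style marker parser. It is a sketch under assumptions: the match is taken to be the indent plus marker with no leading newline, a tab is counted as two spaces, and depth is taken as indent divided by two; the real PHP method may compute these differently.

```python
def interpret_dw_syntax(match):
    """Leading indent, then `*` (unordered) or `-` (ordered)."""
    stripped = match.replace("\t", "  ").rstrip()   # assumption: tab = two spaces
    indent = len(stripped) - 1                      # everything before the marker
    return {"depth": indent // 2,
            "type": "u" if stripped[-1] == "*" else "o"}


def start_number(info):
    # Consumer side: syntaxes that omit 'start' get the default of 1.
    return info.get("start", 1)
```

The old contract returned the depth and wrote the type through a by-reference argument; the new one returns everything in a single associative structure, so optional keys such as `start` can be added without another signature change.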

96d096f1 27-Apr-2026 Andreas Gohr <andi@splitbrain.org>

remove getLineStartMarkers registry — sort order already wins

Preformatted's entry pattern carried a `(?![\*\-])` negative
lookahead to defer to list modes on indented bullet lines.
0cecf9d50 (2005, "new parser added") introduced it hardcoded;
7958e6980 (2026, "decouple hardcoded mode names in Eol and
Preformatted") refactored that hardcoded knowledge into
register/getLineStartMarkers on ModeRegistry so each list mode
owned its marker chars. Both preserved the behavior verbatim;
neither documented why it was needed.

Tracing the lexer, it isn't. ParallelRegex merges all entry
patterns into one PCRE expression; PCRE returns the leftmost
match and breaks ties on expression order. Modes are added in
sort order via ModeRegistry::getModes(), so Listblock (sort 10)
always precedes Preformatted (sort 20) and wins the tie on
" - foo" without any lookahead. The only test that caught a
difference was testPreformattedList, which happened to register
modes in non-canonical order - that was a test bug.

This patch drops the lookahead in Preformatted::connectTo, the
registerLineStartMarkers call in Listblock::preConnect, the
register/getLineStartMarkers methods on ModeRegistry, and the
three registry-API unit tests. testPreformattedList now
registers Listblock before Preformatted.
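
The tie-breaking behaviour can be reproduced with Python's `re`, which, like PCRE, tries alternatives left to right at each position. The two patterns below are simplified stand-ins, not the real entry patterns of Listblock or Preformatted.

```python
import re

# Two entry patterns merged into one alternation, in registration order.
listblock = r"(?P<listblock>^ {2,}[-*])"
preformatted = r"(?P<preformatted>^ {2,}(?=\S))"


def winner(patterns, text):
    """Return the name of the alternative that matched first."""
    m = re.search("|".join(patterns), text, re.MULTILINE)
    return m.lastgroup if m else None
```

With the list pattern registered first it wins on an indented bullet line; swapping the order flips the result, which matches the test bug described above.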

1e28e406 23-Apr-2026 Andreas Gohr <andi@splitbrain.org>

split Parsing\Helpers into per-domain Link / Media / Code classes

781f5c71 23-Apr-2026 Andreas Gohr <andi@splitbrain.org>

gate monospace, unformatted, file on DokuWiki syntax

These DokuWiki-specific modes should only be loaded when DokuWiki syntax
is still wanted, not in Markdown-only mode.
Expands the ModeRegistryTest data provider to cover the full always-loaded
and DW-always sets.

b1c59bed 23-Apr-2026 Andreas Gohr <andi@splitbrain.org>

add GfmCode / GfmFile for fenced code blocks

GfmCode (backticks) emits the `code` handler instruction; GfmFile
(tildes) emits `file`. Column-0 fences only, no length pairing
between opener and closer, and unclosed fences stay literal —
matching DokuWiki's `<code>` tag convention. The info string accepts
DW's full attribute vocabulary (language, filename, [options])
through a new shared `Helpers::parseCodeAttributes` that `Code`
also uses, with `html` aliased to `html4strict` and `-` meaning "no
language".

Preformatted's indent threshold is now preference-gated: 2 spaces
in DW-preferred settings, 4 spaces in MD-preferred, matching GFM's
indented code block rule. A single tab is a trigger in both.

3440a8c0 22-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

add GfmMedia and extend GfmLink with image-as-label form

- New GfmMedia parses `![alt](url)` with the full DokuWiki media-parameter
vocabulary in the URL slot (?100x200, ?right, ?nolink, ?recache, …).
Adds `?left`/`?right`/`?center` align keywords shared with DW `{{…}}`
— gives pure-Markdown users a way to align inline images.
- GfmLink now also matches `[![alt](img)](target)` — the GFM equivalent
of `[[target|{{img}}]]`. Detection is post-entry, mirroring
Internallink's `^{{…}}$` check; one mode covers the whole family.
- LinkDispatch trait replaced by Helpers::classifyLink and
Helpers::parseMediaParameters — two pure static methods, shared by
DW and GFM counterparts.
- Entry patterns for GfmLink / GfmMedia simplified (permissive URL slot,
handle-time parsing), following DW's Internallink style.
- GfmSpecTest drives a test-only SpecCompatRenderer that emits bare
<img> / <a> instead of DW's wiki-wrapped HTML, recovering 13 spec
tests that previously failed/skipped only because of renderer shape.

e89aeebd 22-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

add GfmLink for GFM inline links `[text](url)`

Extracts the URL-classification ladder from Internallink into a
LinkDispatch trait so both modes route identically across all six
DokuWiki link flavors (internal, external, interwiki, email,
windowsshare, local anchor). GfmLink parses the `[text](url)` form
with optional `"title"` / `'title'` and hands the URL to the trait.
The GFM title attribute is discarded — DokuWiki link instructions
have no slot for it.

8719732d 22-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

add GfmHeader for ATX headings (`# text` through `###### text`)

Opener must sit at column 0. GFM tolerates 0-3 spaces before the `#`
but that collides with DokuWiki's 2-space-indent preformatted block,
so the tolerance is dropped rather than plumbed across modes.

Widen the XHTML renderer's section-node tracker from 5 slots to 6 so
h6 doesn't hit "Undefined array key 5". Extend GfmSpecTest's HTML
normalizer to strip DokuWiki's section-div wrappers, section-edit
comments, and header id/class attributes so heading spec examples
can validate semantic correctness.

8ed75a23 22-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

add GfmBacktickSingle / GfmBacktickDouble for GFM inline code spans

Two new inline formatting modes covering GFM code spans in their n=1
and n=2 forms:

GfmBacktickSingle `text` → <code>text</code>
GfmBacktickDouble ``text`` → <code>text</code>

Both emit monospace_open and monospace_close around an unformatted()
call (the same instruction shape as DokuWiki's two-single-quote pair
wrapping a nowiki span), so renderers that distinguish verbatim text
from plain cdata — metadata, indexer, non-XHTML backends — treat the
body as literal.

GfmBacktickDouble extends GfmBacktickSingle to reuse handle() and the
body-normalization helper; only the delimiter length and the body
character class differ. Both share sort 165 and gate on Markdown
being loaded.

Design notes:

* The lexer has no backreferences, so each length is its own mode.
Length-boundary guards (?<!`)...(?!`) on every opener and closer
ensure a run of two-or-more backticks is never read as an n=1
delimiter and a run of three-or-more is never read as n=2. The two
modes never steal each other's input regardless of registration
order — sort can't reach this kind of cross-position constraint.

* Edge-whitespace handling and newline normalization live in handle(),
not in the regex. On DOKU_LEXER_UNMATCHED the body is normalized:
1. CR/LF and LF become single spaces (GFM line-ending rule).
2. If the body starts and ends with a space and is not entirely
whitespace, one space is stripped from each end.
That produces the right GFM output for the tricky cases without
special-casing the entry pattern:
` ` → <code> </code> (all-whitespace, no strip)
` a` → <code> a</code> (asymmetric, no strip)
` `` ` → <code>``</code> (interior run-of-2 + strip)
``foo`bar`` → <code>foo`bar</code>

* Body character classes admit exactly the runs that cannot be valid
closers for this mode's length: n=1 allows `[^`] | ``+`, n=2 allows
`[^`] | `(?!`)`. That is what lets a single-backtick span contain
a pair and a double-backtick span contain a lone backtick.

* allowedModes is empty — no other inline parsing runs inside a span.

Deliberately not implemented, with skip.php entries explaining why:

351 — code-span precedence over emphasis (*foo`*` expected to render
as *foo<code>*</code>). Cross-positional: the single-pass
lexer matches leftmost-first and cannot reject an earlier
emphasis opener because a later backtick span would consume
its closer. A proper fix would need a pre-scan pass; sort
values only break ties at the same position.
353 — the trailing " outside the code span gets converted to a
curly quote by DokuWiki typography, diverging from spec HTML.
354 — raw HTML tag pass-through; DokuWiki does not render raw HTML
by default.
356 — GFM angle-bracket autolink <http://…>: not implemented.

Per-mode unit tests cover basic matching, flanking via the length-
boundary guards, interior-run support in the body, edge-space
stripping, newline normalization, all-whitespace bodies, paragraph-
boundary rejection, content-is-literal, and sort values.
ModeRegistryTest's gating data provider picks up both modes.

Net effect on GfmSpecTest: twelve previously-red code-span examples
now pass (339, 340, 341, 342, 344, 345, 346, 347, 349, 350, 357, 359
— the simple pairs, edge-space, interior-run, newline-normalization,
and mismatched-run cases). Four skipped. Three remain pending outside
the code-span scope (emphasis interactions that need GfmLink once
that lands).
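
The length-boundary guards and the body normalization from the design notes can be tried out in Python. This is a sketch assembled from the description, not the lexer patterns themselves: it uses whole-span regexes instead of separate entry and exit patterns, and `first_code_span` is a hypothetical helper.

```python
import re

# n=1: lone backtick delimiters; the body may contain runs of 2+ backticks.
SINGLE = re.compile(r"(?<!`)`(?!`)((?:[^`]|``+)+?)(?<!`)`(?!`)")
# n=2: double backtick delimiters; the body may contain lone backticks.
DOUBLE = re.compile(r"(?<!`)``(?!`)((?:[^`]|`(?!`))+?)(?<!`)``(?!`)")


def normalize_body(body):
    # 1. Line endings become single spaces.
    body = body.replace("\r\n", " ").replace("\n", " ")
    # 2. Strip one space from each end unless the body is all whitespace.
    if body.startswith(" ") and body.endswith(" ") and body.strip() != "":
        body = body[1:-1]
    return body


def first_code_span(text):
    """Return the normalized body of the earliest code span, or None."""
    matches = [m for m in (SINGLE.search(text), DOUBLE.search(text)) if m]
    if not matches:
        return None
    return normalize_body(min(matches, key=lambda m: m.start()).group(1))
```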

864d6c6d 21-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

fix Lexer so exit-pattern lookbehinds see chars consumed by prior tokens

Lexer::reduce used to hand PCRE a shrinking tail of the subject — each
matched token was chopped off the front of $raw and the next preg_match
ran on what remained. Once a token was consumed, the bytes before the
cursor were gone, and any lookbehind assertion in a subsequent pattern
silently failed.

The bug was latent for DokuWiki's entire history because literal exit
patterns like `\*\*`, `</file>`, or `%%` don't care what's behind them.
It surfaced with c3755410a ("require non-whitespace adjacency for
inline formatting delimiters"), which added `(?<=[^\s])` to Strong,
Emphasis, Underline, Monospace, Subscript, Superscript and Deleted at
once. After that commit, `**[[link]]**` stopped closing — the `]` that
would satisfy the lookbehind had just been consumed by the link match,
so Strong stayed open until end-of-section and swallowed everything
after it (list items, headings, the lot).

Fix:

* Lexer::parse and Lexer::reduce track a byte offset into $raw instead
of mutating $raw. $initialLength and the shrinking-length arithmetic
for absolute match positions are replaced by straight offset
increments; the no-progress guard and the trailing-unmatched dispatch
both shift to the same cursor.

* ParallelRegex::split takes an optional $offset and passes it to
preg_match together with PREG_OFFSET_CAPTURE. PCRE scans from the
offset forward but still sees the whole subject, so lookbehinds work
across already-consumed tokens. The secondary preg_split call used
to carve out pre/post is no longer needed — PREG_OFFSET_CAPTURE
gives the match start for free, saving one regex operation per
reduce() step.

Regression tests at all three layers:

* ParallelRegexTest — offset plumbing and pre/match accounting.
* LexerTest::testIndexLookbehindAcrossConsumedToken — exit-pattern
lookbehind targeting the `/>` of a self-closing `<a/>` that was
consumed as a SPECIAL token on the previous step. Fails under the
old Lexer.
* FormattingTest — `**[[link]]**` and `**foo//bar//**` round-trip
with correct open/close instructions through the full pipeline.

Also updates ListsTest::testUnorderedListStrong, whose expectations
documented the pre-fix buggy behaviour ("formatting able to spread
across list items"). With the fix, bold correctly stays within a
single list item; the expected call sequence and the comment are
updated to match.
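
The difference between chopping the subject and scanning from an offset can be shown with Python's `re`, whose `pos` argument behaves analogously to the offset the commit passes to preg_match: the search starts at the offset, but lookbehinds still see the earlier characters. The pattern is a simplified stand-in for Strong's exit pattern.

```python
import re

closer = re.compile(r"(?<=\S)\*\*")   # closing ** must follow a non-space char
text = "**[[link]]**"
cursor = text.index("]]") + 2         # position just after the consumed link token

# Old approach: chop off consumed input, so the lookbehind has nothing to see.
sliced = closer.match(text[cursor:])

# New approach: keep the whole subject and scan from an offset.
offset = closer.match(text, cursor)
```

Here `sliced` is `None` because the `]` that would satisfy the lookbehind has been cut away, while `offset` matches at the cursor.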

0244be5c 21-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

add GfmDeleted mode for GFM strikethrough (`~~text~~`)

Shares the deleted_open/deleted_close instructions with DW's <del> mode.
Entry/exit anchors `(?<!~)` / `(?!~)` reject runs of three or more tildes
so fenced-code markers remain untouched. Also trim redundant class-level
docblocks on sibling Gfm test files.
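
A quick Python check of the tilde guards, using a single whole-span regex as a simplified stand-in for the mode's separate entry and exit patterns:

```python
import re

# Exactly two tildes on each side; the guards reject runs of three or more.
STRIKE = re.compile(r"(?<!~)~~(?!~)(.+?)(?<!~)~~(?!~)")

hit = STRIKE.search("a ~~gone~~ b")
fence = STRIKE.search("~~~\ncode\n~~~")
```

The first search captures `gone`; the tilde fence yields no match, so fenced-code markers are left for their own mode.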

2bb62bca 20-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

add GFM em-wrapping-strong modes for `***foo***` / `___foo___`

Two new inline formatting modes that render triple-delimiter runs as
em wrapping strong:

GfmEmphasisStrong `***text***` → <em><strong>text</strong></em>
GfmEmphasisStrongUnderscore `___text___` → same (MD-preferred only)

Only the exact 3+3 symmetric case is handled. The other long-run and
asymmetric variants (4+4, 5+5, `***foo**`, etc.) require CommonMark's
stack-based delimiter-pairing algorithm with its flanking and
multiple-of-3 rules, which is explicitly out of scope; those examples
stay skipped in gfm-spec/skip.php.

Implementation notes:

* Patterns enforce exact 3+3 via `(?<!\*)` / `(?<!_)` lookbehinds
(preventing entry at the second `*` of a `****...` run) and
`(?!\*)` / `(?!_)` lookaheads after the closing triple (rejecting
`***foo****` etc.). Combined with the existing non-whitespace
adjacency lookaheads, all asymmetric cases cleanly fall through to
other modes or stay literal.

* GfmEmphasisStrong overrides handle() to emit two instructions on
entry (emphasis_open + strong_open) and two on exit (strong_close
+ emphasis_close). GfmEmphasisStrongUnderscore inherits that
handler — only delimiters and word-boundary rules differ.

* Sort 65 — below Strong (70) and GfmEmphasis (80) so the em+strong
modes win the lexer race for `***`/`___` runs. Underscore variant
is MD-preferred-only, matching the existing gating of
GfmEmphasisUnderscore and GfmStrongUnderscore.

Per-mode unit tests cover basic matching, single-char bodies,
whitespace flanking rejection, paragraph-boundary rejection,
longer-run rejection, asymmetric rejection, multibyte intraword
protection, and sort values. ModeRegistryTest's gating data provider
picks up the two new rules.

Net effect on GfmSpecTest: example #476 (`***foo***`) now passes;
473/474/475/477 remain skipped as documented in skip.php.
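
The exact 3+3 rule can be approximated with one Python regex. This is a sketch, not the mode's patterns: in particular the `[^\s*]` adjacency classes are a simplification chosen here so that longer runs are rejected by a single expression.

```python
import re

EM_STRONG = re.compile(
    r"(?<!\*)\*\*\*(?=[^\s*])(.+?)(?<=[^\s*])\*\*\*(?!\*)"
)


def render(text):
    # Emit em wrapping strong for an exact triple-asterisk pair.
    return EM_STRONG.sub(r"<em><strong>\1</strong></em>", text)
```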

bcefb8ae 20-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

add GFM emphasis and underscore-delimited strong modes

Three new inline formatting modes for GitHub Flavored Markdown:

GfmEmphasis `*text*` → <em>
GfmEmphasisUnderscore `_text_` → <em> (MD-preferred only)
GfmStrongUnderscore `__text__` → <strong> (MD-preferred only)

All three emit the same handler instructions as DokuWiki's Emphasis /
Strong, so existing renderers need no changes.

Design notes:

* Lexer mode names use snake_case (gfm_emphasis, gfm_emphasis_underscore,
gfm_strong_underscore) to keep PascalCase readable at the class level.
The asterisk variant emits `emphasis_open`/`emphasis_close` via the
getInstructionName() hook, so DW's Emphasis (`//...//`) and
GfmEmphasis (`*...*`) can coexist in mixed modes without a lexer
state collision while still producing the same <em> output.

* Underscore variants gate on Markdown-preferred syntax (`markdown`,
`md+dw`) because `__` otherwise means DW underline. GfmStrongUnderscore
sorts at 70 (matching Strong) — below Underline at 90 — so when loaded
it wins the lexer race for `__` runs. Underline is already gated out
of MD-preferred modes in the previous commit.

* Entry patterns enforce the simplified CommonMark flanking rules
already shared across DW inline modes (non-whitespace adjacency,
no paragraph-boundary crossing) plus the word-boundary check for
underscore variants using NO_WORD_BEFORE / NO_WORD_AFTER. The
positive non-word-char enumeration makes them multibyte-safe without
requiring the `u` flag: `für_etwas` and `пристаням_стремятся_`
correctly stay literal.

Per-mode unit tests cover basic matching, single-char bodies,
leading/trailing-whitespace rejection, empty-delimiter rejection,
paragraph-boundary rejection, multibyte intraword protection, and
sort values. ModeRegistryTest's gating data provider picks up the
three new rules.

35f91432 20-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

gate Underline on DokuWiki-preferred syntax; tidy registry plumbing

Three related changes to ModeRegistry, prep work for the Markdown modes
that follow.

1. Underline (`__text__`) is moved out of loadAlwaysModes() and into
loadDokuWikiModes(), gated on a new `\$dwPreferred` check that
evaluates true for 'dokuwiki' and 'dw+md'. In MD-preferred settings
('markdown' and 'md+dw') `__` will mean GFM strong, so loading
Underline there would conflict at the lexer level. Underline is
unchanged in the default 'dokuwiki' setting.

2. resolveModeClass() now PascalCases every `_`-separated segment of
the mode name, so `gfm_emphasis_underscore` resolves to
`GfmEmphasisUnderscore`. Existing lowercase-compound names like
`internallink` still resolve to `Internallink` (one segment,
ucfirst-ed) — no behaviour change for current modes. This prepares
the registry to load Gfm mode classes whose PascalCase filenames
preserve word boundaries for readability.

3. ModeRegistryTest's multiple near-identical per-mode gating tests
are consolidated into a single data-provider-driven
testModeLoadingBySyntax, fed by a `\$rules` table that lists each
mode against its four-setting expected load state. Adding a new
gated mode now means one line in the provider. Currently only
Underline is listed; upcoming Gfm-mode commits will add theirs.
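
Points 1 and 2 reduce to two small functions; a Python rendering of the described behaviour (function names are this sketch's own) could be:

```python
def resolve_mode_class(mode_name):
    """PascalCase every `_`-separated segment of a mode name."""
    return "".join(part[:1].upper() + part[1:] for part in mode_name.split("_"))


def is_dw_preferred(syntax):
    # True for the two settings in which DokuWiki syntax takes precedence.
    return syntax in ("dokuwiki", "dw+md")
```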

6b33ca93 20-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

add regex-primitive constants and getInstructionName() hook

Preparatory refactor for the upcoming GFM parser modes. No behaviour
change for any existing mode: CONTENT_UNTIL_PARA still evaluates to
the same regex (now factored through NOT_AT_PARA_BREAK), and
getInstructionName() defaults to getModeName() so all current
AbstractFormatting subclasses emit the same handler instructions as
before.

AbstractMode gains four new shared regex constants:

NOT_AT_PARA_BREAK — zero-width assertion: current position is not
the start of a paragraph break (blank line).
Extracted from CONTENT_UNTIL_PARA for reuse in
patterns that need a custom body char class.

NON_WORD_CHAR — char class: ASCII whitespace or ASCII punctuation
except `_`. Multibyte-safe by construction:
UTF-8 continuation bytes are >= 0x80 and thus
fall outside every ASCII class, so checking
positively that the surrounding context IS a
non-word char correctly treats multibyte
letters as word-like. No `u` flag required.

NO_WORD_BEFORE — zero-width: preceded by NON_WORD_CHAR or at
start-of-input/line. For intraword-aware
openers.

NO_WORD_AFTER — zero-width: followed by NON_WORD_CHAR or at
end-of-input. Complement of NO_WORD_BEFORE.

AbstractFormatting gains a getInstructionName() hook that defaults to
getModeName(). Subclasses that want to emit handler instructions under
a different name than their lexer mode name (so a Gfm mode can share
DW's `emphasis_open`/`strong_open` instructions while registering its
own lexer state) override this method.
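
The multibyte argument for NON_WORD_CHAR can be checked in Python by matching on UTF-8 bytes, which mimics a regex compiled without a Unicode flag. The constant values below are reconstructed from the prose, not copied from AbstractMode, and the emphasis pattern is a simplified whole-span stand-in.

```python
import re
import string

punct = string.punctuation.replace("_", "")
NON_WORD_CHAR = "[" + re.escape(punct) + r" \t\r\n\f\v]"
NO_WORD_BEFORE = rf"(?:(?<={NON_WORD_CHAR})|^)"
NO_WORD_AFTER = rf"(?:(?={NON_WORD_CHAR})|$)"

# Byte-oriented matching: UTF-8 bytes >= 0x80 fall outside every ASCII class.
UNDERSCORE_EM = re.compile(
    (NO_WORD_BEFORE + r"_(?=[^\s_])(.+?)(?<=[^\s_])_" + NO_WORD_AFTER).encode("ascii"),
    re.MULTILINE,
)


def has_emphasis(text):
    return UNDERSCORE_EM.search(text.encode("utf-8")) is not None


# For contrast: a negative ASCII class treats multibyte letters as non-word.
naive = re.search(rb"(?<=\W)_", "пристаням_стремятся_".encode("utf-8"))
```

The positive enumeration keeps the Cyrillic example literal, whereas the `\W`-based check in `naive` would wrongly accept the underscore as an opener.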

c3755410 20-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

require non-whitespace adjacency for inline formatting delimiters

An opening delimiter must now be followed by a non-whitespace character,
and a closing delimiter must be preceded by one. Empty delimiter pairs
(****, ____, '''', <sub></sub>, <sup></sup>, <del></del>) no longer
match and stay literal.

Rationale: this matches Markdown's flanking-delimiter
rules and eliminates accidental bolding of sequences like `** note**`
at the start of a sentence. Well-formed uses (**bold**, //italic//,
__underline__) are unchanged.

Affected modes: Strong, Emphasis, Underline, Monospace, Subscript,
Superscript, Deleted.

BREAKING: content that was already malformed but
previously rendered as formatted (e.g. `**foo bar **`) now stays
literal.
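
The adjacency rule can be illustrated with a single Python regex for the `**` case. It is a simplified whole-span stand-in for Strong's entry and exit patterns:

```python
import re

# Opener must be followed by non-whitespace; closer must be preceded by it.
STRONG = re.compile(r"\*\*(?=\S)(.+?)(?<=\S)\*\*")


def is_bold(text):
    return STRONG.search(text) is not None
```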

10fb3d65 20-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

prevent inline formatting from matching across paragraph boundaries

The Lexer compiles all patterns with the `s` (DOTALL) flag via
ParallelRegex::getPerlMatchingFlags(), which makes `.` match newlines.
Inline formatting modes use lookaheads like `\*\*(?=.*\*\*)` to verify
a closing delimiter exists, so with DOTALL a lone `**` happily matched
its "closer" many paragraphs later, swallowing blank lines into a
single <strong> run.

Add CONTENT_UNTIL_PARA on AbstractMode — a regex snippet matching any
character unless it would start a paragraph break (blank line, possibly
with horizontal whitespace). Update all inline formatting entry patterns
(Strong, Emphasis, Underline, Monospace, Subscript, Superscript,
Deleted) to use it in their closing-delimiter lookaheads.

Emphasis also gets a real closing-`//` check; its previous lookahead
just verified "content exists with a non-colon char" without requiring
the closing delimiter at all.

Single newlines inside a delimiter pair still match (multi-line
formatting); only blank lines end it.

BREAKING: You can no longer mark multiple paragraphs as bold or strike
them out. On the other hand, this prevents accidentally breaking the
page layout through a missing closing delimiter (as reported many
times over the years), e.g. #1025, #3588, #1056.
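
The effect of the paragraph guard is easy to reproduce in Python. The snippet value is reconstructed from the description ("any character unless it would start a paragraph break") and may differ from the actual constant on AbstractMode.

```python
import re

# Any character, as long as it does not start a blank line (paragraph break).
CONTENT_UNTIL_PARA = r"(?:(?!\n[ \t]*\n).)"

OLD_ENTRY = re.compile(r"\*\*(?=.*\*\*)", re.DOTALL)
NEW_ENTRY = re.compile(r"\*\*(?=" + CONTENT_UNTIL_PARA + r"+?\*\*)", re.DOTALL)

two_paragraphs = "**a\n\nb**"
one_paragraph = "**a\nb**"
```

With DOTALL the old lookahead finds its closer across the blank line, while the guarded version only accepts the single-newline case.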

17c6179b 20-Apr-2026 Andreas Gohr <gohr@cosmocode.de>

add $conf['syntax'] setting and conditional mode loading in ModeRegistry

Introduce a new 'syntax' configuration setting (dokuwiki, markdown, dw+md, md+dw)
that controls which parser modes are loaded. Built-in modes are split into
always-loaded (no Markdown equivalent), DW-only, and MD-only groups.
Refactor getModes() into focused sub-methods for each group.

No Gfm mode classes exist yet, so only 'dokuwiki' is functional.
The change is a strict no-op for existing behavior.

04045fea 18-Apr-2026 Andreas Gohr <andi@splitbrain.org>

remove unused rewriteBlocks property from Handler

This flag was added in b7c441b9 (2005) for planned wiki syntax
converters but was never set to false anywhere. Remove the dead
conditional and always run Block processing.
