History log of /dokuwiki/inc/Parsing/ (Results 1 – 25 of 87)
Revision Date Author Comments
a7e10216 25-Jun-2026 splitbrain <86426+splitbrain@users.noreply.github.com>

Rector and PHPCS fixes

4b31eadf 04-Jun-2026 Andreas Gohr <gohr@cosmocode.de>

fix (parsing): avoid newline loss on GFM section editing

The GFM header parser returned a byte position pointing at the newline
before the actual header, which caused the newline loss reported in
https://github.com/dokuwiki/dokuwiki/pull/4636#issuecomment-4491970909

Additionally this fixes an oddity of DW header parsing which
accidentally allowed text on the line before the opening = chars.
Whitespace is still allowed.

47a02a10 04-Jun-2026 Andreas Gohr <gohr@cosmocode.de>

Parsing: make parse syntax a per-parse value, drop ModeInterface

The active parse's syntax flavour is a per-parse question, not process-
global state: within a single request a plugin can render bundled
DokuWiki-syntax text inside an otherwise-Markdown page. Yet ModeRegistry
was a singleton that read $conf['syntax'] and the $PARSER_MODES global,
and every mode reached it through ModeRegistry::getInstance() — so the
flavour lived in shared mutable state that two parses in one request
would fight over.

Make the registry a short-lived value instead:

- ModeRegistry is constructed once per parse with an explicit $syntax
and injected into Parser, Handler and every mode. getSyntax() /
isDwPreferred() / isMdPreferred() consult $this->syntax; the
DOKU_UNITTEST-gated mode-list cache hack is gone (each registry is
fresh, nothing to invalidate).
- p_get_instructions() is now the single place in the pipeline where
$conf['syntax'] is read; from there the flavour travels as a
parameter. No code under inc/Parsing/ reads $conf['syntax'] directly
anymore — the five syntax-reading modes (Preformatted, GfmHr,
GfmEscape, Externallink, GfmQuote) route through $this->registry.

Keep the two concepts apart, as documented in the ModeRegistry and
AbstractMode docblocks: the user's configured *preference* stays in
$conf['syntax'] for UI code (toolbar, settings), while the active
parse's syntax is a parameter carried by the registry.

$PARSER_MODES is demoted to a deprecated, read-only mirror, published
during loadPluginModes() — third-party syntax plugins (columnlist,
alphalist2, phpwikify, skipentity) and the bundled info plugin read the
global directly, often from their constructors, so the taxonomy must
stay visible there. No core code reads the mirror.

Fold ModeInterface into AbstractMode while here: getSort()/handle() are
abstract, the connect callbacks carry defaults, and the public $Lexer
"FIXME should be done by setter" becomes setLexer()/getLexer() injected
by Parser::addMode() alongside the registry. Nested-content resolution
moves to the allowedCategories()/filterAllowedModes() hooks, resolved
once when the registry is attached.

Tests build their own parser/registry through ParserTestBase::setSyntax()
instead of mutating $conf and calling the removed ModeRegistry::reset().
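
The per-parse registry idea can be illustrated with a minimal sketch. It is written in Python rather than PHP so it runs standalone, and every name in it (`ModeRegistry.get_syntax`, `Mode.describe`, `get_instructions`) is a hypothetical stand-in, not DokuWiki's actual API.

```python
class ModeRegistry:
    """Per-parse registry: the syntax flavour is plain instance state."""

    def __init__(self, syntax):
        self.syntax = syntax

    def get_syntax(self):
        return self.syntax


class Mode:
    """Hypothetical mode that asks its injected registry for the flavour."""

    def __init__(self, registry):
        self.registry = registry

    def describe(self):
        return "parsing as " + self.registry.get_syntax()


def get_instructions(text, syntax):
    # Assumed entry point: the flavour arrives as a parameter and is wrapped
    # in a fresh registry, so nothing is shared between two parses.
    registry = ModeRegistry(syntax)
    mode = Mode(registry)
    return [mode.describe(), text]


# Two parses in the same "request" with different flavours do not interfere.
page = get_instructions("# Markdown page", "md")
bundled = get_instructions("====== DW text ======", "dw")
```

Because each call builds its own registry, there is no cache to reset between parses, which is the property the commit relies on for its tests.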

/dokuwiki/_test/bootstrap.php
/dokuwiki/_test/tests/Parsing/HandlerTest.php
/dokuwiki/_test/tests/Parsing/Lexer/RecordingHandler.php
/dokuwiki/_test/tests/Parsing/Markdown/GfmSpecTest.php
/dokuwiki/_test/tests/Parsing/ModeRegistryTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/ExternallinkTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmBacktickDoubleTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmBacktickSingleTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmCodeTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmDeletedTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmEmphasisStrongTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmEmphasisStrongUnderscoreTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmEmphasisTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmEmphasisUnderscoreTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmEscapeTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmFileTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmHeaderTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmHrTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmLinebreakTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmLinkTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmListblockTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmMediaTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmQuoteTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/GfmStrongUnderscoreTest.php
/dokuwiki/_test/tests/Parsing/ParserMode/ParserTestBase.php
/dokuwiki/_test/tests/Parsing/ParserMode/PreformattedTest.php
/dokuwiki/inc/Extension/SyntaxPlugin.php
Handler.php
ModeRegistry.php
Parser.php
ParserMode/AbstractFormatting.php
ParserMode/AbstractMode.php
ParserMode/Base.php
ParserMode/Eol.php
ParserMode/Externallink.php
ParserMode/Footnote.php
ParserMode/GfmBacktickSingle.php
ParserMode/GfmCode.php
ParserMode/GfmEscape.php
ParserMode/GfmHr.php
ParserMode/GfmHtmlEntity.php
ParserMode/GfmListblock.php
ParserMode/GfmQuote.php
ParserMode/GfmTable.php
ParserMode/Listblock.php
ParserMode/Preformatted.php
ParserMode/Table.php
/dokuwiki/inc/deprecated.php
/dokuwiki/inc/parserutils.php
4f32c45b 26-May-2026 Andreas Gohr <gohr@cosmocode.de>

GfmLink: allow soft line break inside link text

The label character class explicitly forbade `\n`, so a CommonMark
soft line break inside link text (e.g. `[link with<EOL>more](url)`)
fell through to literal text instead of producing a link. Loosen the
class to accept a bare `\n` as long as it is not followed by a blank
line — soft breaks are spec-allowed inside link text, blank lines are
not, and refusing them also keeps `\n#`-anchored block modes (header,
hr, ...) from being swallowed by a runaway link match.
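
A rough sketch of such a label class, using Python's `re` as a stand-in for the PCRE pattern. The exact expression in GfmLink is not shown in this log, so `LABEL` and `LINK` below are assumptions that only illustrate the "bare newline, but no blank line" rule.

```python
import re

# Assumed label class: any character except brackets and newline, or a single
# newline that is not followed by a blank (whitespace-only) line.
LABEL = r"(?:[^\[\]\n]|\n(?![ \t]*\n))+"
LINK = re.compile(r"\[(" + LABEL + r")\]\(([^)\n]+)\)")


def find_link(text):
    """Return (label, url) for the first inline link, or None."""
    m = LINK.search(text)
    return (m.group(1), m.group(2)) if m else None


soft = find_link("[link with\nmore](url)")      # soft break inside the label
blank = find_link("[link with\n\nmore](url)")   # blank line inside the label
```

Here `soft` keeps the newline in the label, while `blank` is `None` because the lookahead refuses a newline that starts a blank line.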

The `\n` survives into the label string and renders as a literal line
ending in HTML, which browsers display as a single space. This soft
break behavior has been checked against
https://spec.commonmark.org/dingus/

Note that this behavior differs from GitHub, where the line break is
rendered as a hard break <br>.

65dd2042 26-May-2026 Andreas Gohr <gohr@cosmocode.de>

GfmEscape: defer \\<EOL> to DW Linebreak in mixed-syntax modes

Both GfmEscape (sort 5) and DW Linebreak (sort 140) can claim the
two backslashes of `\\` followed by space/tab/newline. The lexer's
tie-breaker picked GfmEscape, so DW's forced linebreak silently lost
its delimiter under dw+md and md+dw. Add a negative lookahead that
declines `\\[ \t\n]` whenever DW syntax is loaded — pure md keeps
GFM-spec behavior. Mid-line `\\` (UNC paths etc.) still escapes.
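
The lookahead can be sketched as follows, again with Python's `re` standing in for PCRE. The function name `escape_pattern`, the `dw_loaded` flag and the punctuation class are assumptions for illustration, not the actual GfmEscape code.

```python
import re

# ASCII punctuation that a backslash may escape (includes the backslash itself).
PUNCT = r"[!-/:-@\[-`{-~]"


def escape_pattern(dw_loaded):
    """Build a backslash-escape pattern, optionally yielding to DW linebreaks."""
    if dw_loaded:
        # Decline a backslash that starts `\\` followed by space, tab or
        # newline, so a DW forced-linebreak mode can claim that sequence.
        return re.compile(r"\\(?!\\[ \t\n])" + PUNCT)
    return re.compile(r"\\" + PUNCT)


line_end = "a\\\\\nb"        # a, two backslashes, newline, b
unc_path = "\\\\server"      # two backslashes followed by text

pure_md = escape_pattern(False).search(line_end)
mixed = escape_pattern(True).search(line_end)
mid_line = escape_pattern(True).search(unc_path)
```

In this sketch `pure_md` matches the two backslashes, `mixed` is `None` (left for the linebreak mode), and `mid_line` still matches because no whitespace follows the pair.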

7686f203 12-May-2026 Anna Dabrowska <dabrowska@cosmocode.de>

Fix syntax plugin rendering

Reverse the order in which core modes and plugin modes are handled by the \dokuwiki\Parsing\Handler.
Otherwise only the handle() method of the plugin is called, which is fine for core modes. Syntax plugins need to go through plugin() to actually add their own calls.

e7dae73b 12-May-2026 Andreas Gohr <andi@splitbrain.org>

fix: apply rector and code sniffer fixes

d331a839 12-May-2026 Andreas Gohr <andi@splitbrain.org>

GFM modes: follow CATEGORY_SUBSTITION → CATEGORY_SUBSTITUTION rename

Constant was renamed on master (the typo'd 'substition' value is kept,
but the constant name spells it correctly). Update GfmTable's use of
the constant, plus stale docblock/comment references in GfmEscape,
GfmHtmlEntity, GfmLinebreak, and GfmLinebreakTest.

15429f02 12-May-2026 Andreas Gohr <andi@splitbrain.org>

Externallink: GFM autolink extension - parens and entity-ref tail

In Markdown-preferred mode, allow `(` and `)` inside URL char classes
and consume an optional trailing entity reference via the shared
HtmlEntity::PATTERN. The Markdown-only post-processing peels off
mismatched closing parens and decodes the trailing entity reference,
emitting the peeled chars as cdata after the link. Refactors handle()
to dispatch to handleAngleAutolink() and handleBareUrl(), with the new
trim logic in peelGfmTail() and the protocol-prefix step in
addProtocolPrefix(). DW-only mode behavior is unchanged.

Brings GFM spec examples #624, #625, #626 to passing.
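
The paren-peeling step follows the GFM autolink extension rule that unmatched trailing `)` characters are not part of the link. A minimal sketch of that rule alone is given below; `peel_trailing_parens` is a hypothetical helper, not the actual peelGfmTail(), and the entity-reference handling is left out.

```python
def peel_trailing_parens(url):
    """Split url into (link, tail), peeling unmatched trailing ')' characters."""
    tail = ""
    # Peel one ')' at a time while the URL still ends in ')' and contains
    # more closing than opening parentheses overall.
    while url.endswith(")") and url.count(")") > url.count("("):
        url = url[:-1]
        tail = ")" + tail
    return url, tail


balanced = peel_trailing_parens("www.google.com/search?q=Markup+(business)")
extra = peel_trailing_parens("www.google.com/search?q=Markup+(business)))")
plain = peel_trailing_parens("www.example.com/a)")
```

The `tail` half corresponds to what the commit message describes as the peeled characters emitted as cdata after the link.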

73dc0a89 06-May-2026 Andreas Gohr <andi@splitbrain.org>

fix(mail): keep '&' intact in mailto links with multiple query params

Move the email-handling helpers (obfuscate, mail_isvalid,
mail_quotedprintable_encode, mail_setup) out of the procedural
inc/mail.php into a namespaced dokuwiki\MailUtils class plus a new
Mailer::configInit(), and add a separate MailUtils::obfuscateUrl() for
the mailto-href context.

The xhtml renderer and PluginTrait now build the link label and the
href separately: the address half is run through the mailguard
obfuscation, the query string is preserved verbatim with only HTML
escaping applied. This fixes #1690 — in 'visible' mode the previous
code rawurlencoded the entire address+query, turning '?' into '%3F' and
breaking multi-parameter mailto links; in all modes the query string is
no longer mangled by the [at]/[dot] substitution.

Core call sites (Mailer, auth, LegacyApiCore, common, the xhtml
renderer, the parser, the bundled config/styling/usermanager plugins)
are migrated to MailUtils directly. The old top-level functions and
PREG_PATTERN_VALID_EMAIL constant remain as deprecated shims with
rector mappings.

Tests for obfuscate / mail_isvalid / mail_quotedprintable_encode are
consolidated into a single _test/tests/MailUtilsTest.php and extended
with regression coverage for the multi-parameter, double-escape and
URL-shape cases.

Closes #1690
Replaces #1964

8788dbbd 06-May-2026 splitbrain <86426+splitbrain@users.noreply.github.com>

Rector and PHPCS fixes


/dokuwiki/_test/core/DokuWikiTest.php
/dokuwiki/_test/data/media/wiki/exif-orient-6.jpg
/dokuwiki/_test/tests/Feed/FeedCreatorOptionsTest.php
/dokuwiki/_test/tests/File/MediaFileTest.php
/dokuwiki/_test/tests/Remote/ApiCoreTest.php
/dokuwiki/_test/tests/Search/BacklinksTest.php
/dokuwiki/_test/tests/Search/Collection/CollectionSearchTest.php
/dokuwiki/_test/tests/Search/Collection/DirectCollectionTest.php
/dokuwiki/_test/tests/Search/Collection/FrequencyCollectionTest.php
/dokuwiki/_test/tests/Search/Collection/LookupCollectionTest.php
/dokuwiki/_test/tests/Search/Collection/MockDirectCollection.php
/dokuwiki/_test/tests/Search/Collection/MockFrequencyCollection.php
/dokuwiki/_test/tests/Search/Collection/MockLookupCollection.php
/dokuwiki/_test/tests/Search/Collection/TermTest.php
/dokuwiki/_test/tests/Search/Index/AbstractIndexTestCase.php
/dokuwiki/_test/tests/Search/Index/FileIndexTest.php
/dokuwiki/_test/tests/Search/Index/LockTest.php
/dokuwiki/_test/tests/Search/Index/MemoryIndexTest.php
/dokuwiki/_test/tests/Search/Index/TupleOpsTest.php
/dokuwiki/_test/tests/Search/IndexerTest.php
/dokuwiki/_test/tests/Search/IntegrityTest.php
/dokuwiki/_test/tests/Search/MediauseTest.php
/dokuwiki/_test/tests/Search/MetadataSearchTest.php
/dokuwiki/_test/tests/Search/Query/NamespacePredicateTest.php
/dokuwiki/_test/tests/Search/Query/PageSetTest.php
/dokuwiki/_test/tests/Search/Query/QueryEvaluatorTest.php
/dokuwiki/_test/tests/Search/Query/QueryParserTest.php
/dokuwiki/_test/tests/Search/data/searchtest.txt
/dokuwiki/_test/tests/Ui/Media/DisplayTest.php
/dokuwiki/_test/tests/inc/IpTest.php
/dokuwiki/_test/tests/inc/changelog_getrelativerevision.test.php
/dokuwiki/_test/tests/inc/common_clientip.test.php
/dokuwiki/_test/tests/inc/common_saveWikiText.test.php
/dokuwiki/_test/tests/lib/exe/fetch_imagetoken.test.php
/dokuwiki/bin/indexer.php
/dokuwiki/composer.json
/dokuwiki/composer.lock
/dokuwiki/conf/dokuwiki.php
/dokuwiki/feed.php
/dokuwiki/inc/Action/Preview.php
/dokuwiki/inc/Action/Search.php
/dokuwiki/inc/Ajax.php
/dokuwiki/inc/ChangeLog/ChangeLog.php
/dokuwiki/inc/ChangeLog/MediaChangeLog.php
/dokuwiki/inc/ChangeLog/PageChangeLog.php
/dokuwiki/inc/Feed/FeedCreator.php
/dokuwiki/inc/Feed/FeedCreatorOptions.php
/dokuwiki/inc/File/MediaFile.php
/dokuwiki/inc/File/PageFile.php
/dokuwiki/inc/Ip.php
Handler.php
/dokuwiki/inc/Remote/ApiCore.php
/dokuwiki/inc/Search/Collection/AbstractCollection.php
/dokuwiki/inc/Search/Collection/CollectionSearch.php
/dokuwiki/inc/Search/Collection/DirectCollection.php
/dokuwiki/inc/Search/Collection/FrequencyCollection.php
/dokuwiki/inc/Search/Collection/LookupCollection.php
/dokuwiki/inc/Search/Collection/PageFulltextCollection.php
/dokuwiki/inc/Search/Collection/PageMetaCollection.php
/dokuwiki/inc/Search/Collection/PageTitleCollection.php
/dokuwiki/inc/Search/Collection/Term.php
/dokuwiki/inc/Search/Exception/IndexAccessException.php
/dokuwiki/inc/Search/Exception/IndexIntegrityException.php
/dokuwiki/inc/Search/Exception/IndexLockException.php
/dokuwiki/inc/Search/Exception/IndexUsageException.php
/dokuwiki/inc/Search/Exception/IndexWriteException.php
/dokuwiki/inc/Search/Exception/SearchException.php
/dokuwiki/inc/Search/FulltextSearch.php
/dokuwiki/inc/Search/Index/AbstractIndex.php
/dokuwiki/inc/Search/Index/FileIndex.php
/dokuwiki/inc/Search/Index/Lock.php
/dokuwiki/inc/Search/Index/MemoryIndex.php
/dokuwiki/inc/Search/Index/TupleOps.php
/dokuwiki/inc/Search/Indexer.php
/dokuwiki/inc/Search/MetadataSearch.php
/dokuwiki/inc/Search/Query/NamespacePredicate.php
/dokuwiki/inc/Search/Query/NegatedEntry.php
/dokuwiki/inc/Search/Query/PageSet.php
/dokuwiki/inc/Search/Query/QueryEvaluator.php
/dokuwiki/inc/Search/Query/QueryParser.php
/dokuwiki/inc/Search/Query/StackEntry.php
/dokuwiki/inc/Search/Tokenizer.php
/dokuwiki/inc/Search/concept.txt
/dokuwiki/inc/Sitemap/Mapper.php
/dokuwiki/inc/Subscriptions/BulkSubscriptionSender.php
/dokuwiki/inc/TaskRunner.php
/dokuwiki/inc/Ui/Backlinks.php
/dokuwiki/inc/Ui/Media/Display.php
/dokuwiki/inc/Ui/MediaDiff.php
/dokuwiki/inc/Ui/Search.php
/dokuwiki/inc/Ui/SearchState.php
/dokuwiki/inc/common.php
/dokuwiki/inc/deprecated.php
/dokuwiki/inc/html.php
/dokuwiki/inc/infoutils.php
/dokuwiki/inc/lang/fr/lang.php
/dokuwiki/inc/lang/pl/lang.php
/dokuwiki/inc/lang/tr/lang.php
/dokuwiki/inc/lang/tr/onceexisted.txt
/dokuwiki/inc/load.php
/dokuwiki/inc/media.php
/dokuwiki/inc/search.php
/dokuwiki/inc/template.php
/dokuwiki/lib/exe/fetch.php
/dokuwiki/lib/plugins/authldap/lang/tr/settings.php
/dokuwiki/lib/plugins/authpdo/lang/tr/lang.php
/dokuwiki/lib/plugins/authplain/lang/tr/lang.php
/dokuwiki/lib/plugins/config/lang/cs/lang.php
/dokuwiki/lib/plugins/config/lang/de/lang.php
/dokuwiki/lib/plugins/config/lang/en/lang.php
/dokuwiki/lib/plugins/config/lang/es/lang.php
/dokuwiki/lib/plugins/config/lang/fr/lang.php
/dokuwiki/lib/plugins/config/lang/hu/lang.php
/dokuwiki/lib/plugins/config/lang/it/lang.php
/dokuwiki/lib/plugins/config/lang/pl/lang.php
/dokuwiki/lib/plugins/config/lang/pt/lang.php
/dokuwiki/lib/plugins/config/lang/tr/lang.php
/dokuwiki/lib/plugins/config/settings/config.metadata.php
/dokuwiki/lib/plugins/extension/lang/tr/lang.php
/dokuwiki/lib/plugins/info/syntax.php
/dokuwiki/lib/scripts/page.js
/dokuwiki/lib/scripts/toolbar.js
/dokuwiki/lib/tpl/dokuwiki/detail.php
/dokuwiki/vendor/composer/installed.json
/dokuwiki/vendor/composer/installed.php
/dokuwiki/vendor/phpseclib/phpseclib/phpseclib/Crypt/Common/Formats/Keys/OpenSSH.php
/dokuwiki/vendor/phpseclib/phpseclib/phpseclib/Crypt/RSA.php
/dokuwiki/vendor/phpseclib/phpseclib/phpseclib/File/ASN1.php
/dokuwiki/vendor/splitbrain/slika/README.md
/dokuwiki/vendor/splitbrain/slika/src/GdAdapter.php
/dokuwiki/vendor/splitbrain/slika/src/ImageInfo.php
56c730b5 06-May-2026 Andreas Gohr <andi@splitbrain.org>

keep historic typo in value but not in constant

We need to keep the historic typo in the value ("substition"), but there
is no reason to keep it in the constant.

0f694376 05-May-2026 Andreas Gohr <gohr@cosmocode.de>

GfmLink: accept escaped brackets inside link labels

The label slot used `[^\[\]\n]+`, which rejected `\[` / `\]` and
left labels with escaped brackets unmatched. Promote it to
`(?:\\.|[^\[\]\n])+` — the same backslash-escape trick the URL
slot already uses — so spec example 523 (`[link \[bar](/uri)`)
matches and unescapes cleanly. The image-as-label sub-pattern
gets the same upgrade.

handle() needs no change: the new class still rejects bare `]`,
so the first literal `](` in the match is still the separator;
Escape::unescapeBackslashes() was already collapsing `\[` to `[`
before the label reached the link handler.

Adds two GfmLinkTest cases for the `\[` / `\]` forms.
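
The label class quoted above can be tried out in isolation. The sketch below uses Python's `re` as a stand-in for PCRE; the surrounding `LINK` pattern and the `unescape_backslashes` helper are simplified assumptions that only mimic the described behaviour of the link pattern and of Escape::unescapeBackslashes().

```python
import re

LABEL = r"(?:\\.|[^\[\]\n])+"          # label class from the commit message
LINK = re.compile(r"\[(" + LABEL + r")\]\(([^)\n]+)\)")


def unescape_backslashes(text):
    """Assumed stand-in: drop a backslash that precedes ASCII punctuation."""
    return re.sub(r"\\([!-/:-@\[-`{-~])", r"\1", text)


m = LINK.search(r"[link \[bar](/uri)")
raw_label = m.group(1)                 # still contains the backslash
label = unescape_backslashes(raw_label)
url = m.group(2)
```

The bare `]` is still excluded by the class, so the first literal `](` remains the separator between label and URL.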

113171bb 05-May-2026 Andreas Gohr <gohr@cosmocode.de>

Preformatted: strip leading/trailing blank-line padding from body

The lexer's `\n ` continuation pattern eats the indent off
blank-but-indented source lines, leaving a `\n` in the rewriter
buffer for each one. Trim those padding newlines before emitting
so the preformatted body starts and ends on a non-blank line, as
GFM example #87 requires. Whitespace-only blocks are still
skipped entirely.

Adds two PreformattedTest cases pinning the new behavior.

dccbd514 05-May-2026 Andreas Gohr <andi@splitbrain.org>

GfmQuote: accept ^> line starts so quotes can follow tables and lists

GfmTable, DW Table, and DW Listblock all consume the boundary \n on
their way out. A pure-lookahead exit pattern at that boundary would
trip the lexer's no-advance safety check, because tables and lists
exit right after consuming a marker token and have no leading
unmatched content for the lookahead to attach to (unlike Preformatted,
whose body leaves code lines as UNMATCHED right before the boundary).

Fix this on the consumer side: change the first-line anchor from \n>
to (?:^|\n)>. With the lexer's m flag, ^ matches at offset 0 and at
any position immediately following a \n in the subject, including the
position right after a \n that a preceding mode just consumed.
Subsequent quote lines keep the \n> anchor.

Adds three handoff tests in GfmQuoteTest covering GfmTable, DW Table,
and DW Listblock. Resolves GFM spec example 201.
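
The effect of the anchor change can be reproduced with Python's `re`, whose MULTILINE flag treats `^` like PCRE's m flag. The sketch assumes a previous mode has already consumed the text up to and including the boundary newline, so matching resumes directly at the `>`.

```python
import re

FIRST_LINE_OLD = re.compile(r"\n>")
FIRST_LINE_NEW = re.compile(r"(?:^|\n)>", re.MULTILINE)

text = "| a | b |\n> quote"
# Assumed resume offset: the table mode consumed everything through the "\n".
pos = text.index(">")

old = FIRST_LINE_OLD.match(text, pos)   # needs a "\n" that is already gone
new = FIRST_LINE_NEW.match(text, pos)   # "^" matches right after the "\n"
```

Passing `pos` keeps the full subject visible to the regex engine, which is why `^` can still see the preceding newline.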

f9d3b7bd 05-May-2026 Andreas Gohr <andi@splitbrain.org>

Externallink: add per-scheme angle-bracket autolinks for MD syntax

Adds CommonMark §6.5 <URL> autolinks to Externallink, gated to
md/md+dw/dw+md syntax via ModeRegistry::isMdPreferred(). Per-scheme
patterns share the existing conf/scheme.conf allow-list so unknown
schemes fall through to literal cdata instead of being silently
dropped by the renderer. Internal whitespace inside the brackets
disqualifies the autolink and the whole envelope is emitted as
cdata to keep the bare-URL detector off the URL.

LinksTest gains 5 cases covering success, internal-whitespace and
leading-whitespace disqualification, unregistered scheme fallthrough,
and the dw-only no-op path. SpecCompatRenderer URL encoder is updated
to match cmark-gfm's HREF_SAFE table (square brackets and a few other
characters move from safe to encoded). skip.php loses the obsolete
#356 entry and gains #605/#606/#607/#609 explaining the unregistered-
scheme cases that the per-scheme regex naturally rejects.

c2248fda 05-May-2026 Andreas Gohr <andi@splitbrain.org>

Block: collapse [ \t]*\n[ \t]* to \n inside paragraphs

Strip [ \t]+ on either side of the soft-break joiner emitted for a
single eol, and ltrim the first cdata of each paragraph. Without this,
DokuWiki preserved leading/trailing whitespace on continuation lines
verbatim, which is invisible in HTML but may be visible in plain-text
and other renderers. It is also a requirement in the Markdown spec.

Re-baseline the parser-mode tests that pinned the old preserve
behavior (cdata adjacent to <code>/<file>/<rss>/header/footnote).
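
The whitespace collapse named in the commit title amounts to a single substitution. The sketch below applies it to a whole paragraph string in Python; the real change works on the handler's call list, so `collapse_soft_breaks` is only a hypothetical illustration of the resulting text.

```python
import re


def collapse_soft_breaks(paragraph):
    """Collapse [ \\t]*\\n[ \\t]* to a bare newline and left-trim the paragraph."""
    collapsed = re.sub(r"[ \t]*\n[ \t]*", "\n", paragraph)
    return collapsed.lstrip(" \t")


result = collapse_soft_breaks("  foo  \n\t bar\n   baz")
```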

f57da51c 05-May-2026 Andreas Gohr <gohr@cosmocode.de>

Preformatted: leave boundary \n in stream when next line has content

Adds a zero-width lookahead exit (?=\n[^ \t\n]) ahead of the existing
consuming \n exit. When an indented code block is followed by a non-
blank line, the boundary newline now stays available for downstream
block-level matchers (GfmHr, GfmHeader, etc.) instead of being eaten
on the way out of preformatted mode.

Concretely fixes a thematic-break-after-indented-code case (GFM spec
case 85's trailing ----): without this change, GfmHr's \n anchor failed
because preformatted had already consumed the newline, and the bare
---- fell through to Entity which converted --- to an em-dash.

The consuming branch is kept as a fall-through for the blank-line and
end-of-input cases, where a pure lookahead would trip the lexer's
no-advance safety check.

Six PreformattedTest expectations updated: trailing cdata after a
preformatted block now carries the leading \n (rendered output is
unchanged — paragraph whitespace is trimmed).
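
The two exit branches can be sketched with Python's `re`. The lookahead is the one quoted above; combining it with the consuming `\n` in one alternation, the `HR` pattern and the `leave_preformatted` helper are assumptions made to keep the example self-contained.

```python
import re

# Lookahead exit first, consuming newline as the fall-through branch.
PRE_EXIT = re.compile(r"(?=\n[^ \t\n])|\n")
# Hypothetical stand-in for a block mode anchored on a leading newline.
HR = re.compile(r"\n-{3,}")


def leave_preformatted(text, pos):
    """Return the offset where lexing resumes once the exit fires at pos.

    Assumes text[pos] is the boundary newline.
    """
    return PRE_EXIT.match(text, pos).end()


followed = "    code\n----"
boundary = followed.index("\n")
resume = leave_preformatted(followed, boundary)   # newline is left in place
hr_match = HR.match(followed, resume)             # so the \n anchor still works

blank_after = "    code\n\npara"
resume_blank = leave_preformatted(blank_after, blank_after.index("\n"))
```

In the blank-line case the lookahead fails and the consuming branch advances by one character, which is what keeps the lexer from stalling.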

95f69420 05-May-2026 Andreas Gohr <gohr@cosmocode.de>

Lexer: dispatch zero-width EXIT events to mode handlers

invokeHandler short-circuited on empty content for every lexer state,
which silently dropped EXIT events from zero-width exit patterns
(lookahead-only). The mode stack would still pop, but the mode's
handle() method never ran, so handler-side cleanup (restoring buffered
call writers, emitting close calls) was skipped. Now empty content is
only suppressed for non-EXIT states.

Also fix the parameter docblock: $is_match was annotated as boolean
but is actually one of the integer DOKU_LEXER_* constants. Renamed to
$state to match.
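
The changed guard can be summarised in a few lines. This is a simplified Python sketch, not the lexer's real invokeHandler: the numeric values of the state constants and the `handler_calls` list are assumptions used only to make the dispatch decision observable.

```python
# Lexer states; the numeric values are assumed for this sketch.
DOKU_LEXER_ENTER = 1
DOKU_LEXER_MATCHED = 2
DOKU_LEXER_UNMATCHED = 3
DOKU_LEXER_EXIT = 4
DOKU_LEXER_SPECIAL = 5


def invoke_handler(content, state, handler_calls):
    """Record a handler call unless content is empty and state is not EXIT."""
    if content == "" and state != DOKU_LEXER_EXIT:
        return True  # nothing to report, keep lexing
    handler_calls.append((content, state))
    return True


calls = []
invoke_handler("", DOKU_LEXER_UNMATCHED, calls)   # suppressed
invoke_handler("", DOKU_LEXER_EXIT, calls)        # zero-width exit, dispatched
invoke_handler("text", DOKU_LEXER_UNMATCHED, calls)
```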

eb15e634 04-May-2026 Andreas Gohr <andi@splitbrain.org>

extract Helpers\HtmlEntity, wire into GfmCode and GfmLink URL slot

Numeric and named HTML entity decoding moves out of GfmHtmlEntity into
a pure helper, so capture-by-regex modes can apply the same decode
post-extraction (the inline lexer never reaches their bodies). Mirrors
the Helpers\Escape pattern.

Wired up in two slots:

- GfmCode info string: f&ouml;&ouml; now decodes to föö in the
language class. Clears spec example #330.

- GfmLink URL: GfmLink::extractUrl() decodes entities. URL pattern
extends from `[^)\n]+` to `(?:\\.|[^)\n])+` so an escaped \) no
longer terminates the URL early; the existing post-classify
Escape::unescapeBackslashes call strips the backslashes after
Link::classify has done its work. Clears #504, #506, #508.

Skip #328 with a self-contained title-slot reason: the URL side now
decodes correctly, but the title attribute is still discarded
(DokuWiki link instructions have no title slot).

d2085866 04-May-2026 Andreas Gohr <andi@splitbrain.org>

extend GfmNumericEntity to HTML5 named entities, rename to GfmHtmlEntity

Numeric refs are still decoded explicitly: PHP's html_entity_decode
returns the input unchanged for U+0000, surrogates, U+10FFFF, and
BMP noncharacters where CommonMark requires U+FFFD or the literal
codepoint. Named refs delegate to html_entity_decode with ENT_HTML5,
which carries the full HTML5 named-entity table (including multi-
codepoint decodes like &ngE; -> U+2267 + U+0338).

Unknown names stay literal: the original &xxx; passes through as
cdata and the renderer's &-escaping turns it into &amp;xxx;.

150dc5f2 04-May-2026 Andreas Gohr <andi@splitbrain.org>

add GfmNumericEntity for CommonMark numeric character references

Decodes &#nnn; (decimal, 1-7 digits) and &#xhhh; / &#Xhhh; (hex,
1-6 digits) to the corresponding Unicode codepoint, emitted as
plain cdata. Codepoint 0, codepoints above U+10FFFF, and the
surrogate range U+D800..U+DFFF map to U+FFFD per the spec.

Distinct from the typography Entity mode, which is renderer-side
configurable via entities.conf. Numeric refs are not configurable
so decoding happens at parse time and the renderer needs no
changes.

Lexer leftmost-match consumes the run before any structural
pattern, so &#42;foo&#42; renders as literal *foo* and &#42; foo
does not start a list - matching the spec rule that numeric refs
cannot stand in for structural markers.
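
The decoding rules stated above (digit limits and the U+FFFD cases) can be sketched as a standalone function. It is written in Python for illustration; `NUMERIC_REF` and `decode_numeric_refs` are hypothetical names, and the real mode works inside the lexer rather than on a whole string.

```python
import re

# Decimal form with 1-7 digits, or hex form with 1-6 digits.
NUMERIC_REF = re.compile(r"&#(?:([0-9]{1,7})|[xX]([0-9a-fA-F]{1,6}));")
REPLACEMENT = "\ufffd"


def decode_numeric_refs(text):
    """Replace numeric character references with their codepoints."""
    def repl(m):
        if m.group(1) is not None:
            cp = int(m.group(1))
        else:
            cp = int(m.group(2), 16)
        # Codepoint 0, out-of-range values and surrogates map to U+FFFD.
        if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return REPLACEMENT
        return chr(cp)

    return NUMERIC_REF.sub(repl, text)


starred = decode_numeric_refs("&#42;foo&#x2A;")
```

References with too many digits do not match the pattern at all and are therefore left as literal text.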

13a62f81 04-May-2026 Andreas Gohr <andi@splitbrain.org>

rename syntax flavors 'dokuwiki' / 'markdown' to 'dw' / 'md'

Symmetry with the existing 'dw+md' / 'md+dw' setting values.

c4bcbc2e 04-May-2026 Andreas Gohr <andi@splitbrain.org>

add GfmLinebreak for GFM hard line breaks

Two-or-more trailing spaces, or a single backslash, immediately before
a non-final newline render as a `<br/>`. Both delimiter forms share a
single SUBSTITION mode at sort 140, loaded under any MD-active syntax
(markdown, dw+md, md+dw); pure dokuwiki is unaffected.

Reuses the existing `linebreak` handler call and renderer; no new
instructions or renderer changes. SpecCompatRenderer overrides
linebreak() to emit the spec's `<br />` shape. Examples 662, 663
(line break inside a raw HTML tag) are skipped — raw HTML is not
passed through by default.

3e6baeff 30-Apr-2026 Andreas Gohr <andi@splitbrain.org>

replace DW Hr with unified GfmHr

Single mode covers both DokuWiki (4+ dashes) and GFM (3+ of -/*/_)
horizontal rules; pattern self-narrows on $conf['syntax']. Always
loaded across all four syntax settings, mirroring the GfmQuote
replacement pattern. Same `hr` handler call so renderers and the
call API are unchanged.

Drops DW's old [ \t]* leading-whitespace tolerance — inert in
practice past 0-1 spaces (Preformatted at sort 20 intercepts
everything ≥ 2 spaces or any tab).

Spec examples 13, 20, 26-28, 224 turn green; 17, 21-24, 29, 30, 31
go to skip.php as deliberate non-implementations (whitespace
tolerance and list-precedence cases).
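
How a pattern might narrow itself on the syntax setting is sketched below. The actual GfmHr expression is not shown in this log, so `hr_pattern`, the newline anchors and the syntax value `"dw"` are assumptions; only the "4+ dashes" and "3+ of -/*/_" rules come from the message above.

```python
import re


def hr_pattern(syntax):
    """Build a horizontal-rule pattern for the given syntax setting."""
    if syntax == "dw":
        # DokuWiki only: four or more dashes.
        body = r"-{4,}"
    else:
        # Any Markdown-active setting: three or more of one marker character.
        body = r"(?:-{3,}|\*{3,}|_{3,})"
    return re.compile(r"\n" + body + r"(?=\n)")


dw_three = hr_pattern("dw").search("a\n---\nb")
dw_four = hr_pattern("dw").search("a\n----\nb")
md_three = hr_pattern("md").search("a\n---\nb")
md_stars = hr_pattern("md").search("a\n***\nb")
```

Since three dashes already cover four, one branch is enough for every setting in which Markdown is active.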
