| 8a34b0d8 | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
remove comment about failing tests now that the work is complete |
| e7dae73b | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
fix: apply rector and code sniffer fixes |
| d331a839 | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GFM modes: follow CATEGORY_SUBSTITION → CATEGORY_SUBSTITUTION rename
Constant was renamed on master (the typo'd 'substition' value is kept, but the constant name spells it correctly). Update GfmTable's use of the constant, plus stale docblock/comment references in GfmEscape, GfmHtmlEntity, GfmLinebreak, and GfmLinebreakTest.
|
| 465aec67 | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
Parsing tests: split LinksTest into per-ParserMode files
Replaces the monolithic LinksTest with one test class per ParserMode: CamelcaselinkTest, EmaillinkTest, ExternallinkTest, FilelinkTest, InternallinkTest, WindowssharelinkTest. Media-parser dispatch tests move into the existing MediaTest alongside the audio/video coverage.
ExternallinkTest also adds unit coverage for the GFM autolink extension trim step: balanced parens, trailing entity refs (valid named, numeric, and unknown round-trip), non-trailing entities staying inside the URL, and mixed paren/entity peeling.
|
| aa346d4b | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: clear acronym table in spec renderer
The default conf/acronyms.conf entries (notably FTP) get wrapped in <abbr> by the XHTML renderer's acronym() call, which the spec output never has. Clearing the renderer's acronym table makes acronym() fall through to literal text, mirroring how typography substitutions are already neutralized via SpecCompatRenderer. Brings example #628 to passing without touching production wiki rendering.
|
| 1beb7450 | 12-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: skip deferred-feature spec cases
Heading-inline syntax (#36, #46) is deferred; the existing header instruction and downstream renderers would need rework to process inline modes inside heading content.
Bare URL autolinking without angle brackets (#619) is a deliberate DokuWiki feature in Externallink, not a feature we'll remove to match the strict CommonMark §6.8 rule.
The GFM bare-email autolink extension (#629-631) is out of scope - DokuWiki's Email mode only recognises emails inside angle brackets.
|
| 73dc0a89 | 06-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
fix(mail): keep '&' intact in mailto links with multiple query params
Move the email-handling helpers (obfuscate, mail_isvalid, mail_quotedprintable_encode, mail_setup) out of the procedural inc/mail.php into a namespaced dokuwiki\MailUtils class plus a new Mailer::configInit(), and add a separate MailUtils::obfuscateUrl() for the mailto-href context.
The xhtml renderer and PluginTrait now build the link label and the href separately: the address half is run through the mailguard obfuscation, the query string is preserved verbatim with only HTML escaping applied. This fixes #1690 — in 'visible' mode the previous code rawurlencoded the entire address+query, turning '?' into '%3F' and breaking multi-parameter mailto links; in all modes the query string is no longer mangled by the [at]/[dot] substitution.
Core call sites (Mailer, auth, LegacyApiCore, common, the xhtml renderer, the parser, the bundled config/styling/usermanager plugins) are migrated to MailUtils directly. The old top-level functions and PREG_PATTERN_VALID_EMAIL constant remain as deprecated shims with rector mappings.
Tests for obfuscate / mail_isvalid / mail_quotedprintable_encode are consolidated into a single _test/tests/MailUtilsTest.php and extended with regression coverage for the multi-parameter, double-escape and URL-shape cases.
Closes #1690 Replaces #1964
|
| 0f694376 | 05-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
GfmLink: accept escaped brackets inside link labels
The label slot used `[^\[\]\n]+`, which rejected `\[` / `\]` and left labels with escaped brackets unmatched. Promote it to `(?:\\.|[^\[\]\n])+` — the same backslash-escape trick the URL slot already uses — so spec example 523 (`[link \[bar](/uri)`) matches and unescapes cleanly. The image-as-label sub-pattern gets the same upgrade.
handle() needs no change: the new class still rejects bare `]`, so the first literal `](` in the match is still the separator; Escape::unescapeBackslashes() was already collapsing `\[` to `[` before the label reached the link handler.
Adds two GfmLinkTest cases for the `\[` / `\]` forms.
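
The difference between the two label classes can be checked in isolation. The following is a minimal sketch in Python rather than the project's PHP; the `[label](url)` envelope, `match_link` and `unescape_backslashes` are simplified stand-ins invented for illustration, not the actual GfmLink or Escape code. Only the two label character classes are taken from the message above.

```python
import re

# Label character classes quoted in the commit message.
OLD_LABEL = r"[^\[\]\n]+"
NEW_LABEL = r"(?:\\.|[^\[\]\n])+"


def match_link(label_pattern, text):
    """Return (label, url) if text is a whole inline link, else None.

    Hypothetical, simplified envelope: [label](url) with no title slot.
    """
    m = re.fullmatch(r"\[(" + label_pattern + r")\]\(([^)\n]+)\)", text)
    return (m.group(1), m.group(2)) if m else None


def unescape_backslashes(s):
    """Stand-in helper: drop a backslash that precedes ASCII punctuation."""
    return re.sub(r"\\([!-/:-@\[-`{-~])", r"\1", s)


sample = r"[link \[bar](/uri)"

old_result = match_link(OLD_LABEL, sample)   # escaped bracket is rejected
new_result = match_link(NEW_LABEL, sample)   # escaped bracket is accepted
label = unescape_backslashes(new_result[0])  # backslash removed afterwards
```

With the old class the sample does not match at all; with the new class the raw label keeps its backslash until the unescape step collapses `\[` to `[`.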
|
| 113171bb | 05-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
Preformatted: strip leading/trailing blank-line padding from body
The lexer's `\n ` continuation pattern eats the indent off blank-but-indented source lines, leaving a `\n` in the rewriter buffer for each one. Trim those padding newlines before emitting so the preformatted body starts and ends on a non-blank line, as GFM example #87 requires. Whitespace-only blocks are still skipped entirely.
Adds two PreformattedTest cases pinning the new behavior.
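
A rough sketch of the trimming step, written in Python rather than PHP and assuming the body arrives as one string whose blank lines have already lost their indent. The function name `trim_blank_padding` is hypothetical and does not come from the Preformatted source.

```python
def trim_blank_padding(body):
    """Drop leading and trailing blank lines, keep interior ones.

    Returns an empty string for whitespace-only input, which a caller
    could use as the signal to skip the block entirely.
    """
    lines = body.split("\n")
    # Remove blank (or whitespace-only) lines from the front.
    while lines and lines[0].strip() == "":
        lines.pop(0)
    # Remove blank (or whitespace-only) lines from the back.
    while lines and lines[-1].strip() == "":
        lines.pop()
    return "\n".join(lines)
```

Interior blank lines are left alone, so a code block with an empty line in the middle keeps its shape.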
|
| 451f2842 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
clean up markdown spec test skips
Ordered by example number, same format, single line.
Some skips now actually pass - removed. A couple others should pass but don't yet - also removed. Code fix follows.
|
| dccbd514 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmQuote: accept ^> line starts so quotes can follow tables and lists
GfmTable, DW Table, and DW Listblock all consume the boundary \n on their way out. A pure-lookahead exit pattern at that boundary would trip the lexer's no-advance safety check, because tables and lists exit right after consuming a marker token and have no leading unmatched content for the lookahead to attach to (unlike Preformatted, whose body leaves code lines as UNMATCHED right before the boundary).
Fix this on the consumer side: change the first-line anchor from \n> to (?:^|\n)>. With the lexer's m flag, ^ matches at offset 0 and at any position immediately following a \n in the subject, including the position right after a \n that a preceding mode just consumed. Subsequent quote lines keep the \n> anchor.
Adds three handoff tests in GfmQuoteTest covering GfmTable, DW Table, and DW Listblock. Resolves GFM spec example 201.
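
The anchor change can be illustrated outside the lexer. This sketch uses Python's `re` with `re.MULTILINE` as a stand-in for the lexer's m flag, and models "a preceding mode just consumed the newline" by starting the match at an offset inside the subject. The sample text and `resume_at` are assumptions for illustration only.

```python
import re

# First-line anchors before and after the change.
OLD_FIRST_LINE = re.compile(r"\n>")
NEW_FIRST_LINE = re.compile(r"(?:^|\n)>", re.MULTILINE)

# A table followed directly by a quote line.
subject = "| a |\n| - |\n> quote\n"

# Assume the table mode consumed everything up to and including the
# newline that ends its last row, so matching resumes at the ">".
resume_at = subject.index(">")

old_match = OLD_FIRST_LINE.match(subject, resume_at)  # no "\n" left to match
new_match = NEW_FIRST_LINE.match(subject, resume_at)  # "^" matches after "\n"
```

Because the match position sits right after a newline in the same subject, `^` succeeds there even though the newline itself is no longer available to the pattern.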
|
| 198d33e8 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: skip task list items extension (#279, #280)
GFM task list items (`- [ ] foo` / `- [x] foo`) are not implemented; the literal marker stays as the first content of the list item. |
| 506762f4 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: skip indented-code §4.4 family and Disallowed Raw HTML #652
The 4-space indent trigger fires on paragraph-continuation lines and exits on any blank line, both of which collide with CommonMark §4.4 — fixing either would require paragraph-open state the single-pass lexer cannot carry. List-interior cases additionally need the column arithmetic documented as out of scope for the §2.2 tabs family.
#652 (Disallowed Raw HTML) is a filter on top of raw HTML pass-through, which DokuWiki escapes by policy (see #118-160), so it has no input.
|
| f9d3b7bd | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
Externallink: add per-scheme angle-bracket autolinks for MD syntax
Adds CommonMark §6.5 <URL> autolinks to Externallink, gated to md/md+dw/dw+md syntax via ModeRegistry::isMdPreferred(). Per-scheme patterns share the existing conf/scheme.conf allow-list so unknown schemes fall through to literal cdata instead of being silently dropped by the renderer. Internal whitespace inside the brackets disqualifies the autolink and the whole envelope is emitted as cdata to keep the bare-URL detector off the URL.
LinksTest gains 5 cases covering success, internal-whitespace and leading-whitespace disqualification, unregistered scheme fallthrough, and the dw-only no-op path. SpecCompatRenderer URL encoder is updated to match cmark-gfm's HREF_SAFE table (square brackets and a few other characters move from safe to encoded). skip.php loses the obsolete #356 entry and gains #605/#606/#607/#609 explaining the unregistered-scheme cases that the per-scheme regex naturally rejects.
|
| c2248fda | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
Block: collapse [ \t]*\n[ \t]* to \n inside paragraphs
Strip [ \t]+ on either side of the soft-break joiner emitted for a single eol, and ltrim the first cdata of each paragraph. Without this, DokuWiki preserved leading/trailing whitespace on continuation lines verbatim, which is invisible in HTML but may be visible in plain-text and other renderers. It is also a requirement in the Markdown spec.
Re-baseline the parser-mode tests that pinned the old preserve behavior (cdata adjacent to <code>/<file>/<rss>/header/footnote).
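
As a string-level illustration of the rule in the title, here is a small Python sketch. It operates on a whole paragraph at once, whereas the Block handler works on the instruction stream, so `normalize_paragraph` is a hypothetical name and only approximates the behavior described.

```python
import re


def normalize_paragraph(text):
    """Collapse horizontal whitespace around soft breaks.

    Every run of [ \\t]*\\n[ \\t]* becomes a single "\\n", and leading
    spaces or tabs at the start of the paragraph are dropped.
    """
    collapsed = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    return collapsed.lstrip(" \t")
```

For example, `normalize_paragraph("  foo  \n\t bar")` yields `"foo\nbar"`, while whitespace between words on the same line is left untouched.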
|
| d379b737 | 05-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
GfmSpecTest: neutralize DW typography for spec roundtrip
Force $conf[typography] = 0 in renderMarkdown() so the Quotes and MultiplyEntity modes are not loaded, override entity() in SpecCompatRenderer to emit the original match instead of the typographic glyph, and switch _xmlEntities() from ENT_QUOTES to ENT_COMPAT so `'` stays literal in body text while `"` is still escaped to `&quot;`. Drops three skip entries (#308, #310, #353) that existed only to paper over the same divergence and unblocks #16, #25 and #670.
|
| f57da51c | 05-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
Preformatted: leave boundary \n in stream when next line has content
Adds a zero-width lookahead exit (?=\n[^ \t\n]) ahead of the existing consuming \n exit. When an indented code block is followed by a non-blank line, the boundary newline now stays available for downstream block-level matchers (GfmHr, GfmHeader, etc.) instead of being eaten on the way out of preformatted mode.
Concretely fixes a thematic-break-after-indented-code case (GFM spec case 85's trailing ----): without this change, GfmHr's \n anchor failed because preformatted had already consumed the newline, and the bare ---- fell through to Entity which converted --- to an em-dash.
The consuming branch is kept as a fall-through for the blank-line and end-of-input cases, where a pure lookahead would trip the lexer's no-advance safety check.
Six PreformattedTest expectations updated: trailing cdata after a preformatted block now carries the leading \n (rendered output is unchanged — paragraph whitespace is trimmed).
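
The interplay of the two exits can be sketched with a single alternation in Python. How the real lexer registers and orders exit patterns is not shown in this log, so combining them with `|` and the helper name `exit_preformatted` are assumptions made for illustration.

```python
import re

# Lookahead exit first, consuming exit as the fall-through.
LOOKAHEAD_EXIT = r"(?=\n[^ \t\n])"
CONSUMING_EXIT = r"\n"
EXIT = re.compile(LOOKAHEAD_EXIT + "|" + CONSUMING_EXIT)


def exit_preformatted(text, pos):
    """Try to leave preformatted mode at pos.

    Returns (new_pos, consumed_text), or None if no exit applies.
    """
    m = EXIT.match(text, pos)
    if m is None:
        return None
    return m.end(), m.group(0)


# Indented code followed by a non-blank line: the newline is left in place.
followed_by_content = "    code\n----\n"
# Indented code followed by a blank line: the newline is consumed.
followed_by_blank = "    code\n\npara"
```

In the first case the position does not advance, so a later pattern anchored on `\n` (such as a thematic break) still sees the newline; in the second case the consuming branch advances by one character.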
|
| ab6ac090 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
add tests pinning blank-line tolerance in GFM listblock |
| b37c6ef7 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
more test skips |
| 6359e7fd | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
percent-encode URLs in SpecCompatRenderer to match spec output
CommonMark's reference renderer percent-encodes URL bytes outside the RFC 3986 unreserved/reserved set (and existing %XX sequences pass through unchanged). DokuWiki's XHTML renderer leaves UTF-8 and backslashes literal in href, which is fine for live wiki output but diverges byte-for-byte from spec.
Adds specEncodeUrl() to the spec-compat renderer and applies it in specLink(). Same shape as the earlier `→`->`\t` substitution: a test-harness alignment with spec convention, no production behavior change.
Unskips #510 (backslash in URL) and #511 (entity / percent-encoding in URL); both now match spec output with the parser-side decoding from the previous commit and the renderer-side encoding here.
|
| eb15e634 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
extract Helpers\HtmlEntity, wire into GfmCode and GfmLink URL slot
Numeric and named HTML entity decoding moves out of GfmHtmlEntity into a pure helper, so capture-by-regex modes can apply the same decode post-extraction (the inline lexer never reaches their bodies). Mirrors the Helpers\Escape pattern.
Wired up in two slots:
- GfmCode info string: `f&ouml;&ouml;` now decodes to föö in the language class. Clears spec example #330.
- GfmLink URL: GfmLink::extractUrl() decodes entities. URL pattern extends from `[^)\n]+` to `(?:\\.|[^)\n])+` so an escaped \) no longer terminates the URL early; the existing post-classify Escape::unescapeBackslashes call strips the backslashes after Link::classify has done its work. Clears #504, #506, #508.
Skip #328 with a self-contained title-slot reason: the URL side now decodes correctly, but the title attribute is still discarded (DokuWiki link instructions have no title slot).
|
| d2085866 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
extend GfmNumericEntity to HTML5 named entities, rename to GfmHtmlEntity
Numeric refs are still decoded explicitly: PHP's html_entity_decode returns the input unchanged for U+0000, surrogates, U+10FFFF, and BMP noncharacters where CommonMark requires U+FFFD or the literal codepoint. Named refs delegate to html_entity_decode with ENT_HTML5, which carries the full HTML5 named-entity table (including multi-codepoint decodes like `&ngE;` -> U+2267 + U+0338).
Unknown names stay literal: the original `&xxx;` passes through as cdata and the renderer's &-escaping turns it into `&amp;xxx;`.
|
| 09f34c31 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
apply spec convention: → represents a tab in GfmSpecTest
CommonMark spec.txt uses U+2192 RIGHTWARDS ARROW to visually mark literal tab characters in examples (see spec.txt, "About this document"). Substitute → for \t in both markdown input and expected HTML so the corpus exercises real tab handling.
Surfaced by GfmNumericEntity: example #336 (`&#9;foo`) now decodes the entity to a tab and produces correct output, but the harness was comparing against literal → in the expected HTML.
|
| 150dc5f2 | 04-May-2026 |
Andreas Gohr <andi@splitbrain.org> |
add GfmNumericEntity for CommonMark numeric character references
Decodes &#nnn; (decimal, 1-7 digits) and &#xhhh; / &#Xhhh; (hex, 1-6 digits) to the corresponding Unicode codepoint, emitted as plain cdata. Codepoint 0, codepoints above U+10FFFF, and the surrogate range U+D800..U+DFFF map to U+FFFD per the spec.
Distinct from the typography Entity mode, which is renderer-side configurable via entities.conf. Numeric refs are not configurable so decoding happens at parse time and the renderer needs no changes.
Lexer leftmost-match consumes the run before any structural pattern, so `&#42;foo&#42;` renders as literal *foo* and `&#42; foo` does not start a list - matching the spec rule that numeric refs cannot stand in for structural markers.
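
The decoding rule described above can be sketched as follows. This is Python, not the GfmNumericEntity source; the regex and the names `NUMERIC_REF`, `decode_ref` and `decode_numeric_refs` are assumptions that encode only the digit limits and the U+FFFD mapping stated in this message.

```python
import re

# Decimal: 1-7 digits. Hex: x or X followed by 1-6 hex digits.
NUMERIC_REF = re.compile(r"&#(?:([0-9]{1,7})|[xX]([0-9a-fA-F]{1,6}));")
REPLACEMENT = "\ufffd"


def decode_ref(match):
    """Map one numeric reference to its character or to U+FFFD."""
    if match.group(1) is not None:
        codepoint = int(match.group(1), 10)
    else:
        codepoint = int(match.group(2), 16)
    # Codepoint 0, anything above U+10FFFF, and surrogates are invalid.
    if codepoint == 0 or codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        return REPLACEMENT
    return chr(codepoint)


def decode_numeric_refs(text):
    """Replace every well-formed numeric reference in text."""
    return NUMERIC_REF.sub(decode_ref, text)
```

References with too many digits do not match the pattern and are left as literal text, which mirrors the digit limits given above.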
|
| b414dba2 | 04-May-2026 |
Andreas Gohr <gohr@cosmocode.de> |
skip a few more spec tests
Those are all deliberately not supported cases |