| 864d6c6d | 21-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
fix Lexer so exit-pattern lookbehinds see chars consumed by prior tokens
Lexer::reduce used to hand PCRE a shrinking tail of the subject — each matched token was chopped off the front of $raw and the next preg_match ran on what remained. Once a token was consumed, the bytes before the cursor were gone, and any lookbehind assertion in a subsequent pattern silently failed.
The bug was latent for DokuWiki's entire history because literal exit patterns like `\*\*`, `</file>`, or `%%` don't care what's behind them. It surfaced with c3755410a ("require non-whitespace adjacency for inline formatting delimiters"), which added `(?<=[^\s])` to Strong, Emphasis, Underline, Monospace, Subscript, Superscript and Deleted at once. After that commit, `**[[link]]**` stopped closing — the `]` that would satisfy the lookbehind had just been consumed by the link match, so Strong stayed open until end-of-section and swallowed everything after it (list items, headings, the lot).
Fix:
* Lexer::parse and Lexer::reduce track a byte offset into $raw instead of mutating $raw. $initialLength and the shrinking-length arithmetic for absolute match positions are replaced by straight offset increments; the no-progress guard and the trailing-unmatched dispatch both shift to the same cursor.
* ParallelRegex::split takes an optional $offset and passes it to preg_match together with PREG_OFFSET_CAPTURE. PCRE scans from the offset forward but still sees the whole subject, so lookbehinds work across already-consumed tokens. The secondary preg_split call used to carve out pre/post is no longer needed — PREG_OFFSET_CAPTURE gives the match start for free, saving one regex operation per reduce() step.
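The difference between the two approaches can be reproduced with plain `preg_match`. This is only a minimal sketch, not the actual Lexer or ParallelRegex code: the subject string, the simplified exit pattern, and the variable names (`$cursor`, `$tail`, `$oldResult`, `$newResult`) are assumptions made for illustration.

```php
<?php
// Hypothetical subject: bold markup wrapping a link, then trailing text.
$subject = '**[[link]]** after';

// Simplified stand-in for an exit pattern: '**' preceded by a
// non-whitespace character (assumed, not copied from DokuWiki).
$exit = '/(?<=[^\s])\*\*/';

// Bytes already consumed by earlier tokens: '**' (2) + '[[link]]' (8).
$cursor = 10;

// Old approach: match against the shrinking tail. Nothing precedes the
// closing '**' in $tail, so the lookbehind cannot succeed there.
$tail = substr($subject, $cursor);
$oldResult = preg_match($exit, $tail);

// New approach: keep the subject whole and pass the cursor as the offset.
// PCRE starts scanning at byte 10 but can still inspect the ']' at byte 9.
$newResult = preg_match($exit, $subject, $m, PREG_OFFSET_CAPTURE, $cursor);

var_dump($oldResult);  // int(0)
var_dump($newResult);  // int(1)
var_dump($m[0][1]);    // int(10), the absolute match start in $subject
```

Because `PREG_OFFSET_CAPTURE` reports the match start as an absolute position in the subject, the unmatched text before the token is simply `substr($subject, $cursor, $m[0][1] - $cursor)`, which is why no second regex call is needed to carve it out.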
Regression tests at all three layers:
* ParallelRegexTest: offset plumbing and pre/match accounting.
* LexerTest::testIndexLookbehindAcrossConsumedToken: exit-pattern lookbehind targeting the `/>` of a self-closing `<a/>` that was consumed as a SPECIAL token on the previous step. Fails under the old Lexer.
* FormattingTest: `**[[link]]**` and `**foo//bar//**` round-trip with correct open/close instructions through the full pipeline.
Also updates ListsTest::testUnorderedListStrong, whose expectations documented the pre-fix buggy behaviour ("formatting able to spread across list items"). With the fix, bold correctly stays within a single list item; the expected call sequence and the comment are updated to match.
|
| 8ab4ec30 | 16-Apr-2026 |
Andreas Gohr <gohr@cosmocode.de> |
remove dead ParallelRegex::apply() method
Remove apply() which was never called from production code. Rewrite the inherited SimpleTest tests to use split() instead, and add a test for pre/post-match splitting.
|