From: XrXr@...
Date: 2019-08-17T20:00:01+00:00
Subject: [ruby-core:94405] [Ruby master Bug#16108] gsub gives wrong results with regex backreferencing and triple backslash

Issue #16108 has been updated by alanwu (Alan Wu).

The source of your problem seems to be the behavior below:

```ruby
p ' \1 '.bytes # => [32, 92, 49, 32]
p ' \\ '.bytes # => [32, 92, 32]
p ' \ '.bytes # => [32, 92, 32]
```

As you can see, two backslashes in a single-quoted string literal give only one backslash in the resulting string. This is further complicated by `gsub` interpreting the content of the second argument as a replacement directive, which means the backslashes are interpreted a second time.

You want the final replacement to be "one backslash, followed by the first match group, then another backslash", or literally `\\1\` (`[92, 92, 49, 92]`). The replacement directive to express this is `\\\1\\` (`[92, 92, 92, 49, 92, 92]`), as we need to escape the first and last backslash by doubling them. We don't want to double the backslash right before "1", as we are not looking for a literal backslash there.

Now we need to construct a Ruby string literal we can put in the source code that would give us the replacement directive we want, which we could do by doubling all the backslashes:

```ruby
p '\\\\\\1\\\\'.bytes # => [92, 92, 92, 49, 92, 92]
```

We could get rid of one of the backslashes before "1", since the single-quoted literal `'\1'` already gives `[92, 49]`:

```ruby
p '\\\\\1\\\\'.bytes # => [92, 92, 92, 49, 92, 92]
```

We could also get rid of two backslashes after the 1, as `gsub` interprets the lone backslash at the end as a literal backslash.

This is too many backslashes for my taste, so I would prefer the block form. It takes the return value of the block and substitutes that for the match verbatim.
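
To make "verbatim" concrete, here is a minimal sketch comparing the two forms of `gsub` on the same replacement text. The variable names and the `'a-b'` sample are made up for illustration and are not part of the original report.

```ruby
# Hypothetical sample data; only String#gsub comes from the discussion above.
text = 'a-b'
two_backslashes = '\\\\' # a 2-character string: [92, 92]

# String form: the argument is read as a replacement directive,
# so the pair of backslashes collapses into one literal backslash.
as_directive = text.gsub('-', two_backslashes)

# Block form: the block's return value is inserted as-is.
as_block = text.gsub('-') { two_backslashes }

p as_directive.bytes # => [97, 92, 98]
p as_block.bytes     # => [97, 92, 92, 98]
```

With the block form there is only one layer of escaping to think about, the string literal itself.
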
The special `$1` variable is set within the `gsub` block, which we can use to build the replacement we want:

```ruby
input.gsub(pattern) { ["\\", $1, "\\"].join }
```

---

Here is a test program for you:

```ruby
input = '\indexentry{\textbf{bold}|hyperpage}{2}'
pattern = /\\textbf\{([^\}]+)\}/
test = ->(replacement) {
  puts "result: #{input.gsub(pattern, replacement)}, replacement: #{replacement.bytes}.map(&:chr).join"
}
test.call('\\\1\\')
test.call('\\ \1\\')
test.call('\\\\\\1\\\\')
test.call('\\\\\\1\\')
test.call('\\\\\1\\')
$stdout.write "alternative: "
puts input.gsub(pattern) { ["\\", $1, "\\"].join }
```

----------------------------------------
Bug #16108: gsub gives wrong results with regex backreferencing and triple backslash
https://bugs.ruby-lang.org/issues/16108#change-80825

* Author: VivianUnger (Vivian Unger)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.6.3p62 (2019-04-16 revision 67580) [x64-mingw32]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
I have written a script to convert LaTeX indexing files (.idx) to Macrex backup format (.mbk), so that I can import LaTeX-embedded indexes into the Macrex indexing program.

A problem arises when I try to convert bolded text. LaTeX indicates bolded text with the tag \textbf{} while Macrex wraps it in backslashes: \\. In my test case, the input string is "\indexentry{\textbf{bold}|hyperpage}{2}", which I need to convert into "\indexentry{\bold\|hyperpage}{2}". For this I am using:

record.gsub(/\\textbf\{([^\}]+)\}/, '\\\1\\')

But instead of the expected output, I get:

\indexentry{\1\|hyperpage}{2}

...as if I only had \\ rather than \\\.

I have tried the same Regex in a search-and-replace in Notepad++ and it works as expected. It's only in Ruby that I get this unexpected result.
The kludgey workaround I have found is to leave a space before the \\:

record.gsub(/\\textbf\{([^\}]+)\}/, '\\ \1\\')

...giving the result:

\indexentry{\ bold\|hyperpage}{2}

But this won't do. Macrex complains and the extra space has to be edited out. Imagine if you have hundreds of lines with bold text in them!

-- 
https://bugs.ruby-lang.org/