[ruby-core:94404] [Ruby master Bug#16108] gsub gives wrong results with regex backreferencing and triple backslash

From: vivian.unger@...
Date: 2019-08-17 18:41:17 UTC
List: ruby-core #94404
Issue #16108 has been updated by VivianUnger (Vivian Unger).


I have written a script to convert LaTeX indexing files (.idx) to Macrex backup format (.mbk), so that I can import LaTeX-embedded indexes into the Macrex indexing program. A problem arises when I try to convert bold text. LaTeX indicates bold text with the tag \textbf{[bold text]} while Macrex wraps it in backslashes: \\[bold text]\\.

In my test case, the input string is:

```
\indexentry{\textbf{bold}|hyperpage}{2}
```

I need to convert this into:

```
\indexentry{\bold\|hyperpage}{2}
```

For this I am using the following code:

``` ruby
record.gsub(/\\textbf\{([^\}]+)\}/, '\\\1\\')
```

But instead of the expected output, I get:

```
\indexentry{\1\|hyperpage}{2}
```

...as if I had only two backslashes rather than three.

I have tried the same regex in a Notepad++ search-and-replace, and it works as expected. Only Ruby gives this unexpected result.
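One way to see where the third backslash goes is to inspect the replacement literal before it reaches `gsub`. The sketch below relies on two Ruby rules: inside single quotes, `\\` collapses to one backslash, and `gsub` then reads `\\` in a replacement string as a single literal backslash rather than as the start of `\1`. The variable names (`record`, `replacement`, `result`) are illustrative only and are not taken from the original script.

``` ruby
# The test input from the report, as a single-quoted literal.
record = '\indexentry{\textbf{bold}|hyperpage}{2}'

# Three backslashes in the source, but the first pair collapses to one
# backslash, so the string holds only four characters.
replacement = '\\\1\\'
puts replacement   # prints \\1\
p replacement.chars

# gsub sees an escaped backslash, then a plain "1", then a trailing
# backslash, so no backreference is left to expand.
result = record.gsub(/\\textbf\{([^\}]+)\}/, replacement)
puts result        # prints \indexentry{\1\|hyperpage}{2}
```

This would also account for the behaviour with a space: the space separates the first backslash from `\1`, so the backreference survives.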

The kludgey workaround I have found is to leave a space before the two backslashes:

``` ruby
record.gsub(/\\textbf\{([^\}]+)\}/, '\\ \1\\')
```

...giving the result:

```
\indexentry{\ bold\|hyperpage}{2}
```

But this won't do. Macrex complains and the extra space has to be edited out. Imagine if you have hundreds of lines with bold text in them!
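For comparison, here is a minimal sketch of two forms that seem to produce the desired output without the extra space. It assumes the same test input as above; the names `pattern`, `doubled` and `via_block` are hypothetical and only there for illustration. The first form doubles each literal backslash once more so that `gsub` receives `\\` on both sides of `\1`; the second uses the block form of `gsub`, whose return value is inserted as-is without backreference processing.

``` ruby
record  = '\indexentry{\textbf{bold}|hyperpage}{2}'
pattern = /\\textbf\{([^\}]+)\}/

# Four backslashes in the literal become two in the string, which gsub
# reads as one literal backslash; \1 is then still a backreference.
doubled = record.gsub(pattern, '\\\\\1\\\\')

# Block form: the block's return value is used verbatim, so only the
# usual double-quoted string escaping applies.
via_block = record.gsub(pattern) { "\\#{$1}\\" }

puts doubled     # prints \indexentry{\bold\|hyperpage}{2}
puts via_block   # prints \indexentry{\bold\|hyperpage}{2}
```

The block form may be easier to read when the replacement contains many literal backslashes, since there is only one layer of escaping to keep track of.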

----------------------------------------
Bug #16108: gsub gives wrong results with regex backreferencing and triple backslash
https://bugs.ruby-lang.org/issues/16108#change-80824

* Author: VivianUnger (Vivian Unger)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.3p62 (2019-04-16 revision 67580) [x64-mingw32]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
I have written a script to convert LaTeX indexing files (.idx) to Macrex backup format (.mbk), so that I can import LaTeX-embedded indexes into the Macrex indexing program. A problem arises when I try to convert bolded text. LaTeX indicates bolded text with the tag \textbf{} while Macrex wraps it in backslashes: \\.

In my test case, the input string is "\indexentry{\textbf{bold}|hyperpage}{2}", which I need to convert into "\indexentry{\bold\|hyperpage}{2}". For this I am using:

record.gsub(/\\textbf\{([^\}]+)\}/, '\\\1\\')

But instead of the expected output, I get:

\indexentry{\1\|hyperpage}{2}

...as if I only had \\ rather than \\\.

I have tried the same regex in a search-and-replace in Notepad++ and it works as expected. It's only in Ruby that I get this unexpected result.

The kludgey workaround I have found is to leave a space before the \\:

record.gsub(/\\textbf\{([^\}]+)\}/, '\\ \1\\')

...giving the result:

\indexentry{\ bold\|hyperpage}{2}

But this won't do. Macrex complains and the extra space has to be edited out. Imagine if you have hundreds of lines with bold text in them!



-- 
https://bugs.ruby-lang.org/

