From: takashikkbn@...
Date: 2021-06-15T20:50:32+00:00
Subject: [ruby-core:104289] [Ruby master Bug#17992] Upstreaming the htmlentities gem into CGI#.(un)escape_html
Issue #17992 has been updated by k0kubun (Takashi Kokubun).
Status changed from Open to Feedback
Could you give a bit more context on why you'd like to escape characters that `CGI.escapeHTML` doesn't escape?
I believe `CGI.escapeHTML` has primarily been used to keep escaped content from breaking the DOM structure, with optimal performance. That behavior makes sense to me, and as long as leaving other characters unescaped isn't considered a security vulnerability, I'd rather not escape any additional characters, for the best performance.
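A minimal sketch of that behavior, assuming the `cgi/escape` library bundled with Ruby 3.0: only the five markup-significant characters (`&`, `<`, `>`, `"`, `'`) are replaced, so non-ASCII text such as `é` or `©` passes through untouched, and `CGI.unescapeHTML` round-trips the result.

```ruby
require 'cgi/escape'

# CGI.escapeHTML only rewrites & < > " ' (the characters that can break
# markup); every other character, including non-ASCII, is left as-is.
markup  = %q{<a href='/?a=1&b=2'>"Café" © 2021</a>}
escaped = CGI.escapeHTML(markup)
puts escaped
# => &lt;a href=&#39;/?a=1&amp;b=2&#39;&gt;&quot;Café&quot; © 2021&lt;/a&gt;

# Decoding the output restores the original string exactly.
puts CGI.unescapeHTML(escaped) == markup # => true
```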
```rb
require 'benchmark/ips'
require 'htmlentities'
require 'cgi/escape'
str = <<~HTML
Example Domain
This domain is established to be used for illustrative examples in documents. You may use this
domain in examples without prior coordination or asking for permission.
More information...
HTML
coder = HTMLEntities.new
Benchmark.ips do |x|
  x.report("CGI.escapeHTML") { CGI.escapeHTML(str) }
  x.report("HTMLEntities #{HTMLEntities::VERSION::STRING}") { coder.encode(str) }
  x.compare!
end
```
```
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
Warming up --------------------------------------
CGI.escapeHTML 112.937k i/100ms
HTMLEntities 4.3.4 1.029k i/100ms
Calculating -------------------------------------
CGI.escapeHTML 1.131M (± 2.3%) i/s - 5.760M in 5.095252s
HTMLEntities 4.3.4 10.281k (± 2.1%) i/s - 51.450k in 5.006333s
Comparison:
CGI.escapeHTML: 1131036.5 i/s
HTMLEntities 4.3.4: 10281.4 i/s - 110.01x (± 0.00) slower
```
Note that `CGI.escapeHTML` is the default HTML escape method, so if you suddenly replace `CGI.escapeHTML` with that gem, every embedded Ruby expression becomes about 110x slower.
We may want to support escaping additional characters for other use cases, but to preserve backward compatibility and performance for existing callers, that feature would have to be enabled via a new option or a separate method.
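One way the "separate method" route could look, as a rough sketch only: `ExtendedEscape.escape_html_ascii` below is a hypothetical name, not an existing CGI API. It reuses the fast `CGI.escapeHTML` for the five markup characters and then, only for callers who opt in, encodes every non-ASCII character as a hexadecimal numeric character reference.

```ruby
require 'cgi/escape'

# Hypothetical opt-in helper (NOT part of CGI): existing callers of
# CGI.escapeHTML keep its current output and speed; only code that calls
# this method pays for the extra pass over the string.
module ExtendedEscape
  module_function

  def escape_html_ascii(string)
    # Escape & < > " ' with the existing fast path first...
    CGI.escapeHTML(string)
      # ...then turn each non-ASCII character into &#xHEX;
      .gsub(/[^\x00-\x7F]/) { |ch| format('&#x%X;', ch.ord) }
  end
end

sample = %q{"Café" © 2021}
result = ExtendedEscape.escape_html_ascii(sample)
puts result
# => &quot;Caf&#xE9;&quot; &#xA9; 2021
```

Because the output uses numeric references, the existing `CGI.unescapeHTML` can already decode it back to the original string.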
----------------------------------------
Bug #17992: Upstreaming the htmlentities gem into CGI#.(un)escape_html
https://bugs.ruby-lang.org/issues/17992#change-92506
* Author: AMomchilov (Alexander Momchilov)
* Status: Feedback
* Priority: Normal
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
Hi there,
I was looking to unescape some HTML entities in a String, and I discovered that `CGI#.(un)escape_html` is **really** limited. Many StackOverflow questions share the same disappointment and point users to the [htmlentities gem](https://github.com/threedaymonk/htmlentities):
1. https://stackoverflow.com/a/383561/3141234
2. https://stackoverflow.com/a/22926384/3141234
This solved my problem, but I feel like something this standard/universal should be built-in. To that end, I'm interested in working on merging the htmlentities gem into CGI's repo. Would this be a welcome change?
* I've e-mailed the author (Paul Battley) privately, and got his blessing to do so.
* It's MIT licensed, so that should be OK.
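For reference, a small sketch of the limitation described above, assuming the `cgi/escape` library bundled with Ruby 3.0: `CGI.unescapeHTML` decodes only `&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;` and numeric character references, while other named entities such as `&eacute;` or `&copy;` are passed through verbatim. Decoding those is what the linked answers use the htmlentities gem's `HTMLEntities#decode` for.

```ruby
require 'cgi/escape'

# Named entities outside the basic five are left undecoded, while the
# numeric forms of the same character (&#169; and &#xA9;) are decoded.
input   = '&lt;p&gt;Caf&eacute; &copy; &#169; &#xA9;&lt;/p&gt;'
decoded = CGI.unescapeHTML(input)
puts decoded
# => <p>Caf&eacute; &copy; © ©</p>
```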
--
https://bugs.ruby-lang.org/