From: "mame (Yusuke Endoh)"
Date: 2022-08-02T09:27:30+00:00
Subject: [ruby-core:109412] [Ruby master Feature#18822] Ruby lack a proper method to percent-encode strings for URIs (RFC 3986)

Issue #18822 has been updated by mame (Yusuke Endoh).

We discussed this issue at the dev meeting. How about the following?

* Introduce `CGI.escapeURIComponent(str)`, which behaves like `CGI.escape` except that a space is encoded as `%20` instead of `+` (as @byroot proposed)
* Introduce `CGI.unescapeURIComponent(str)`, which is the reverse operation
* Introduce two aliases like `CGI.escape_uri_component(str)`
* Do not introduce `CGI.encode_www_form_component` (but improvements to the rdoc of `CGI.escape` are welcome)

(There was a very long discussion, but I didn't understand it due to my lack of knowledge. Please see [the dev-meeting-log](https://github.com/ruby/dev-meeting-log/blob/master/DevMeeting-2022-07-21.md#feature-18822-ruby-lack-a-proper-method-to-percent-encode-strings-for-uris-rfc-3986-byroot).)

----------------------------------------
Feature #18822: Ruby lack a proper method to percent-encode strings for URIs (RFC 3986)
https://bugs.ruby-lang.org/issues/18822#change-98562

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
### Context

There are two fairly similar encoding methods that are often confused: `application/x-www-form-urlencoded`, which is how form data is encoded, and "percent-encoding" as defined by [RFC 3986](https://www.rfc-editor.org/rfc/rfc3986). AFAIK, the only way they differ is that "form encoding" escapes space characters as `+`, and RFC 3986 escapes them as `%20`. Most of the time it doesn't matter, but sometimes it does.

### Ruby form and URL escape methods

- `URI.escape(" ") # => "%20"` but it was deprecated and removed (in 3.0 ?).
- `ERB::Util.url_encode(" ") # => "%20"` but it's implemented with a `gsub` and isn't very performant.
  It's also awkward to have to reach for `ERB`.
- `CGI.escape(" ") # => "+"`
- `URI.encode_www_form_component(" ") # => "+"`

### Unescape methods

For unescaping, it's even more clear-cut since `URI.unescape` was removed, so there's no available method that treats an unescaped `+` as simply `+`. E.g. in JavaScript: `decodeURIComponent("foo+bar") #=> "foo+bar"`. If one were to use `CGI.unescape`, the string might be improperly decoded: `CGI.unescape("foo+bar") #=> "foo bar"`.

### Other languages

- JavaScript `encodeURI` and `encodeURIComponent` use `%20`.
- PHP has `urlencode` using `+` and `rawurlencode` using `%20`.
- Python has `urllib.parse.quote` using `%20` and `urllib.parse.quote_plus` using `+`.

### Proposal

Since `CGI` already has a very performant encoder for `application/x-www-form-urlencoded`, I think it would make sense for it to provide another method for RFC 3986.

I propose:

- `CGI.url_encode(" ") # => "%20"`
- Or `CGI.encode_url`.
- Alias `CGI.escape` as `CGI.encode_www_form_component`
- Clarify the documentation of `CGI.escape`.
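
To make the difference between the two encodings concrete, here is a minimal sketch built only on the existing `CGI.escape` and `CGI.unescape`. The helper names `rfc3986_escape` and `rfc3986_unescape` are hypothetical, chosen for illustration; they are not the methods proposed in this thread, and the `gsub` post-processing is only a stand-in for a native implementation.

```ruby
require "cgi"

# Hypothetical helper (not part of CGI): RFC 3986 style escaping on top of
# CGI.escape. CGI.escape already turns a literal "+" into "%2B", so every
# "+" left in its output stands for a space and can be rewritten as "%20".
def rfc3986_escape(str)
  CGI.escape(str).gsub("+", "%20")
end

# Hypothetical inverse: protect literal "+" characters before handing the
# string to CGI.unescape, which would otherwise decode them as spaces.
def rfc3986_unescape(str)
  CGI.unescape(str.gsub("+", "%2B"))
end

CGI.escape("a b+c")          # => "a+b%2Bc"   (form encoding)
rfc3986_escape("a b+c")      # => "a%20b%2Bc" (percent-encoding)
CGI.unescape("foo+bar")      # => "foo bar"
rfc3986_unescape("foo+bar")  # => "foo+bar"
```

This only illustrates the behavioural difference; going through `gsub` brings back the performance cost raised above for `ERB::Util.url_encode`.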