[ruby-core:78620] [Ruby trunk Bug#12852] URI.parse can't handle non-ascii URIs
From: naruse@...
Date: 2016-12-12 18:39:42 UTC
List: ruby-core #78620
Issue #12852 has been updated by Yui NARUSE.
Matthew Kerwin wrote:
> Your thinking here seems confused. If a String contains non-ASCII characters then it's not a URI. If it is a URI then it strictly matches the definition of a URI. If a String contains a valid IRI, then yeah, you're not going to get much help from Ruby; but IRIs are not commonly used in the real world anyway.
The concept sounds reasonable.
I also think the URL Standard's parsing logic would be a better fit for Ruby's URI.parse:
https://url.spec.whatwg.org/
However, that algorithm is still under development.
----------------------------------------
Bug #12852: URI.parse can't handle non-ascii URIs
https://bugs.ruby-lang.org/issues/12852#change-62007
* Author: Olivier Lacan
* Status: Open
* Priority: Normal
* Assignee: akira yamada
* Target version:
* ruby -v:
* Backport: 2.1: UNKNOWN, 2.2: UNKNOWN, 2.3: UNKNOWN
----------------------------------------
Given a return URL path like: `/search?utf8=\u{2713}&q=foo`, `URI.parse` raises the following exception:
```ruby
URI.parse "/search?utf8=\u{2713}&q=foo"
URI::InvalidURIError: URI must be ascii only "/search?utf8=\u{2713}&q=foo"
```
This `\u{2713}` character is commonly used by web frameworks like Rails to enforce UTF-8 in forms: https://github.com/rails/rails/blob/92703a9ea5d8b96f30e0b706b801c9185ef14f0e/actionview/lib/action_view/helpers/form_tag_helper.rb#L823-L830
```ruby
"\u{2713}"
=> "✓"
```
Is it unreasonable to expect the non-ASCII portion of a URI to be handled by URI.parse? One way to work around this issue is to call URI.encode on the URI string before passing it to URI.parse:
```ruby
URI.parse URI.encode("/search?utf8=\u{2713}&q=foo")
=> #<URI::Generic /search?utf8=%E2%9C%93&q=foo>
```
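For readers on newer Rubies: `URI.encode` was later marked obsolete and removed in Ruby 3.0, so the workaround above may not be available. The following is a minimal sketch of the same idea that percent-encodes only the non-ASCII characters before parsing. The helper name `escape_non_ascii` is hypothetical and not part of Ruby's URI library, and the sketch assumes the input is (or can be transcoded to) UTF-8.

```ruby
require "uri"

# Hypothetical helper, not part of Ruby's standard library:
# replace each non-ASCII character with the percent-encoded form of its
# UTF-8 bytes, leaving every ASCII character (including "?", "&", "=") as-is.
def escape_non_ascii(str)
  str.encode(Encoding::UTF_8).gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

escaped = escape_non_ascii("/search?utf8=\u{2713}&q=foo")
uri = URI.parse(escaped)

puts escaped   # the ASCII-only string handed to URI.parse
puts uri.path  # path component of the parsed URI
puts uri.query # query component, with the check mark percent-encoded
```

Unlike a general-purpose escaper, this leaves existing percent-escapes and reserved characters alone, so it is only suitable when the ASCII part of the string is already a valid URI reference.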
By comparison, a library like Addressable parses this URI without issue.
```ruby
require "addressable/uri"
Addressable::URI.parse("/search?utf8=\u{2713}&q=foo")
=> #<Addressable::URI:0x3feffa84158c URI:/search?utf8=✓&q=foo>
```
This is how Addressable implements parsing:
https://github.com/sporkmonger/addressable/blob/a15b7045a09911bcc47b106200554809c879a5f6/lib/addressable/uri.rb#L75-L145
PS: Tried under MRI 2.3.1 and 2.4.0-preview1