From: naruse@...
Date: 2016-12-12T18:39:42+00:00
Subject: [ruby-core:78620] [Ruby trunk Bug#12852] URI.parse can't handle non-ascii URIs

Issue #12852 has been updated by Yui NARUSE.

Matthew Kerwin wrote:
> Your thinking here seems confused. If a String contains non-ASCII characters then it's not a URI. If it is a URI then it strictly matches the definition of a URI. If a String contains a valid IRI, then yeah, you're not going to get much help from Ruby; but IRIs are not commonly used in the real world anyway.

The concept sounds reasonable.
And I'm considering URL Standard's parsing logic is more suitable for Ruby's URI.parse.
https://url.spec.whatwg.org/
But the algorithm is still developing.

----------------------------------------
Bug #12852: URI.parse can't handle non-ascii URIs
https://bugs.ruby-lang.org/issues/12852#change-62007

* Author: Olivier Lacan
* Status: Open
* Priority: Normal
* Assignee: akira yamada
* Target version:
* ruby -v:
* Backport: 2.1: UNKNOWN, 2.2: UNKNOWN, 2.3: UNKNOWN
----------------------------------------
Given a return URL path like: `/search?utf8=\u{2713}&q=foo`, `URI.parse` raises the following exception:

```ruby
URI.parse "/search?utf8=\u{2713}&q=foo"
URI::InvalidURIError: URI must be ascii only "/search?utf8=\u{2713}&q=foo"
```

This `\u{2713}` character is commonly used by web frameworks like Rails to enforce UTF-8 in forms:
https://github.com/rails/rails/blob/92703a9ea5d8b96f30e0b706b801c9185ef14f0e/actionview/lib/action_view/helpers/form_tag_helper.rb#L823-L830

```ruby
"\u{2713}"
=> "✓"
```

Is it unreasonable to expect non-ascii portion of URIs to be handled by URI.parse?

The way to circumvent this issue is to call URI.encode on the URI string prior to passing it to URI.parse:

```ruby
URI.parse URI.encode("/search?utf8=\u{2713}&q=foo")
=> #
```

By comparison, a library like Addressable parses this URI without issue.

```
require "addressable/uri"
=> #
```

This is how Addressable implements parsing:
https://github.com/sporkmonger/addressable/blob/a15b7045a09911bcc47b106200554809c879a5f6/lib/addressable/uri.rb#L75-L145

PS: Tried under MRI 2.3.1 and 2.4.0-preview1

-- 
https://bugs.ruby-lang.org/
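
The workaround described in the report (percent-encode first, then parse) can be sketched without `URI.encode`, which was later deprecated and then removed in Ruby 3.0. The helper name `escape_non_ascii` below is hypothetical and not part of Ruby's `uri` library. It only rewrites non-ASCII characters as their UTF-8 bytes and leaves everything else alone, so it is a minimal illustration rather than a full IRI-to-URI conversion.

```ruby
require "uri"

# Hypothetical helper (not part of Ruby's URI library): replace each
# non-ASCII character with the percent-encoded form of its UTF-8 bytes.
# ASCII characters, including reserved ones such as "?" and "&", are
# left untouched.
def escape_non_ascii(str)
  str.gsub(/[^\x00-\x7F]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

raw     = "/search?utf8=\u{2713}&q=foo"
escaped = escape_non_ascii(raw)
puts escaped # prints /search?utf8=%E2%9C%93&q=foo

# The escaped string is ASCII-only, so URI.parse accepts it.
uri = URI.parse(escaped)
puts uri.path  # prints /search
puts uri.query # prints utf8=%E2%9C%93&q=foo
```

Escaping only the non-ASCII characters keeps the existing structure of the string (path, `?`, `&`, `=`) intact, which matches the intent of the workaround in the report.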