From: matthew@...
Date: 2019-06-17T23:44:42+00:00
Subject: [ruby-core:93214] [Ruby trunk Bug#15933] OpenURI: Assign default charset for HTTPS as well as HTTP

Issue #15933 has been updated by phluid61 (Matthew Kerwin).

A lot of those quoted specs are very, very old, and in some cases obsoleted by newer specs.

HTTP/1.1 Semantics and Content [RFC7231/B](https://tools.ietf.org/html/rfc7231#appendix-B):

> The default charset of ISO-8859-1 for text media types has been
> removed; the default is now whatever the media type definition says.

Text Media Types [RFC6838/4.2.1](https://tools.ietf.org/html/rfc6838#section-4.2.1):

> If a "charset" parameter is specified, it SHOULD be a required
> parameter, eliminating the options of specifying a default value. If
> there is a strong reason for the parameter to be optional despite
> this advice, each subtype MAY specify its own default value, or
> alternatively, it MAY specify that there is no default value.
> Finally, the "UTF-8" charset [RFC3629] SHOULD be selected as the
> default. See [RFC6657] for additional information on the use of
> "charset" parameters in conjunction with subtypes of text.
>
> Regardless of what approach is chosen, all new text/* registrations
> MUST clearly specify how the charset is determined; relying on the
> US-ASCII default defined in Section 4.1.2 of [RFC2046] is no longer
> permitted. If explanatory text is needed, this SHOULD be placed in
> the additional information section of the registration.

Most current `text/csv` spec [RFC7111/5.1](https://tools.ietf.org/html/rfc7111#section-5.1):

> The "charset" parameter specifies the charset employed by the CSV
> content. In accordance with RFC 6657 [RFC6657], the charset
> parameter SHOULD be used, and if it is not present, UTF-8 SHOULD
> be assumed as the default (this implies that US-ASCII CSV will
> work, even when not specifying the "charset" parameter).
> Any charset defined by IANA for the "text" tree may be used in
> conjunction with the "charset" parameter.

So it seems if you're making a change, it should be: ignore the protocol, and default to UTF-8 for `text/csv`.

----------------------------------------
Bug #15933: OpenURI: Assign default charset for HTTPS as well as HTTP
https://bugs.ruby-lang.org/issues/15933#change-78664

* Author: gareth (Gareth Adams)
* Status: Assigned
* Priority: Normal
* Assignee: akr (Akira Tanaka)
* Target version:
* ruby -v:
* Backport: 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Using `open-uri` to load a document in the following circumstances:

* The `Content-Type` header is `text/*` and *doesn't* specify a charset, e.g. `Content-Type: text/csv`
* The document is loaded from an `https://` URL

…will cause the resulting string to have `ASCII-8BIT` encoding.

As the [documentation for OpenURI#charset](https://github.com/ruby/ruby/blob/trunk/lib/open-uri.rb#L538-L560) mentions, [RFC2616/3.7.1](https://tools.ietf.org/html/rfc2616#section-3.7.1) says:

> When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.

OpenURI takes this literally - only assigning ISO-8859-1 if `@base_uri.scheme` is *exactly* "http". This check was written [17 years ago](https://github.com/ruby/ruby/commit/3a20ed532b57da1e58287a5c53abe14400a085f4#diff-0f19cb99597e5fb90bfb937b22143b51R264) in 2002 even before TLS 1.1 was defined, and well before HTTPS was common.

I believe this check should now also match the scheme "https". As [RFC2818/2](https://tools.ietf.org/html/rfc2818#section-2) says:

> Conceptually, HTTP/TLS is very simple. Simply use HTTP over TLS precisely as you would use HTTP over TCP

1. Is this a suitable change to make?
2. I have a patch to fix the functionality (attached). What else do I need to specify in terms of documentation/tests?

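
For illustration, the scheme check described above can be modelled outside of open-uri. This is only a minimal sketch, not the library's code and not the attached patch: `assumed_charset` and its `schemes:` keyword are hypothetical names, and the `example.com` URLs are placeholders.

```ruby
require 'uri'

# Hypothetical helper (not part of open-uri): returns the charset that would
# be assumed for a response, given its Content-Type header value and its URL.
# Returns nil when no charset is given and no default applies.
def assumed_charset(content_type, url, schemes: %w[http])
  type, *params = content_type.split(';').map(&:strip)

  # An explicit charset parameter always wins.
  explicit = params.find { |param| param.downcase.start_with?('charset=') }
  return explicit.split('=', 2).last.delete('"').downcase if explicit

  # Otherwise fall back to the RFC 2616 section 3.7.1 default, but only for
  # text/* types fetched over one of the accepted schemes.
  scheme = URI.parse(url).scheme.to_s.downcase
  'iso-8859-1' if type.downcase.start_with?('text/') && schemes.include?(scheme)
end

# Behaviour described in this report: only plain "http" gets the default.
p assumed_charset('text/csv', 'http://example.com/data.csv')   # => "iso-8859-1"
p assumed_charset('text/csv', 'https://example.com/data.csv')  # => nil

# Accepting "https" as well, as proposed here.
p assumed_charset('text/csv', 'https://example.com/data.csv',
                  schemes: %w[http https])                      # => "iso-8859-1"
```

Passing `schemes: %w[http https]` models the change proposed in this report. Defaulting to UTF-8 for `text/csv` regardless of scheme would be a different rule and is not modelled here.
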
I'm happy to put more work into this, but it's my first contribution to Ruby core and I'd like some pointers. I've read through https://bugs.ruby-lang.org/projects/ruby/wiki/HowToReport

---Files--------------------------------
ruby-changes.patch (1.21 KB)

--
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>