ruby-core

Mailing list archive

[ruby-core:115162] [Ruby master Bug#19916] URI#to_s can serialize to a value that doesn't deserialize to the original

From: "Hanmac (Hans Mackowiak) via ruby-core" <ruby-core@...>
Date: 2023-10-25 07:44:50 UTC
List: ruby-core #115162
Issue #19916 has been updated by Hanmac (Hans Mackowiak).


to_s is not serialize/deserialize.

Also, it is not guaranteed that to_s returns a string that is parsable.

----------------------------------------
Bug #19916: URI#to_s can serialize to a value that doesn't deserialize to the original
https://bugs.ruby-lang.org/issues/19916#change-105072

* Author: yawboakye (yaw boakye)
* Status: Open
* Priority: Normal
* ruby -v: 3.2.2
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
It appears that when we serialize a URI to a string, we don't check that the result can be deserialized back into the original. I think it's fair to expect that serializing and then deserializing produces an object equal to the original, differing perhaps only in its object identity. This isn't the case for URIs that are custom-built, which might happen a lot, for example in a Rails app that accepts URL inputs from users. Let me attempt a reproduction, using the generic URI `example.com`.
```ruby
example_url = "example.com"
example_uri = URI(example_url)
```
Given that no scheme is explicitly set in the URI, it is correctly parsed as generic, with the given `example.com` interpreted as the path.
The returned object is mutable. Since no scheme was detected automatically, let's set one, along with the hostname.
```ruby
example_uri.scheme = "https"
example_uri.hostname = example_uri.path

# I've intentionally left path value unchanged, since it helps demonstrate the potential bug.
```

Given that we have a scheme, an authority, and a path, and given that we format URIs according to [RFC 3986], one may expect that serializing the URI to a string will follow section 3 of the RFC, [Syntax Components], which requires a slash separator between the authority (in our case the hostname) and a non-empty path. It appears that `URI#to_s` may not do that if the path doesn't already have a slash prefix. That would be fine if we kept an invariant ensuring that we never produce a badly serialized URI. To return to our `example_uri`, serialization produces:
```ruby
serialized_uri = example_uri.to_s
puts serialized_uri # https://example.comexample.com
```

This is obviously bad. One would have expected `https://example.com/example.com` instead. That is, the slash would be automatically and correctly inserted, just as the double slashes were automatically inserted between the scheme and the authority. In fact, `serialized_uri` cannot be deserialized back into `example_uri`. Below is an attempt at deserialization and a comparison of the new value to the original:

```ruby
deserialized_example_uri = URI(serialized_uri)
example_uri.scheme == deserialized_example_uri.scheme # true
example_uri.hostname == deserialized_example_uri.hostname # false ("example.com" != "example.comexample.com")
example_uri.path == deserialized_example_uri.path # false ("example.com" != "")
```
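
The comparison above can be wrapped in a small check. This is only a sketch: `round_trips?` is a hypothetical helper name, not part of the URI library, and it compares just the scheme, host, and path components.

```ruby
require "uri"

# Hypothetical helper (not part of the URI library): true when parsing
# uri.to_s yields the same scheme, host, and path as the original object.
def round_trips?(uri)
  reparsed = URI(uri.to_s)
  %i[scheme host path].all? do |part|
    reparsed.public_send(part) == uri.public_send(part)
  end
end

# The custom-built URI from the reproduction above.
custom_uri = URI("example.com")
custom_uri.scheme = "https"
custom_uri.hostname = custom_uri.path

puts round_trips?(custom_uri)                      # false
puts round_trips?(URI("https://example.com/docs")) # true
```

The check returns false for the custom-built URI whether or not `to_s` inserts the separator, because the stored path `example.com` has no leading slash and so can never equal the path of a re-parsed URI that has an authority.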

I believe that the ability to serialize and deserialize an object without losing fidelity is a great thing. I believe even more strongly that we should preserve/maintain an invariant that allows us to always serialize a URI to a format that meets the RFC's specification. Therefore I consider this a bug, and I'd be willing to work on a fix, as my first contribution to Ruby, if enough people consider it a bug too.
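
To make the invariant concrete, here is a caller-side sketch of what I have in mind. It is not a proposed patch to `URI#to_s`; `with_rooted_path` is a hypothetical helper name, and it assumes that prefixing the path with a slash is the right normalization whenever a host is present.

```ruby
require "uri"

# Hypothetical helper (not part of the URI library): returns a copy of the
# URI whose non-empty path starts with "/" whenever a host is present.
def with_rooted_path(uri)
  copy = uri.dup
  path = copy.path.to_s
  if copy.host && !path.empty? && !path.start_with?("/")
    copy.path = "/#{path}"
  end
  copy
end

# The custom-built URI from the reproduction above.
example_uri = URI("example.com")
example_uri.scheme = "https"
example_uri.hostname = example_uri.path

rooted_uri = with_rooted_path(example_uri)
reparsed_uri = URI(rooted_uri.to_s)

puts rooted_uri.to_s                        # https://example.com/example.com
puts reparsed_uri.host == rooted_uri.host   # true
puts reparsed_uri.path == rooted_uri.path   # true
```

Working on a copy leaves the caller's object untouched. A fix inside the library could instead insert the separator during serialization, or reject the inconsistent state when the host is assigned.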

Regards!

[RFC 3986]: https://www.rfc-editor.org/rfc/rfc3986
[Syntax Components]: https://www.rfc-editor.org/rfc/rfc3986#section-3

---Files--------------------------------
Screenshot 2023-09-29 at 12.19.26.png (180 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/
