From: eregontp@...
Date: 2019-07-01T14:38:27+00:00
Subject: [ruby-core:93453] [Ruby master Feature#15940] Coerce symbols internal fstrings in UTF8 rather than ASCII to better share memory with string literals

Issue #15940 has been updated by Eregon (Benoit Daloze).

Sharing the `char*` buffer is a more general optimization. It could apply to more cases, e.g., frozen Strings with identical bytes but different encodings.
So I think that would be better than changing semantics for the purpose of fitting better with the current fstring representation, a purpose that is rather obscure to end users.

If we do change this, I'd like a reason other than the internal optimization, since that can be achieved another way. But that's just my opinion.

----------------------------------------
Feature #15940: Coerce symbols internal fstrings in UTF8 rather than ASCII to better share memory with string literals
https://bugs.ruby-lang.org/issues/15940#change-79000

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
Patch: https://github.com/ruby/ruby/pull/2242

It's not uncommon for symbols to have literal string counterparts, e.g.

```ruby
class User
  attr_accessor :name

  def as_json
    { 'name' => name }
  end
end
```

The default source encoding is UTF-8, and symbols coerce their internal fstring to ASCII when possible. As a result, the snippet above actually keeps two instances of `"name"` in the fstring registry: one in ASCII and the other in UTF-8.

UTF-8 is a strict superset of ASCII, so storing symbol fstrings as UTF-8 instead makes no significant difference. It does, however, allow the equivalent string literals to be reused in most cases.

The only notable behavioral change is in `Symbol#to_s`. Previously `:name.to_s.encoding` would be `#<Encoding:US-ASCII>`. After this patch it's `#<Encoding:UTF-8>`.

I can't foresee any significant compatibility impact of this change on existing code.
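The duplication described above, and the buffer-sharing alternative Eregon suggests, can be illustrated with a toy model in plain Ruby. This is a sketch under stated assumptions, not MRI's actual fstring table, which is written in C.

- `ToyFstringTable` is a hypothetical class. It interns strings keyed on their bytes *and* their encoding, which is how the issue describes the registry behaving.
- `shared_buffer_count` loosely mirrors sharing one `char*` between entries that have identical bytes.
- `String#encode` stands in for the symbol's ASCII fstring. This keeps the example independent of what `Symbol#to_s` returns in a given Ruby version.

```ruby
# Toy model only: MRI's real fstring table is implemented in C.
class ToyFstringTable
  def initialize
    @strings = {} # [bytes, encoding] => frozen String
  end

  # Interns +str+ keyed on its raw bytes plus its encoding.
  def intern(str)
    @strings[[str.b, str.encoding]] ||= str.dup.freeze
  end

  # Number of entries; without buffer sharing, each entry owns its own bytes.
  def entry_count
    @strings.size
  end

  # Buffers needed if entries with identical bytes shared a single char*.
  def shared_buffer_count
    @strings.keys.map(&:first).uniq.size
  end
end

table = ToyFstringTable.new
symbol_name = "name".encode(Encoding::US_ASCII) # stands in for the symbol's fstring
literal     = "name"                            # UTF-8 under the default source encoding

table.intern(symbol_name)
table.intern(literal)

puts table.entry_count         # 2: one US-ASCII entry, one UTF-8 entry
puts table.shared_buffer_count # 1: both entries hold the same bytes
puts symbol_name == literal    # true: ASCII-only strings compare equal across encodings
```

The two proposals affect this model differently:

- Storing symbol fstrings as UTF-8, as the patch proposes, would make both `intern` calls hit the same key.
- Sharing buffers, as Eregon proposes, would keep two entries but only one copy of the bytes.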
However, several ruby specs assert this behavior, and I don't know whether they can be changed: https://github.com/ruby/spec/commit/a73a1c11f13590dccb975ba4348a04423c009453

If this specification can't be changed, we could instead change the encoding of the String returned by `Symbol#to_s`, e.g. in Ruby pseudo code:

```ruby
# Pseudo code: `fstr` stands for the symbol's internal fstring (UTF-8 after the patch).
def to_s
  str = fstr.dup
  # Restore the previous US-ASCII encoding for ASCII-only names.
  str.force_encoding(Encoding::ASCII) if str.ascii_only?
  str
end
```
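The pseudo code above can be exercised outside of `Symbol` by passing the fstring in explicitly. The sketch below rests on two assumptions:

- `proposed_symbol_to_s` is a hypothetical helper, not part of Ruby or the patch.
- `String#-@` produces a frozen, deduplicated UTF-8 string that plays the role of the symbol's internal fstring.

```ruby
# Hypothetical helper emulating the proposed Symbol#to_s on top of a UTF-8 fstring.
def proposed_symbol_to_s(fstr)
  str = fstr.dup # dup returns an unfrozen copy, so its encoding can be changed
  str.force_encoding(Encoding::US_ASCII) if str.ascii_only?
  str
end

ascii_fstr = -"name"      # String#-@ returns a frozen, deduplicated String (UTF-8 here)
utf8_fstr  = -"caf\u00e9" # contains a non-ASCII character, so it must stay UTF-8

a = proposed_symbol_to_s(ascii_fstr)
b = proposed_symbol_to_s(utf8_fstr)

p a.encoding          # => #<Encoding:US-ASCII>
p b.encoding          # => #<Encoding:UTF-8>
p ascii_fstr.encoding # => #<Encoding:UTF-8> (the shared fstring is left untouched)
```

Because `dup` produces a separate mutable String, the encoding change only affects the returned copy. The interned UTF-8 fstring remains shareable with string literals.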