From: merch-redmine@... Date: 2018-08-10T04:23:17+00:00 Subject: [ruby-core:88415] [Ruby trunk Feature#14975] String#append without changing receiver's encoding Issue #14975 has been updated by jeremyevans0 (Jeremy Evans). ioquatix (Samuel Williams) wrote: > IMHO, anyone relying on this behaviour is walking on fire. But, you are right, there is the potential to break existing code. I believe the correct solution is for people to avoid using binary buffers for this use case. There already exists `Encoding::ASCII` which would make more sense. So if we limited to `Encoding::BINARY` it at least has a specific semantic model. One way to fix the above, would be to turn the `Encoding::UTF_8` receiver into `Encoding::BINARY`. I'm not sure I like that solution, but it does work in a predictable way and avoids introducing exceptions where none existed before. The problem with that is that single BINARY encoding string can then cause havok in your application, since any string it touches would then become binary. It would defeat the purpose of having encodings at all. > Do you think there is a way we can find a compromise? I'd rather not add yet another string concatenation function. I sort of admire Ruby for being opinionated, so I think if we can find a solution here without adding more options/arguments/methods, that would be ideal. WDYT? I don't think modifying `String#<<` should be considered, it's too likely to break existing code. I'm fairly sure you could add a non-allocating binary append method using a C extension, assuming the receiving string has enough capacity, so this would not even necessarily require a core method. Actually, if you could even get non-allocating code in pure ruby, but it wouldn't be great for performance (for small strings, for large strings it's probably fine): ~~~ ruby def << string enc = encoding string_enc = string.encoding if string_enc != enc force_encoding(string_enc) super(string) force_encoding(enc) else super(string) end end ~~~ That being said, I'd be in favor of having this as a separate String method. I can see it being used quite a lot in network programming where encoded text is mixed with binary protocol logic. ---------------------------------------- Feature #14975: String#append without changing receiver's encoding https://bugs.ruby-lang.org/issues/14975#change-73467 * Author: ioquatix (Samuel Williams) * Status: Open * Priority: Normal * Assignee: * Target version: ---------------------------------------- I'm not sure where this fits in, but in order to avoid garbage and superfluous function calls, is it possible that `String#<<`, `String#concat` or the (proposed) `String#append` can avoid changing the encoding of the receiver? Right now it's very tricky to do this in a way that doesn't require extra allocations. Here is what I do: ```ruby class Buffer < String BINARY = Encoding::BINARY def initialize super force_encoding(BINARY) end def << string if string.encoding == BINARY super(string) else super(string.b) # Requires extra allocation. end return self end alias concat << end ``` When the receiver is binary, but contains byte sequences, appending UTF_8 can fail: ``` "Foobar".b << "F����bar" => "FoobarF����bar" > "F����bar".b << "F����bar" Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8 ``` So, it's not possible to append data, generally, and then call `force_encoding(Encoding::BINARY)`. One must ensure the string is binary before appending it. It would be nice if there was a solution which didn't require additional allocations/copies/linear scans for what should basically be a `memcpy`. See also: https://bugs.ruby-lang.org/issues/14033 and https://bugs.ruby-lang.org/issues/13626#note-3 There are two options to fix this: 1/ Don't change receiver encoding in any case. 2/ Apply 1, but only when receiver is using `Encoding::BINARY` -- https://bugs.ruby-lang.org/ Unsubscribe: