From: samuel@...
Date: 2018-08-10T03:50:33+00:00
Subject: [ruby-core:88414] [Ruby trunk Feature#14975] String#append without changing receiver's encoding

Issue #14975 has been updated by ioquatix (Samuel Williams).

@jeremyevans0 I agree with you. It's a problem. Just for completeness, here is the error you talk about:

```ruby
b = 'a'.force_encoding(Encoding::BINARY)
u = "\u00ff".force_encoding(Encoding::UTF_8)

b << u
b.force_encoding(Encoding::BINARY)

# Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT
u << b
```

IMHO, anyone relying on this behaviour is walking on fire. But you are right, there is the potential to break existing code.

I believe the correct solution is for people to avoid using binary buffers for this use case. There already exists `Encoding::ASCII`, which would make more sense. So if we limited the change to `Encoding::BINARY` receivers, it would at least have a specific semantic model.

One way to fix the above would be to turn the `Encoding::UTF_8` receiver into `Encoding::BINARY`. I'm not sure I like that solution, but it does work in a predictable way and avoids introducing exceptions where none existed before.

Do you think there is a way we can find a compromise? I'd rather not add yet another string concatenation function. I sort of admire Ruby for being opinionated, so I think if we can find a solution here without adding more options/arguments/methods, that would be ideal. WDYT?

----------------------------------------
Feature #14975: String#append without changing receiver's encoding
https://bugs.ruby-lang.org/issues/14975#change-73466

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
I'm not sure where this fits in, but in order to avoid garbage and superfluous function calls, is it possible that `String#<<`, `String#concat` or the (proposed) `String#append` can avoid changing the encoding of the receiver?

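To make the question concrete, here is a minimal sketch of the behaviour being described, assuming an ASCII-only binary receiver. The variable names are illustrative only and do not come from any real code base:

```ruby
# An ASCII-only binary receiver takes on the argument's encoding
# when a non-ASCII UTF-8 string is appended to it.
buffer = "abc".b   # Encoding::BINARY (ASCII-8BIT), ASCII-only content
text   = "\u00ff"  # UTF-8 string containing one non-ASCII character

before = buffer.encoding
buffer << text
after = buffer.encoding

puts "before: #{before}"
puts "after:  #{after}"

# Getting back to a binary buffer needs a second call after the append.
buffer.force_encoding(Encoding::BINARY)
puts "bytes:  #{buffer.bytesize}"
```

The extra `force_encoding` call after every append is the superfluous work referred to above.
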
Right now it's very tricky to do this in a way that doesn't require extra allocations. Here is what I do:

```ruby
class Buffer < String
  BINARY = Encoding::BINARY

  def initialize
    super

    force_encoding(BINARY)
  end

  def << string
    if string.encoding == BINARY
      super(string)
    else
      super(string.b) # Requires extra allocation.
    end

    return self
  end

  alias concat <<
end
```

When the receiver is binary but contains non-ASCII byte sequences, appending a UTF-8 string can fail:

```
> "Foobar".b << "Føøbar"
=> "FoobarFøøbar"

> "Føøbar".b << "Føøbar"
Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
```

So, it's not possible to append data, generally, and then call `force_encoding(Encoding::BINARY)`. One must ensure the string is binary before appending it.

It would be nice if there was a solution which didn't require additional allocations/copies/linear scans for what should basically be a `memcpy`.

See also: https://bugs.ruby-lang.org/issues/14033 and https://bugs.ruby-lang.org/issues/13626#note-3

There are two options to fix this:

1/ Don't change receiver encoding in any case.
2/ Apply 1, but only when receiver is using `Encoding::BINARY`

-- 
https://bugs.ruby-lang.org/