From: samuel@...
Date: 2018-08-10T01:32:31+00:00
Subject: [ruby-core:88408] [Ruby trunk Feature#14975] String#append without changing receiver's encoding

Issue #14975 has been updated by ioquatix (Samuel Williams).

That makes total sense, so it seems we are in agreement: when the receiver is BINARY, we can append anything to it without changing the encoding.

What do you think? I can make a PR.

----------------------------------------
Feature #14975: String#append without changing receiver's encoding
https://bugs.ruby-lang.org/issues/14975#change-73460

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
I'm not sure where this fits in, but in order to avoid garbage and superfluous function calls, is it possible that `String#<<`, `String#concat` or the (proposed) `String#append` can avoid changing the encoding of the receiver?

Right now it's very tricky to do this in a way that doesn't require extra allocations. Here is what I do:

```ruby
class Buffer < String
  BINARY = Encoding::BINARY

  def initialize
    super

    force_encoding(BINARY)
  end

  def <<(string)
    if string.encoding == BINARY
      super(string)
    else
      super(string.b) # Requires extra allocation.
    end

    return self
  end

  alias concat <<
end
```

When the receiver is binary but contains non-ASCII byte sequences, appending UTF-8 can fail:

```
> "Foobar".b << "Føøbar"
=> "FoobarFøøbar"

> "Føøbar".b << "Føøbar"
Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
```

So, it's not possible, in general, to append data and then call `force_encoding(Encoding::BINARY)`. One must ensure the string is binary before appending it.

It would be nice if there was a solution which didn't require additional allocations/copies/linear scans for what should basically be a `memcpy`.
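
The three outcomes described above can be reproduced with a short sketch. `append_outcome` is a hypothetical helper, not part of the original report, and the non-ASCII text is written with `\u` escapes so that the example does not depend on the source file's encoding:

```ruby
# Hypothetical helper (not from the original report): perform the append and
# return either the receiver's resulting encoding or the class of the error
# that the append raised.
def append_outcome(receiver, string)
  (receiver << string).encoding
rescue Encoding::CompatibilityError => error
  error.class
end

utf8_text = "F\u00F8\u00F8bar" # UTF-8 string containing non-ASCII characters

# ASCII-only binary receiver: the append succeeds, but the receiver's
# encoding is changed to UTF-8.
append_outcome("Foobar".b, utf8_text)    # => Encoding::UTF_8

# Binary receiver that already holds non-ASCII bytes: the append raises.
append_outcome(utf8_text.b, utf8_text)   # => Encoding::CompatibilityError

# Converting the argument with String#b first keeps the receiver binary,
# at the cost of allocating a copy of the argument.
append_outcome(utf8_text.b, utf8_text.b) # => Encoding::BINARY
```

The last call is the `String#b` workaround that `Buffer#<<` relies on: the receiver stays binary, but every append of a non-binary string pays for an extra copy.
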
See also: https://bugs.ruby-lang.org/issues/14033 and https://bugs.ruby-lang.org/issues/13626#note-3

There are two options to fix this:

1/ Don't change receiver encoding in any case.
2/ Apply 1, but only when receiver is using `Encoding::BINARY`

--
https://bugs.ruby-lang.org/