From: "byroot (Jean Boussier)"
Date: 2022-08-23T11:03:44+00:00
Subject: [ruby-core:109641] [Ruby master Bug#18972] String#byteslice should return BINARY (aka ASCII-8BIT) Strings

Issue #18972 has been reported by byroot (Jean Boussier).

----------------------------------------
Bug #18972: String#byteslice should return BINARY (aka ASCII-8BIT) Strings
https://bugs.ruby-lang.org/issues/18972

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
While working on implementing https://bugs.ruby-lang.org/issues/13626, I noticed that `byteslice` assigns the receiver's encoding to the returned String.

I believe this is incorrect: since this is a byte-based operation, you would expect a binary string in return. Otherwise, if you call it on a UTF-8 string, you are likely to get a string with an invalid encoding.

I read the original feature request, and there is no mention of what the returned encoding should be: https://bugs.ruby-lang.org/issues/4447

### Current behavior

```ruby
>> "fée".byteslice(1).valid_encoding?
=> false
>> "fée".byteslice(1).encoding
=> #<Encoding:UTF-8>
```

### Expected behavior

```ruby
>> "fée".byteslice(1).valid_encoding?
=> true
>> "fée".byteslice(1).encoding
=> #<Encoding:ASCII-8BIT>
```

### Backward compatibility concerns

I'm honestly not quite sure what the backward compatibility impact may be. From my point of view, if you are calling `byteslice`, it is to use the result with other binary strings. But it is indeed possible that there is existing code mixing UTF-8 and BINARY that more or less works and would be broken by this change.

This is especially true since an empty binary string can silently be promoted from BINARY to UTF-8:

```ruby
buffer = "".b
buffer << "fée" # buffer was promoted to Encoding::UTF-8 silently
buffer << "fée".byteslice(1)
```

The above currently "works", but would raise `Encoding::CompatibilityError` with this change.
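
For reference, here is a minimal sketch of how a BINARY slice can be obtained on current Ruby without the proposed change, and of the compatibility error described above. The `binary_byteslice` helper is hypothetical (it is not part of Ruby or of this proposal); it only combines the existing `String#byteslice` and `String#force_encoding`.

```ruby
# Hypothetical helper, not part of Ruby: a byteslice that always returns BINARY.
# byteslice returns nil when the index is out of range, hence the safe navigation.
def binary_byteslice(string, *args)
  string.byteslice(*args)&.force_encoding(Encoding::BINARY)
end

str = "f\u00E9e" # "fée" as UTF-8; the "é" occupies bytes 1 and 2

# Current behavior: the slice keeps the receiver's encoding, so a lone
# lead byte of "é" is an invalid UTF-8 string.
utf8_slice = str.byteslice(1)

# With the helper, the same byte is tagged BINARY, where any byte is valid.
binary_slice = binary_byteslice(str, 1)

# The compatibility concern: a buffer that was promoted to UTF-8 cannot
# accept a non-ASCII BINARY string.
buffer = "".b
buffer << str # buffer is now UTF-8
appended = begin
  buffer << binary_slice
  true
rescue Encoding::CompatibilityError
  false
end
```

Code that appends slices to a buffer like this would need to keep the buffer binary throughout, for example by appending `str.b` rather than `str`.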