[ruby-core:109648] [Ruby master Bug#18972] String#byteslice should return BINARY (aka ASCII-8BIT) Strings
From: "byroot (Jean Boussier)" <noreply@...>
Date: 2022-08-23 15:50:00 UTC
List: ruby-core #109648
Issue #18972 has been updated by byroot (Jean Boussier).

Status changed from Open to Rejected

Ok, I suppose your point of view makes sense, and either way the backward compatibility concern is just too big.

Closing.

----------------------------------------
Bug #18972: String#byteslice should return BINARY (aka ASCII-8BIT) Strings
https://bugs.ruby-lang.org/issues/18972#change-98871

* Author: byroot (Jean Boussier)
* Status: Rejected
* Priority: Normal
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
While working on implementing https://bugs.ruby-lang.org/issues/13626, I noticed that `byteslice` assigns the receiver's encoding to the returned String.

I believe this is incorrect: since you are doing a byte-based operation, you expect a binary string in return. Otherwise, if you call it on a UTF-8 string, you are likely to get a string with an invalid encoding.

I read the original feature request and there's no mention of what the returned encoding should be: https://bugs.ruby-lang.org/issues/4447

### Current behavior

```ruby
>> "fée".byteslice(1).valid_encoding?
=> false
>> "fée".byteslice(1).encoding
=> #<Encoding:UTF-8>
```

### Expected behavior

```ruby
>> "fée".byteslice(1).valid_encoding?
=> true
>> "fée".byteslice(1).encoding
=> #<Encoding:ASCII-8BIT>
```

### Backward compatibility concerns

I'm honestly not quite sure what the backward incompatibility impact may be. From my point of view, if you are calling `byteslice` it's to use the result with other binary strings, but it's indeed possible that there is existing code mixing UTF-8 and BINARY that somewhat works and would be broken by this change.

Especially since binary strings can silently be promoted from BINARY to UTF-8:

```ruby
buffer = "".b
buffer << "fée" # buffer was promoted to Encoding::UTF-8 silently
buffer << "fée".byteslice(1)
```

The above currently "works", but would raise `Encoding::CompatibilityError` with this change.
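Since the proposal was rejected, code that wants a binary slice has to re-tag the result itself. The following is a minimal sketch of one way to do that, not something taken from the ticket: it uses `String#b` on the slice, and the variable names are purely illustrative. The sample string is the one from the report, written with a `\u` escape so the snippet does not depend on the source file's encoding.

```ruby
# "f\u00E9e" is the UTF-8 string "fee" with an acute accent on the first "e";
# its bytes are 66 C3 A9 65.
str = "f\u00E9e"

# Current behavior: the slice keeps the receiver's encoding.
slice = str.byteslice(1)             # the single byte 0xC3, half of the accented char
slice_encoding = slice.encoding      # inherited from the receiver
slice_valid = slice.valid_encoding?  # a lone 0xC3 is not valid UTF-8

# Workaround (an assumption of this sketch): String#b returns a copy
# tagged as ASCII-8BIT, in which any byte sequence is valid.
binary_slice = str.byteslice(1).b
binary_valid = binary_slice.valid_encoding?

# The silent promotion described in the report: appending a UTF-8 string
# to an empty binary buffer switches the buffer to UTF-8.
buffer = "".b
buffer << str
promoted_encoding = buffer.encoding

# Appending the UTF-8 tagged slice is accepted today...
buffer << str.byteslice(1)

# ...while appending the binary slice to the now UTF-8 buffer is rejected,
# which is the breakage the report anticipated.
begin
  buffer << binary_slice
  raised = false
rescue Encoding::CompatibilityError
  raised = true
end
```

`force_encoding(Encoding::BINARY)` on the slice would be another option; it changes the tag in place instead of returning a copy.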