From: "mame (Yusuke Endoh)"
Date: 2022-08-18T09:42:34+00:00
Subject: [ruby-core:109544] [Ruby master Bug#18955] Kernel#sprintf - %c ignores a non-ASCII character's encoding

Issue #18955 has been updated by mame (Yusuke Endoh).

At the dev-meeting, @akr proposed that the format `%c` behave like `%s` (with the one-codepoint restriction), and @matz agreed.

----------------------------------------
Bug #18955: Kernel#sprintf - %c ignores a non-ASCII character's encoding
https://bugs.ruby-lang.org/issues/18955#change-98715

* Author: andrykonchin (Andrew Konchin)
* Status: Open
* Priority: Normal
* ruby -v: 3.0.3
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
I haven't found any similar existing issue, so I decided to create a new one.

I noticed that `sprintf("%c", string)` doesn't behave as expected when the format string and the string argument have different encodings and the string argument contains a non-ASCII character.

In this case it seems that `sprintf` just takes the binary representation of the character and assigns (or interprets it with) the encoding of the format string.

I would expect `sprintf` to negotiate an encoding, convert everything (the character and the format string) to the chosen one, and raise an error when negotiation fails.

Examples to illustrate this behavior:

```ruby
format = "%c".encode("Windows-1251")
string = "Й".encode(Encoding::KOI8_U)
r = sprintf(format, string)
r.encoding # => #<Encoding:Windows-1251>
r == "Й".encode("Windows-1251") # => false
r.codepoints # => [234]
string.codepoints # => [234]
```

In this example the result's encoding is the format's encoding, but the codepoint is unchanged: it equals the character's codepoint in the original string's encoding.
But it should be different:

```ruby
"Й".encode("Windows-1251").codepoints # => [201]
```

Another example:

```ruby
string = "��".encode(Encoding::CP1252)
sprintf("%c", string) # => in `sprintf': invalid byte sequence in UTF-8 (ArgumentError)
```

In this example the error means that `sprintf` doesn't properly encode the codepoint (taken from the string's encoding) into UTF-8; it just uses the raw bytes.
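
The expectation described in the report (convert the character into the chosen encoding before substituting it) can be sketched roughly as follows. This is only an illustration: `sprintf_c_transcoded` is a hypothetical helper, not an existing Ruby method, and it assumes the format string's encoding is always the target instead of performing real encoding negotiation.

```ruby
# Hypothetical helper (not part of Ruby): transcode the first character of
# `arg` into the format string's encoding, then substitute it for %c.
def sprintf_c_transcoded(format, arg)
  # String#encode raises Encoding::UndefinedConversionError when the
  # character has no representation in the target encoding.
  char = arg[0].encode(format.encoding)
  sprintf(format, char)
end

format = "%c".encode("Windows-1251")
string = "\u0419".encode(Encoding::KOI8_U) # the same letter as in the first example

r = sprintf_c_transcoded(format, string)
p r.encoding   # => #<Encoding:Windows-1251>
p r.codepoints # => [201]
```

With this approach the result matches `"\u0419".encode("Windows-1251")`, which is the value the report expected, instead of reusing the KOI8-U byte 234.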