From: "jeremyevans0 (Jeremy Evans)"
Date: 2022-11-25T17:55:52+00:00
Subject: [ruby-core:111012] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding

Issue #18899 has been updated by jeremyevans0 (Jeremy Evans).

After more research, it appears the current behavior is expected. Parsing the single string with embedded colon is already handled correctly. However, if the external encoding is binary/ASCII-8BIT, then the internal encoding is deliberately set to `nil`:

```c
// in rb_io_ext_int_to_encs
if (ext == rb_ascii8bit_encoding()) {
    /* If external is ASCII-8BIT, no transcoding */
    intern = NULL;
}
```

Basically, the `'binary:utf-8'` encoding doesn't make sense. Providing two encodings is done to transcode from one encoding to the other. There is no transcoding if the external encoding is binary. If you want the internal encoding to be UTF-8, then just use `'utf-8'`.

That still leaves us with inconsistent behavior between `'binary:utf-8'` and `'binary', 'utf-8'`. So I propose to make the `'binary', 'utf-8'` behavior the same as `'binary:utf-8'`. I updated my pull request to do that: https://github.com/ruby/ruby/pull/6280

An alternative approach would be to remove the above code to treat the external encoding specially.

----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-100263

* Author: javanthropus (Jeremy Bopp)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
`IO#set_encoding` behaves differently when processing a single String argument than it does when processing 2 arguments (whether Strings or Encodings) in the case where the external encoding is being set to binary and the internal encoding is being set to any other encoding.
This script demonstrates the resulting values of the external and internal encodings for an IO instance given different ways to equivalently call `#set_encoding`:

```ruby
#!/usr/bin/env ruby

def show(io, args)
  printf(
    "args: %-50s external encoding: %-25s internal encoding: %-25s\n",
    args.inspect,
    io.external_encoding.inspect,
    io.internal_encoding.inspect
  )
end

File.open('/dev/null') do |f|
  args = ['binary:utf-8']
  f.set_encoding(*args)
  show(f, args)

  args = ['binary', 'utf-8']
  f.set_encoding(*args)
  show(f, args)

  args = [Encoding.find('binary'), Encoding.find('utf-8')]
  f.set_encoding(*args)
  show(f, args)
end
```

This behavior is the same from Ruby 2.7.0 to 3.1.2.
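As a rough illustration of the special case discussed in the update, the sketch below compares the single-string form with a binary external encoding against one with a non-binary external encoding. The helper `encodings_after` is hypothetical (it is not part of Ruby or of the original report), and only the single-string form is exercised, since the update describes it as already handled as intended.

```ruby
# Hypothetical helper, introduced only for this illustration: apply
# IO#set_encoding to a freshly opened read-only file and return the
# resulting [external_encoding, internal_encoding] pair.
def encodings_after(*args)
  File.open(File::NULL) do |f|
    f.set_encoding(*args)
    [f.external_encoding, f.internal_encoding]
  end
end

# Binary external encoding: the requested internal encoding is dropped,
# because no transcoding is performed from binary data.
p encodings_after('binary:utf-8')

# Non-binary external encoding: the requested internal encoding is kept,
# so data read as UTF-16LE would be transcoded to UTF-8.
p encodings_after('utf-16le:utf-8')
```

The first call should report a `nil` internal encoding, while the second should report UTF-8. The exact `inspect` text of the binary encoding may differ between Ruby versions.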