From: andrew@...
Date: 2017-05-23T10:05:38+00:00
Subject: [ruby-core:81344] [Ruby trunk Feature#13588] Add Encoding#min_char_size, #max_char_size, #minmax_char_size

Issue #13588 has been updated by haines (Andrew Haines).

I'm implementing a tar archive reader that takes an arbitrary stream (`StringIO`, `File`, `Zlib::GzipReader`, ...) and yields the individual files in the archive. I'd like the yielded file reader to conform as closely as possible to the `File` interface.

I'd like to implement `#getc` without necessarily being able to modify the `external_encoding` of the underlying stream. My strategy so far is to keep reading bytes into a buffer and `force_encoding` to the target encoding, until I have `valid_encoding?`. If I know the character length limits, then I can bail out if I still don't have a valid character after I've read the maximum number of bytes, return a string containing only the minimum number of bytes, and hold the extras back for the next invocation of `#getc` (this seems to be the behaviour of `IO#getc`).

This is how that would look with the proposed methods:

~~~ ruby
def getc
  check_not_closed!
  return nil if eof?

  char = String.new(encoding: Encoding::BINARY)
  min_char_size, max_char_size = external_encoding.minmax_char_size

  until char.size == max_char_size || eof?
    char << read(min_char_size)

    char.force_encoding external_encoding
    return encode(char) if char.valid_encoding?
    char.force_encoding Encoding::BINARY
  end

  char.slice!(min_char_size..-1).bytes.reverse_each do |byte|
    ungetbyte byte
  end

  encode(char)
end
~~~

----------------------------------------
Feature #13588: Add Encoding#min_char_size, #max_char_size, #minmax_char_size
https://bugs.ruby-lang.org/issues/13588#change-65042

* Author: haines (Andrew Haines)
* Status: Feedback
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
When implementing an IO-like object, I'd like to handle encoding correctly.
To do so, I need to know the minimum and maximum character sizes for the encoding of the stream I'm reading. However, I can't find a way to access this information from Ruby (I ended up writing a gem with a native extension [1] to do so).

I'd like to propose adding instance methods `min_char_size`, `max_char_size`, and `minmax_char_size` to the `Encoding` class to expose the information stored in the `OnigEncodingType` struct's `min_enc_len` and `max_enc_len` fields.

~~~ ruby
Encoding::UTF_8.min_char_size    # => 1
Encoding::UTF_8.max_char_size    # => 6
Encoding::UTF_8.minmax_char_size # => [1, 6]
~~~

[1] https://github.com/haines/char_size

--
https://bugs.ruby-lang.org/

Unsubscribe:
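
The buffering strategy described in the reply above can be tried on current Ruby without the proposed methods, as long as the size limits are supplied by hand. The following is only a minimal sketch under that assumption: `read_char`, `min_size`, and `max_size` are hypothetical names that are not part of the proposal or of Ruby, and the stream is assumed to support `eof?`, `read`, and `ungetbyte` (as `StringIO` does).

```ruby
require "stringio"

# Hypothetical helper, not part of the proposal: read one character from `io`
# in `encoding`. The caller passes the size limits (an assumption made here
# because Ruby does not expose them on Encoding).
def read_char(io, encoding, min_size:, max_size:)
  return nil if io.eof?

  buffer = String.new(encoding: Encoding::BINARY)

  # Grow the buffer min_size bytes at a time until it forms a valid character
  # or the maximum character size is reached.
  while buffer.bytesize < max_size && !io.eof?
    buffer << io.read(min_size).b
    candidate = buffer.dup.force_encoding(encoding)
    return candidate if candidate.valid_encoding?
  end

  # No valid character was found: keep only the first min_size bytes and push
  # the remaining bytes back so the next call sees them again.
  extra = buffer.slice!(min_size..-1)
  extra.bytes.reverse_each { |byte| io.ungetbyte(byte) } if extra
  buffer.force_encoding(encoding)
end
```

For example, with a `StringIO` wrapping the UTF-8 bytes of `"a\u00E9"` and limits of 1 and 4 bytes, successive calls yield `"a"`, then `"\u00E9"`, then `nil` at end of stream. An invalid leading byte comes back alone as an invalid string, and the bytes read after it are returned by later calls.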