From: Michael Selig
Date: 2008-09-16T07:38:13+09:00
Subject: [ruby-core:18611] Re: [Bug #566] String encoding error messages are inconsistent

On Mon, 15 Sep 2008 22:24:57 +1000, Yukihiro Matsumoto wrote:

> I am not sure what you mean by "inconsistent".  What are your ideal
> messages (or behavior) for each case?
>
>
> In message "Re: [ruby-core:18600] [Bug #566] String encoding error
> messages are inconsistent"
>     on Mon, 15 Sep 2008 15:50:17 +0900, Michael Selig writes:
>
> |Please compare:
> |"abc".encode("UTF-16BE") << "abc"
> |==> EncodingCompatibilityError: incompatible character encodings: UTF-16BE and US-ASCII
> |and:
> |"abc".encode("UTF-16BE") =~ /abc/
> |==> ArgumentError: incompatible encoding regexp match (US-ASCII regexp with UTF-16BE string)

I would expect both of these to raise EncodingCompatibilityError.

> |
> |also handling of broken (illegal) string encodings is not consistent:
> |"abc".force_encoding("UTF-16BE") =~ /abc/
> |==> ArgumentError: broken UTF-16BE string
> |and:
> |"abc".force_encoding("UTF-16BE") == "abc"
> |==> false (no error)
> |and:
> |"abc".encode("UTF-16BE").count("b".force_encoding("UTF-16BE"))
> |==> ArgumentError: invalid byte sequence in UTF-16BE

I guess there are 2 issues in this group:

1) (This is minor) I would expect both error messages to have the same text. I think "invalid byte sequence in XXX" is the better one.

2) It seems inconsistent to me that the 1st and 2nd expressions look almost the same (a regexp match and a string compare), yet only the regexp match raises an error. In fact I have noticed that most String methods do not seem to complain when operating on broken strings, but Regexps do.

There is actually a rather bizarre test in test_m17n.rb that relies on String methods NOT complaining when they operate on broken strings:

    s = "\xa1".force_encoding("euc-jp")
    assert_equal(true, "".center(2, s).valid_encoding?)

Here "\xa1" by itself is an invalid EUC-JP character, but "\xa1\xa1" is valid.
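To make the byte-level point concrete, here is a small sketch using only `String#force_encoding`, `String#valid_encoding?` and `String#center`, which exist in Ruby 1.9 and later. The `strict_center` helper is hypothetical (not a Ruby API); it only illustrates what checking validity up front, rather than silently gluing broken bytes together, might look like. Exact behaviour of the built-in methods on broken strings may differ between Ruby versions.

```ruby
# "\xa1" alone is an incomplete EUC-JP character; doubled, the bytes 0xA1 0xA1
# form one valid character (the ideographic space). dup avoids mutating a literal.
s = "\xa1".dup.force_encoding("EUC-JP")
p s.valid_encoding?          # => false
p((s * 2).valid_encoding?)   # => true
p((s * 2).length)            # => 1

# Hypothetical helper (assumption, not part of Ruby): refuse to pad with a
# broken string instead of producing a shorter-than-requested result.
def strict_center(str, width, pad)
  [str, pad].each do |x|
    raise ArgumentError, "invalid byte sequence in #{x.encoding}" unless x.valid_encoding?
  end
  str.center(width, pad)
end

p strict_center("a", 3, "*") # => "*a*"

begin
  strict_center("", 2, s)
rescue ArgumentError => e
  puts e.message             # prints: invalid byte sequence in EUC-JP
end
```

The check reuses the "invalid byte sequence in XXX" wording so the error text matches what `String#count` already reports.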
That test is actually relying on String#center putting the 2 invalid characters around an empty string without complaining, and thereby creating one valid character! I think this behaviour could confuse a Ruby programmer: padding to 2 chars and getting a 1-character result is probably not what was intended.

To me it would be preferable if Regexp and String methods behaved the same way in this regard, and probably the best would be to raise errors in both. That would prevent confusing behaviour like that in the test above.

Cheers
Mike.