[ruby-dev:46828] [ruby-trunk - Bug #7646][Assigned] invalid byte sequence in String#each_line

From: "kosaki (Motohiro KOSAKI)" <kosaki.motohiro@...>
Date: 2013-01-02 17:07:29 UTC
List: ruby-dev #46828
Issue #7646 has been updated by kosaki (Motohiro KOSAKI).

Category set to core
Status changed from Open to Assigned
Assignee set to nobu (Nobuyoshi Nakada)
Priority changed from Normal to High
Target version set to 2.0.0

This looks like a regression however you look at it.
I'll tag it for 2.0.0.
----------------------------------------
Bug #7646: invalid byte sequence in String#each_line
https://bugs.ruby-lang.org/issues/7646#change-35181

Author: yoshidam (Yoshida Masato)
Status: Assigned
Priority: High
Assignee: nobu (Nobuyoshi Nakada)
Category: core
Target version: 2.0.0
ruby -v: ruby 2.0.0dev (2013-01-02 trunk 38676) [i686-linux]


=begin
When a separator is passed to String#each_line, an "invalid byte sequence" error is raised if the string contains non-ASCII characters.

 $ ruby -ve '"\n\u0100".each_line("\n") {|l| p l }'
 ruby 2.0.0dev (2013-01-02 trunk 38676) [i686-linux]
 "\n"
 -e:1:in `each_line': invalid byte sequence in UTF-8 (ArgumentError)
 from -e:1:in `<main>'

This appears to be a bug introduced by a change around r38616.

  
 --- string.c.org        2012-12-27 21:57:07.000000000 +0900
 +++ string.c    2013-01-02 23:36:47.000000000 +0900
 @@ -6199,14 +6199,14 @@
         if (c == newline &&
             (rslen <= 1 ||
              (pend - p >= rslen && memcmp(RSTRING_PTR(rs), p, rslen) == 0))) {
 -           p += (rslen ? rslen : n);
 -           line = rb_str_subseq(str, s - ptr, p - s);
 +           const char *pp = p + (rslen ? rslen : n);
 +           line = rb_str_subseq(str, s - ptr, pp - s);
             if (wantarray)
                 rb_ary_push(ary, line);
             else
                 rb_yield(line);
             str_mod_check(str, ptr, len);
 -           s = p;
 +           s = pp;
         }
         p += n;
      }

=end
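
For readers following the diff, here is a rough Ruby sketch of what the patch seems to address. As far as I can tell, the removed lines move `p` past the separator, and the loop's trailing `p += n` then advances it a second time, so the scan can land in the middle of a multibyte character. The helpers `char_len` and `scan_lines` and the `double_advance` flag are hypothetical names made up for this illustration; this is a simplified model of the loop, not the actual string.c logic.

```ruby
# Hypothetical helper: byte length of a UTF-8 character from its lead byte.
# Raises the same kind of error as the report when given a non-lead byte.
def char_len(byte)
  case byte
  when 0x00..0x7F then 1
  when 0xC2..0xDF then 2
  when 0xE0..0xEF then 3
  when 0xF0..0xF4 then 4
  else raise ArgumentError, "invalid byte sequence in UTF-8"
  end
end

# Hypothetical model of the scan loop, working on byte offsets.
# double_advance: true  imitates the removed lines (scan position is moved
#                       past the separator and then advanced by n again).
# double_advance: false imitates the added lines (a separate end-of-line
#                       offset is used, the scan position advances once).
def scan_lines(str, sep, double_advance:)
  bytes = str.b                 # binary copy, so byte slicing never raises
  sep_b = sep.b
  rslen = sep_b.bytesize
  lines = []
  s = 0                         # byte offset where the current line starts
  pos = 0                       # scan position (the `p` of the C code)
  while pos < bytes.bytesize
    n = char_len(bytes.getbyte(pos))
    if bytes.byteslice(pos, rslen) == sep_b
      line_end = pos + rslen    # the `pp` of the patch
      lines << bytes.byteslice(s, line_end - s).force_encoding(str.encoding)
      s = line_end
      pos = line_end if double_advance
    end
    pos += n
  end
  if s < bytes.bytesize
    lines << bytes.byteslice(s, bytes.bytesize - s).force_encoding(str.encoding)
  end
  lines
end

p scan_lines("\n\u0100", "\n", double_advance: false)

begin
  scan_lines("\n\u0100", "\n", double_advance: true)
rescue ArgumentError => e
  puts "#{e.class}: #{e.message}"
end
```

With `double_advance: true` the scan position ends up on the second byte of `"\u0100"`, which is not a valid lead byte, and that matches the point at which the error in the report is raised after the first line has already been yielded.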


-- 
http://bugs.ruby-lang.org/
