From: "timothyg56 (Timothy Garnett)"
Date: 2013-02-26T05:12:32+09:00
Subject: [ruby-core:52892] [ruby-trunk - Bug #7845] Strip doesn't handle unicode space characters in ruby 1.9.2 & 1.9.3 (does in 1.9.1)

Issue #7845 has been updated by timothyg56 (Timothy Garnett).

I'm not sure how convincing the linked conversation is. It seems to be about case-sensitivity issues in varying locales, particularly around identifiers, but whether a Unicode space counts as whitespace is not locale dependent as far as I know. It seems like strip, which only deals with whitespace, could easily be encoding aware, while upcase/downcase and the like stay ASCII-only for the cited complexity reasons. It would be nice if strip removed the equivalent of [[:space:]] as it used to, but I guess that's what open source is for.

If anyone stumbling upon this wants to patch Ruby itself to restore the old behavior, see https://gist.github.com/tgarnett/5032660 for the Ruby source changes, or you can monkey patch a fix into String:

  class String
    # \A and \z anchor to the whole string; ^ and $ would also
    # match at line breaks inside a multi-line string.
    def lstrip
      sub(/\A[[:space:]]+/, '')
    end

    def rstrip
      sub(/[[:space:]]+\z/, '')
    end

    def strip
      lstrip.rstrip
    end

    # etc. for ! versions
  end

----------------------------------------
Bug #7845: Strip doesn't handle unicode space characters in ruby 1.9.2 & 1.9.3 (does in 1.9.1)
https://bugs.ruby-lang.org/issues/7845#change-37004

Author: timothyg56 (Timothy Garnett)
Status: Rejected
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version:
ruby -v: ruby 1.9.3p286 (2012-10-12 revision 37165) [x86_64-linux]

Strip and associated methods in Ruby 1.9.2 and 1.9.3 do not remove leading/trailing Unicode space characters (such as non-breaking space \u00A0 and ideographic space \u3000), unlike Ruby 1.9.1. I'd expect the 1.9.1 behavior. Looking at the underlying native lstrip! and rstrip! methods, it looks like this is because 1.9.1 uses rb_enc_isspace() whereas 1.9.2+ uses rb_isspace().

  1.9.1p378 :001 > "\u3000\u00a0".strip
   => ""
  1.9.2p320 :001 > "\u3000\u00a0".strip
   => "　 "
  1.9.3p286 :001 > "\u3000\u00a0".strip
   => "　 "

--
http://bugs.ruby-lang.org/
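
For readers who would rather not reopen String, the same workaround could be sketched as a standalone helper. This is only an illustration under stated assumptions: the module name UnicodeStrip and its constants are hypothetical and do not come from the issue or the linked gist. It relies on [[:space:]] matching Unicode whitespace in UTF-8 strings, and it anchors with \A and \z because ^ and $ in Ruby match at every line boundary rather than only at the ends of the string.

```ruby
# Hypothetical helper module (the name UnicodeStrip is illustrative only).
# It leaves String untouched and strips anything matched by [[:space:]].
module UnicodeStrip
  LEADING  = /\A[[:space:]]+/   # whitespace run at the very start of the string
  TRAILING = /[[:space:]]+\z/   # whitespace run at the very end of the string

  module_function

  # Returns a copy of str without leading Unicode whitespace.
  def lstrip(str)
    str.sub(LEADING, '')
  end

  # Returns a copy of str without trailing Unicode whitespace.
  def rstrip(str)
    str.sub(TRAILING, '')
  end

  # Returns a copy of str with both ends stripped.
  def strip(str)
    rstrip(lstrip(str))
  end
end

padded = "\u3000\u00a0hello\u00a0\u3000"
puts UnicodeStrip.strip(padded).inspect   # => "hello"
puts padded.strip.inspect                 # built-in result depends on the Ruby version
```

Keeping the helper separate from String avoids changing the behavior of strip for other code in the same process, at the cost of a slightly less convenient call site.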