From: "naruse (Yui NARUSE)"
Date: 2013-11-21T16:35:22+09:00
Subject: [ruby-core:58459] [ruby-trunk - Feature #9111] Encoding-free String comparison

Issue #9111 has been updated by naruse (Yui NARUSE).

Hanmac (Hans Mackowiak) wrote:
> What about strings with the same encoding but different content that is rendered the same?
> For example, "â" can be made from "a" + "^" somehow; should they also be treated as equal?

The standard practice is to compare the normalized forms: NFD("â") == NFD("a" + "^").
To apply NFD, you can use one of several libraries.
See also http://bibwild.wordpress.com/2013/11/19/benchmarking-ruby-unicode-normalization-alternatives/

----------------------------------------
Feature #9111: Encoding-free String comparison
https://bugs.ruby-lang.org/issues/9111#change-43054

Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:

=begin
Currently, strings with the same content but with different encodings count as different strings. This causes strange behaviour as below (noted in StackOverflow question http://stackoverflow.com/questions/19977788/strange-behavior-in-packed-ruby-strings#19978206):

    [128].pack("C") # => "\x80"
    [128].pack("C") == "\x80" # => false

Since `[128].pack("C")` has the encoding ASCII-8BIT and `"\x80"` (by default) has the encoding UTF-8, the two strings are not equal. Also, comparison of strings with different encodings may end up with a messy, unintended result.

I suggest that the comparison `String#<=>` should not be based on the respective encodings of the strings; instead, all strings should be internally converted to UTF-8 for the purpose of comparison.
=end

--
http://bugs.ruby-lang.org/
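
For readers following the thread, here is a minimal Ruby sketch of the two comparisons discussed above. It is an illustration only, not part of the proposal. The helper names `bytewise_equal?` and `nfd_equal?` are hypothetical. The sketch assumes Ruby 2.2 or later, where `String#unicode_normalize` is built in (it did not exist when this message was written, hence the pointer to external libraries), and it assumes that "^" in the quoted question stands for the combining circumflex U+0302.

```ruby
# Hypothetical helper: compare raw bytes, ignoring the encoding tag.
# String#b returns a copy tagged as ASCII-8BIT (binary).
def bytewise_equal?(a, b)
  a.b == b.b
end

# Hypothetical helper: compare two Unicode strings after NFD normalization.
# Assumes both strings are in a Unicode encoding such as UTF-8.
def nfd_equal?(a, b)
  a.unicode_normalize(:nfd) == b.unicode_normalize(:nfd)
end

# The case from the issue: same byte, different encoding tags.
packed  = [128].pack("C")                           # ASCII-8BIT
# UTF-8 is already the default for a literal; force_encoding makes it explicit.
literal = "\x80".dup.force_encoding(Encoding::UTF_8)

puts packed == literal                 # false
puts bytewise_equal?(packed, literal)  # true

# The case from the quoted question: precomposed vs. combining sequence.
composed   = "\u00E2"   # "â" as a single code point
decomposed = "a\u0302"  # "a" followed by COMBINING CIRCUMFLEX ACCENT

puts composed == decomposed            # false
puts nfd_equal?(composed, decomposed)  # true
```

Comparing raw bytes sidesteps the encoding check in `String#==`, but it is a different operation from the UTF-8 conversion the proposal asks for; it is shown here only to make the cause of the `false` result visible.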