From: duerst@...
Date: 2014-08-04T10:09:41+00:00
Subject: [ruby-core:64185] [ruby-trunk - Bug #10097] Case-insensitive Regexp matching for Windows-1252 not working for ŠšŽžŒœÿŸ

Issue #10097 has been updated by Martin Dürst.

Nobuyoshi Nakada wrote:
> I've forgotten the test file, "test/ruby/enc/test_windows_1252.rb", and added it now.
> What tests are needed?

Kimihito Matsui, one of my students, is working on tests (not only for windows 1252, but also for other encodings). Can you (or somebody else) tell me what the case-related encoding primitives are supposed to do?

----------------------------------------
Bug #10097: Case-insensitive Regexp matching for Windows-1252 not working for ŠšŽžŒœÿŸ
https://bugs.ruby-lang.org/issues/10097#change-48188

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee:
* Category:
* Target version:
* ruby -v: 1.9.3p545
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
By chance I had a look at enc/iso_8859_1.c and found

~~~C
ENC_REPLICATE("Windows-1252", "ISO-8859-1")
~~~

on line 288. But this does not work for case folding:

~~~ruby
# http://en.wikipedia.org/wiki/Windows-1252
s1 = "\u0160".encode 'windows-1252'  # 'Š'
r1 = Regexp.new("\u0161".encode('windows-1252'), Regexp::IGNORECASE)  # /š/i
s1 =~ r1  # => nil

s2 = "\u0178".encode 'windows-1252'  # 'Ÿ'
r2 = Regexp.new("\u00FF".encode('windows-1252'), Regexp::IGNORECASE)  # /ÿ/i
s2 =~ r2  # => nil

s3 = "\u00C0".encode 'windows-1252'  # 'À'
r3 = Regexp.new("\u00E0".encode('windows-1252'), Regexp::IGNORECASE)  # /à/i
s3 =~ r3  # => 0
~~~

So case-insensitive matching works when both characters are in iso-8859-1, but not when one (ÿŸ) or both (ŠšŽžŒœ) characters are not in iso-8859-1.

-- 
https://bugs.ruby-lang.org/
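
As a starting point for the tests asked about above, the three checks in the report could be generalized to all affected pairs roughly as follows. This is only a sketch: `pairs` and `case_insensitive_match?` are hypothetical names chosen for illustration, not part of test/ruby/enc/test_windows_1252.rb, and the script assumes a UTF-8 source file. No expected result is given for the pairs outside iso-8859-1, since that depends on the Ruby version being run.

~~~ruby
# Uppercase/lowercase pairs from the report, written as Unicode escapes.
pairs = [
  ["\u0160", "\u0161"], # Š / š: neither is in iso-8859-1
  ["\u017D", "\u017E"], # Ž / ž: neither is in iso-8859-1
  ["\u0152", "\u0153"], # Œ / œ: neither is in iso-8859-1
  ["\u0178", "\u00FF"], # Ÿ / ÿ: only ÿ is in iso-8859-1
  ["\u00C0", "\u00E0"], # À / à: both in iso-8859-1 (control case)
]

# Hypothetical helper: true if `upper` matches /lower/i after both
# are transcoded to the given encoding.
def case_insensitive_match?(upper, lower, encoding = 'windows-1252')
  s = upper.encode(encoding)
  r = Regexp.new(lower.encode(encoding), Regexp::IGNORECASE)
  !(s =~ r).nil?
end

results = pairs.map do |upper, lower|
  [upper, lower, case_insensitive_match?(upper, lower)]
end

# Print one line per pair so failures are easy to spot.
results.each do |upper, lower, ok|
  puts format('%s / %s: %s', upper, lower, ok ? 'match' : 'NO MATCH')
end
~~~

On the version reported above (1.9.3p545), the Š/š and Ÿ/ÿ pairs are the ones shown returning nil, while À/à matches.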