[ruby-core:30579] [Bug #3386] Inconsistent regexp punct class matching behavior between UTF-8 and ASCII encodings

From: Jeffrey Yeung <redmine@...>
Date: 2010-06-04 00:14:47 UTC
List: ruby-core #30579
Bug #3386: Inconsistent regexp punct class matching behavior between UTF-8 and ASCII encodings
http://redmine.ruby-lang.org/issues/show/3386

Author: Jeffrey Yeung
Status: Open, Priority: Low
ruby -v: ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]

Scenario:
---------
Use a Regexp pattern that includes the [:punct:] character class (or the \p{Punct} expression) on strings containing only standard punctuation characters `~!@#$%^&*()_+-=[]\{}|;':",./<>?.

Issue:
------
The match results on UTF-8 encoded strings are unexpectedly different from those on ASCII encoded strings.

I have observed two issues:
 * The [[:punct:]] expression does not match characters `~$^+=|<> when applied to UTF-8 strings.
 * The \p{^Punct} and \P{Punct} expressions produce different results when applied to UTF-8 strings. The latter (\P{Punct}) seems to be incorrect.
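
A possible explanation, which the report itself does not state, is that the unmatched characters are exactly those that Unicode classifies as Symbol (general category S*) rather than Punctuation (P*). The following is a minimal sketch of that check. The constant name PUNCT_CHARS is made up for illustration, and the \p{S} and \p{P} properties are applied to a UTF-8 string only:

```ruby
# Assumption: the 32 ASCII characters from the report, forced to UTF-8 so
# that the Unicode general-category properties apply.
PUNCT_CHARS = '`~!@#$%^&*()_+-=[]\\{}|;\':",./<>?'.encode('UTF-8')

# Split the test characters by Unicode general category.
symbols     = PUNCT_CHARS.each_char.select { |ch| ch =~ /\p{S}/ }.join
punctuation = PUNCT_CHARS.each_char.select { |ch| ch =~ /\p{P}/ }.join

puts "Symbol (S*):      #{symbols}"
puts "Punctuation (P*): #{punctuation}"
```

The Symbol group comes out as `~$^+=|<>, the same nine characters listed in the first item above. This only shows that the split follows the Unicode categories; it does not settle whether [[:punct:]] is supposed to cover them.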

To illustrate these, here is a bit of Ruby code:
  teststr = '`~!@#$%^&*()_+-=[]\\{}|;\':",./<>?'
  teststr2 = teststr.encode('UTF-8')
  teststr3 = teststr.encode('ASCII-8BIT')

  def gsub_tests(teststr)
    puts "String (#{teststr.encoding}): \'#{teststr}\'"
    strout1 = teststr.gsub(/[[:punct:]]/, '')
    strout2 = teststr.gsub(/[^[:punct:]]/, '')
    strout3 = teststr.gsub(/\p{Punct}/, '')
    strout4 = teststr.gsub(/\p{^Punct}/, '')
    strout5 = teststr.gsub(/\P{Punct}/, '')
    puts "  Output 1 = \'#{strout1}\'"
    puts "  Output 2 = \'#{strout2}\'"
    puts "  Output 3 = \'#{strout3}\'"
    puts "  Output 4 = \'#{strout4}\'"
    puts "  Output 5 = \'#{strout5}\'"
  end

  gsub_tests(teststr)
  gsub_tests(teststr2)
  gsub_tests(teststr3)


Here is the output I observe when running the above code:
$ ruby test.rb
String (US-ASCII): '`~!@#$%^&*()_+-=[]\{}|;':",./<>?'
  Output 1 = ''
  Output 2 = '`~!@#$%^&*()_+-=[]\{}|;':",./<>?'
  Output 3 = ''
  Output 4 = '`~!@#$%^&*()_+-=[]\{}|;':",./<>?'
  Output 5 = '`~!@#$%^&*()_+-=[]\{}|;':",./<>?'
String (UTF-8): '`~!@#$%^&*()_+-=[]\{}|;':",./<>?'
  Output 1 = '`~$^+=|<>'
  Output 2 = '!@#%&*()_-[]\{};':",./?'
  Output 3 = ''
  Output 4 = '`~!@#$%^&*()_+-=[]\{}|;':",./<>?'
  Output 5 = '!@#%&*()_-[]\{};':",./?'
String (ASCII-8BIT): '`~!@#$%^&*()_+-=[]\{}|;':",./<>?'
  Output 1 = ''
  Output 2 = '`~!@#$%^&*()_+-=[]\{}|;':",./<>?'
  Output 3 = ''
  Output 4 = '`~!@#$%^&*()_+-=[]\{}|;':",./<>?'
  Output 5 = '`~!@#$%^&*()_+-=[]\{}|;':",./<>?'


Note outputs 1, 2, and 5 for the UTF-8 encoded string above: they differ from the corresponding US-ASCII and ASCII-8BIT results.
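
For comparing the encodings side by side, a per-character check may be easier to read than the gsub output. This is only a sketch: the helper unmatched_chars and the constants TEST_CHARS and PATTERNS are hypothetical names that do not appear in the original test, and what it prints depends on the Ruby version in use.

```ruby
TEST_CHARS = '`~!@#$%^&*()_+-=[]\\{}|;\':",./<>?'

# The four expressions exercised by the report, keyed by a printable label.
PATTERNS = {
  '[[:punct:]]' => /[[:punct:]]/,
  '\p{Punct}'   => /\p{Punct}/,
  '\P{Punct}'   => /\P{Punct}/,
  '\p{^Punct}'  => /\p{^Punct}/
}

# Hypothetical helper: returns the characters of str that pattern does NOT match.
def unmatched_chars(str, pattern)
  str.each_char.reject { |ch| ch =~ pattern }.join
end

%w[US-ASCII UTF-8 ASCII-8BIT].each do |enc|
  # The content is ASCII-only, so relabelling the bytes is valid in each encoding.
  str = TEST_CHARS.dup.force_encoding(enc)
  puts "Encoding: #{enc}"
  PATTERNS.each do |label, pattern|
    puts "  #{label.ljust(12)} not matched: '#{unmatched_chars(str, pattern)}'"
  end
end
```

If the encodings behaved consistently, the three blocks would be identical, and the \P{Punct} and \p{^Punct} lines would agree within each block.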


----------------------------------------
http://redmine.ruby-lang.org
