[ruby-core:121316] [Ruby master Bug#20025] Parsing identifiers/constants is case-folding dependent

From: "hsbt (Hiroshi SHIBATA) via ruby-core" <ruby-core@...>
Date: 2025-03-13 05:26:41 UTC
List: ruby-core #121316
Issue #20025 has been updated by hsbt (Hiroshi SHIBATA).

Backport changed from 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED to 3.0: REQUIRED, 3.1: REQUIRED, 3.2: DONE

ruby_3_2 commit:6c24731837f88d67517cfc590cb496daed7a0ef5 merged revision(s) commit:79eb75a8dd64848f23e9efc465f06326b5d4b680.

----------------------------------------
Bug #20025: Parsing identifiers/constants is case-folding dependent
https://bugs.ruby-lang.org/issues/20025#change-112281

* Author: kddnewton (Kevin Newton)
* Status: Closed
* Backport: 3.0: REQUIRED, 3.1: REQUIRED, 3.2: DONE
----------------------------------------
When CRuby parses identifiers, it is encoding-dependent. Once the identifier is found, it determines whether it starts with an uppercase or lowercase codepoint. This determines whether the identifier is a constant or not.

The function in charge of this is `rb_sym_constant_char_p`. For non-Unicode encodings where the leading byte has the top bit set, this relies on onigmo's `mbc_case_fold` to determine whether it is a constant or not (as opposed to `is_code_ctype`).

This works for almost every codepoint in every encoding, but has one very weird edge case. In the Windows-1253 encoding, the 0xB5 byte is the micro sign. The micro sign, when case-folded, becomes the uppercase mu character, and then the lowercase mu character, or 0xEC. This means that even though 0xB5 reports itself as being a lowercase codepoint, it gets parsed as a constant. This example may make this clearer:

``` ruby
class Context < BasicObject
  def method_missing(name, *) = :identifier
  def self.const_missing(name) = :constant
end

encoding = Encoding::Windows_1253
character = 0xB5.chr(encoding)

source = "# encoding: #{encoding.name}\n#{character}\n"
result = Context.new.instance_eval(source)

puts "#{encoding.name} encoding of 0x#{character.ord.to_s(16).upcase}"
puts "  [[:alpha:]] => #{character.match?(/[[:alpha:]]/)}"
puts "  [[:alnum:]] => #{character.match?(/[[:alnum:]]/)}"
puts "  [[:upper:]] => #{character.match?(/[[:upper:]]/)}"
puts "  [[:lower:]] => #{character.match?(/[[:lower:]]/)}"
puts "  parsed as #{result}"
```

This results in the following output:

```
Windows-1253 encoding of 0xB5
  [[:alpha:]] => true
  [[:alnum:]] => true
  [[:upper:]] => false
  [[:lower:]] => true
  parsed as constant
```

To be clear, I don't think the case-folding is incorrect here (and @duerst confirms that it is correct). I believe instead that it is incorrect to use case-folding here to determine if a codepoint is uppercase or not.
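
As a rough illustration of the folding chain described above, the following sketch decodes the 0xB5 byte into UTF-8 and applies Ruby's Unicode case mapping, going from the micro sign to uppercase mu and then to lowercase mu. It only approximates what onigmo's `mbc_case_fold` does inside the parser; the `codepoint` lambda is a hypothetical display helper, and the sketch assumes Ruby's built-in Windows-1253 transcoder is available.

``` ruby
# Decode byte 0xB5 from Windows-1253 into UTF-8 so that Unicode case
# mapping applies (an approximation of the fold, not the parser's code path).
micro = 0xB5.chr(Encoding::Windows_1253).encode(Encoding::UTF_8)

upper = micro.upcase    # micro sign -> GREEK CAPITAL LETTER MU
lower = upper.downcase  # capital mu -> GREEK SMALL LETTER MU

# Map the lowercase mu back to its Windows-1253 byte.
back = lower.encode(Encoding::Windows_1253)

# Hypothetical helper, used for display only.
codepoint = ->(str) { format("U+%04X", str.ord) }

puts "micro sign:   #{codepoint.call(micro)}"
puts "uppercase mu: #{codepoint.call(upper)}"
puts "lowercase mu: #{codepoint.call(lower)}"
puts "Windows-1253 byte after the round trip: #{format('0x%02X', back.bytes.first)}"

# The ctype-style check still reports the micro sign as not uppercase.
puts "[[:upper:]] => #{micro.match?(/[[:upper:]]/)}"
```

The byte changes under the case round trip even though the character-class check says it is not uppercase, which is the mismatch this report is about.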

Note that this only impacts this one codepoint in this one encoding, so I don't believe this is actually a large-scale problem. But I found it surprising, and think we should change it.



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

