[ruby-dev:49663] [Ruby trunk Bug#11859][Rejected] Regexp matching with \p{Upper} and \p{Lower} for EUC-JP doesn’t work.
From: naruse@...
Date: 2016-06-13 09:37:58 UTC
List: ruby-dev #49663
Issue #11859 has been updated by Yui NARUSE.
Status changed from Open to Rejected
Ruby doesn't have case tables for non-Unicode encodings.
And EUC-JP is a legacy encoding; I don't think such an encoding should be extended.
----------------------------------------
Bug #11859: Regexp matching with \p{Upper} and \p{Lower} for EUC-JP doesn’t work.
https://bugs.ruby-lang.org/issues/11859#change-59185
* Author: Kimihito Matsui
* Status: Rejected
* Priority: Normal
* Assignee:
* ruby -v: ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
U+FF21 (Ａ, FULLWIDTH LATIN CAPITAL LETTER A) and U+00C0 (À, LATIN CAPITAL LETTER A WITH GRAVE) are both `Uppercase_Letter`, so each should match and return 0 in the following cases, but both return 1.
~~~
ruby -e 'puts "\uFF21A".encode("EUC-JP") =~ Regexp.compile("\\\p{Upper}".encode("EUC-JP"))' # => 1
ruby -e 'puts "\u00C0A".encode("EUC-JP") =~ Regexp.compile("\\\p{Upper}".encode("EUC-JP"))' # => 1
~~~
The same happens with lowercase matching.
~~~
ruby -e 'puts "\uFF41a".encode("EUC-JP") =~ Regexp.compile("\\\p{Lower}".encode("EUC-JP"))' # => 1
~~~
In a Unicode encoding, it works as expected:
~~~
ruby -e 'puts "\uFF21A" =~ Regexp.compile("\\\p{Upper}")' # => 0
~~~
It looks like `\p{Upper}` and `\p{Lower}` in EUC-JP regexes match only ASCII characters.
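
Since the Unicode case above does match, one possible workaround is to transcode the EUC-JP string to UTF-8 before matching. The following is only a sketch under that assumption, not something proposed in the issue; `match_index_via_utf8` is a hypothetical helper name, and it assumes every character in the string can be converted to UTF-8.

```ruby
# Hypothetical helper (not part of Ruby or of the report):
# transcode to UTF-8 so that Unicode property tables are used for \p{...}.
def match_index_via_utf8(str, pattern_source)
  utf8 = str.encode("UTF-8")          # raises if a character cannot be converted
  utf8 =~ Regexp.new(pattern_source)  # character index of the first match, or nil
end

euc = "\uFF21A".encode("EUC-JP")

# Direct match in EUC-JP, as in the report (the report observed 1 here).
direct = euc =~ Regexp.compile("\\p{Upper}".encode("EUC-JP"))

# Match against a UTF-8 copy of the same string.
via_utf8 = match_index_via_utf8(euc, "\\p{Upper}")

puts "direct: #{direct.inspect}, via UTF-8: #{via_utf8.inspect}"
```

The returned index is a character index in the UTF-8 copy, which corresponds to the same character position in the original EUC-JP string.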
--
https://bugs.ruby-lang.org/